Onsite Technical Program Manager, Cloud Infrastructure

About the role

  • Technical Program Manager at NVIDIA driving AI infrastructure programs with external partners, collaborating with engineering and infrastructure teams to improve AI capacity and how it is managed.

Responsibilities

  • As a DGX Cloud Technical Program Manager, you'll be a key partner to our Engineering, Infrastructure, and Software teams and their leadership, driving critical programs related to AI capacity enablement and management.
  • You'll play a pivotal role in developing and maturing foundational capabilities and processes for DGX Cloud, spanning critical areas such as cluster/capacity bring-up including CPU, storage, networking and compute requirements to support GPUs.
  • This is a dynamic, fast-paced environment where TPMs are expected to apply fungible skillsets to a range of high-impact programs across DGX Cloud.
  • Collaborating closely with storage engineering and network engineering teams to define and communicate requirements to Cloud Service Providers (CSPs) and NVIDIA Cloud Providers (NCPs).
  • Driving alignment on a plan of record (POR) for capacity blocks based on workload needs.
  • Driving early engagement with CSPs and NCPs to understand their managed storage and network solutions and to influence their alignment with the NVIDIA Cloud roadmap.
  • Gathering technical requirements, developing comprehensive roadmaps, establishing clear milestones, and ensuring adherence to our Product Lifecycle (PLC) process.
  • Managing ongoing capacity operations and the engineering engagement with CSP and NCP partners, collaborating closely with an SRE lead.
  • Focusing on availability, maintenance, and other critical performance indicators.
  • Partnering closely with teams within NVIDIA to understand workload requirements and the related hardware and infrastructure needs, including speeds and feeds, to optimize infrastructure readiness with CSPs and NCPs.
  • Leveraging Jira and other program management platforms to instill rigor and structure in the management of engineering deliverables.
  • Identifying and driving opportunities to adopt third-party and in-house cloud infrastructure solutions for deployments, support, security, compliance, and observability across DGX Cloud.
  • Establishing key performance indicators (KPIs) and quantitatively demonstrating the value and impact delivered by your programs.
  • Proactively identifying, resolving, and mitigating risks and issues that could affect scope, schedule, and quality across all program aspects.
  • Cultivating a culture of continuous improvement, consistently identifying opportunities for process enhancements within our cloud infrastructure operations.

Requirements

  • 12+ years of technical program management experience, specifically driving the planning and execution of large-scale cloud infrastructure programs with external partners, with a strong focus on software engineering projects within a matrixed organization.
  • Extensive hands-on experience in cloud infrastructure, preferably gained from working at a major Cloud Service Provider (CSP).
  • Domain knowledge in the bring-up and end-to-end operations of compute, storage, networking, and GPUs (including common failure points at the hardware and software levels).
  • Expert-level proficiency with Jira, Smartsheet, or similar program management tools, with the ability to confidently guide engineering teams on their use of the tools.
  • Exceptional strategic and tactical thinking abilities, coupled with a strong capacity to build consensus and drive program success.
  • Comfort and effectiveness in ambiguous environments.
  • Excellent communication and technical presentation skills, particularly for executive audiences.
  • BS or MS in Electrical Engineering or Computer Science, or equivalent experience.

Benefits

  • Equity
  • Benefits

Job title

Technical Program Manager, Cloud Infrastructure

Experience level

Senior, Lead

Salary

$200,000 - $322,000 per year

Degree requirement

Bachelor's Degree
