Product Manager guiding Health Automation and Resilience efforts for AI infrastructure at NVIDIA. Collaborating with engineering to develop fault detection and automated repair workflows.
Responsibilities
Establish the product vision and strategy for Health Automation and Resilience across DGX Cloud and partner GPU fleets.
Partner with engineering on the architecture and delivery of software agents, services, control loops, and distributed health components.
Convert hardware signals, telemetry pipelines, and operational insights into automation systems that reduce manual intervention.
Work with cloud providers and enterprise operators to understand failure modes and operational challenges.
Develop product specifications, technical requirements, and validation criteria for both internal and open-source components.
Support go-to-market activities including documentation, demos, partner enablement, and release readiness.
Track trends in observability, SRE practices, distributed systems, and automated operations to define long-term strategy.
Lead product technical reviews, customer conversations, and planning sessions.
Requirements
Bachelor’s degree in Computer Science, Engineering, or a similar area, or equivalent experience.
8+ years of relevant experience, including demonstrated experience leading technical products in cloud infrastructure, distributed systems, reliability engineering, or related fields.
Track record defining multi-quarter strategy and leading execution with multiple engineering teams.
Ability to craft clear product requirements, work directly with engineering partners on technical decisions, and compose system-level workflows.
Strong architectural understanding of control planes, telemetry systems, health monitoring, repair workflows, or automated remediation systems.
Understanding of telemetry signals, SLOs, failure modes, and repair workflows in production environments.
Experience building automation, resilience, or failure-recovery capabilities for large-scale cloud or HPC environments.
Experience working with open-source technologies or products for software developers.
Excellent communication skills across engineering, customers, and executives.
SAP Product Owner responsible for future-proofing business solutions at GQS, a leading SAP Gold partner for the food industry. Engaging in strategic product design and team leadership to drive excellence.
Product Owner defining requirements and driving system development for hotel solutions. Collaborating with teams and stakeholders to ensure agile and efficient implementation.
Product Management Placement at Samsung, involved in product lifecycle, strategy, and market insights across B2B and consumer products. Engaging with stakeholders to track product metrics and drive sales initiatives.
Head of Product responsible for shaping product strategy and execution in a B2B SaaS company. Driving innovation and growth through customer-centric product development and data-driven decision-making.
Master Data Product Owner leveraging product information management in Roche Diagnostics. Collaborating with cross-functional teams to ensure data-driven delivery.
Product Manager enhancing customer identity and access capabilities for Pluralsight's learning platform. Leading efforts to streamline user engagement and improve activation and retention metrics.
Product Manager driving marketing automation within the Conversion & Marketing Technologies team at a leading motor insurance provider. Shaping customer data processes to enhance communication with millions of drivers.
Director of Software Product Management at IDEXX leading global LIMS strategy and execution. Overseeing Software Product Managers and ensuring delivery of reliable, scalable laboratory platforms.