Senior Engineering Manager leading Data Center telemetry solutions at NVIDIA, driving architecture, development, and deployment for AI supercomputing platforms. Recruiting and managing top talent to optimize data center performance.
Responsibilities
Own the end-to-end architecture and delivery for telemetry solutions, including fleet health monitoring, fault remediation, and data visualization at scale
Own the out-of-band (OOB) telemetry solution and the validation of telemetry data from each underlying device
Recruit, develop, and motivate a high-performing engineering team focused on platform telemetry, RAS (reliability, availability, and serviceability), and observability
Continuously improve software development processes for optimal productivity and quality
Work across teams to ensure seamless integration of telemetry solutions with platform firmware, server architecture, and data center management
Drive product life cycles with QA teams, ensuring robust testing, productization, and delivery
Conduct performance reviews, foster a culture of excellence, and ensure high productivity
Requirements
12+ years of overall relevant experience
5+ years managing systems/platform software teams
BS, MS, or PhD in EE/CS or related field (or equivalent experience)
Strong knowledge of DMTF standards such as PLDM for OOB telemetry collection
Experience with time series databases (e.g., InfluxDB, Prometheus) and REST APIs (e.g., Redfish)
Deep understanding of server and firmware architecture, and of optimization for low-latency APIs
Proven track record of delivering scalable server products and telemetry solutions
Experience with SCM (Git, Perforce) and project management tools (Jira)
Hands-on experience with x86/ARM system architecture and coding (C/C++, Python)
Familiarity with Confidential Compute and notification systems
Demonstrated ability to analyze algorithms for time/space complexity and system resource requirements
Benefits
Equity
Benefits
Job title
Senior Manager, Engineering – Data Center Telemetry, RAS