HPC scheduler/resource manager engineer crafting scheduling strategies for large datacenter clusters. Driving cutting-edge innovations in AI and GPU computing with top scientific partners and technologies.
Responsibilities
Provide engineering solutions and prototypes to enable efficient resource management and job scheduling for large-scale clusters
Drive next-generation requirements and features for schedulers in at-scale clusters
Maintain technical relationships with internal and external engineering teams
Assist system architects and machine learning/deep learning engineers in building creative solutions based on NVIDIA technology
Be an internal reference for scheduling and resource management concepts and methodologies among the NVIDIA technical community
Test, evaluate, and benchmark new technologies and products and work with vendors, partners and peers to improve functionality and optimize performance
Requirements
BS, MS, or PhD in Engineering, Mathematics, Physics, Computer Science, or equivalent experience
12+ years of experience designing and running scheduling and resource management systems in large datacenter/AI/HPC solutions
Knowledge of and experience with resource management / scheduling code bases: SLURM preferred, or other implementations (LSF, SGE, Torque...)
Proven understanding of performance clusters, infrastructure and workload patterns
Experience using and installing Linux-based server platforms
Senior SCADA Engineer designing and maintaining SCADA systems for North American AES Clean Energy operations. Collaborating with teams to ensure reliable data acquisition and process control while optimizing performance.
Senior Protection Relay & Metering Engineer providing technical expertise in renewable energy operations. Supporting and optimizing AES Clean Energy operational performance of a fleet of renewables across the US.
Engineer providing engineering and technical support for Lilly's global packaging operations. Analyzing workflows and processes to improve quality and efficiency in packaging procedures.
Service Desk Engineer providing technical support as part of a team for Capgemini customers. Assisting in diagnosing problems and implementing solutions in a collaborative environment.
Product Development Engineer designing and developing advanced products per customer specifications at Belden. Collaborating with marketing and sales to optimize product design and support customer training.
Engineer in automation technology developing complex systems and software for industrial production processes. Focus on innovative energy and control solutions for industries like paper and cable manufacturing.
Logistics Engineer analyzing and optimizing intralogistics processes for Ingersoll Rand. Leading improvement projects while managing material flows and ensuring effective communication with partners.
Manufacturing Engineer leading manufacturing processes and projects at Haskel. Specializing in high-pressure fluid management and efficiency improvement at the Sunderland site.
Design Engineer undertaking mechanical engineering activities for Ejector systems at Transvac. Collaborating with expert teams to ensure robust and value-engineered mechanical designs.
Maintenance Engineer at Tenneco focusing on PLC programming and robot maintenance. Ensuring smooth operations of manufacturing machinery and implementing improvements across the Chakan plant.