Senior GPU Supercomputer Scheduler Engineer designing and implementing scheduling features for GPU compute clusters. Collaborating with technical leaders to optimize performance for advanced AI workloads.
Responsibilities
Design and develop new scheduling features and add-on services to improve GPU compute clusters across many dimensions, such as resource usage fairness, GPU occupancy, GPU waste, application resilience, application performance and power usage.
Design and develop batch workload management and orchestration services
Provide support to staff and end users to resolve batch scheduler issues
Build and improve our ecosystem around GPU-accelerated computing
Analyze and optimize the performance of deep learning workflows
Develop large scale automation solutions
Perform root cause analysis and suggest corrective actions for problems at both large and small scales
Identify and address potential problems before they occur
Requirements
Bachelor’s degree in Computer Science, Electrical Engineering or related field or equivalent experience
5+ years of work experience
Strong understanding of batch scheduling, preferably with experience in schedulers such as SLURM or K8s batch schedulers (Kueue, Volcano, etc.)
Significant experience in systems programming languages such as C/C++ and Go, as well as scripting languages such as Python and Bash
Established experience with the Linux operating system, environment and tools
Experience analyzing and tuning performance for a variety of AI workloads
In-depth understanding of container technologies like Docker, Singularity, Podman
Flexibility/adaptability for working in a dynamic environment with different frameworks and requirements
Excellent communication, interpersonal and customer collaboration skills