Performance Engineer optimizing GPU training for foundation models in Heidelberg. Join a team focused on improving efficiency and effectiveness in AI training systems.
Responsibilities
Engineer the systems required to train foundation models at scale.
Maximize hardware utilization and training throughput on our large-scale GPU clusters.
Work at the intersection of deep learning frameworks, distributed systems, and GPU microarchitecture.
Requirements
Are proficient in Python and the PyTorch library.
Have a strong engineering background in parallel and/or distributed systems with a proven track record of excellence.
Have hands-on experience with modern machine learning techniques (especially large language models and their life cycle).
Deeply understand the CUDA programming model.
Have experience in distributed programming with APIs like NCCL or MPI.
Have experience analysing profiling traces with tools such as PyTorch Profiler and NVIDIA Nsight.
Please note this role requires regular on-site collaboration in Heidelberg as a member of the Training Efficiency Team.
Benefits
30 days of paid vacation
Access to a variety of fitness & wellness offerings via Wellhub
Mental health support through nilo.health
JobRad® Bike Lease
Substantially subsidized company pension plan for your future security
Subsidized Germany-wide transportation ticket
Budget for additional technical equipment
Flexible working hours and a hybrid working model for better work-life balance