Performance Engineer optimizing GPU training for foundation models in Heidelberg. Join a team focused on improving efficiency and effectiveness in AI training systems.
Responsibilities
Engineer the systems required to train foundation models at scale.
Maximize hardware utilization and training throughput on our large-scale GPU clusters.
Work at the intersection of deep learning frameworks, distributed systems, and GPU microarchitecture.
Requirements
You:
Are proficient in Python and the PyTorch library.
Have a strong engineering background in parallel and/or distributed systems, with a proven track record of excellence.
Have hands-on experience with modern machine learning techniques (especially large language models and their life cycle).
Deeply understand the CUDA programming model.
Have experience in distributed programming with APIs like NCCL or MPI.
Have experience analysing profiling traces with tools such as PyTorch Profiler and NVIDIA Nsight.
Please note this role requires regular on-site collaboration in Heidelberg as a member of the Training Efficiency Team.
Benefits
30 days of paid vacation
Access to a variety of fitness & wellness offerings via Wellhub
Mental health support through nilo.health
JobRad® Bike Lease
Substantially subsidized company pension plan for your future security
Subsidized Germany-wide transportation ticket
Budget for additional technical equipment
Flexible working hours and a hybrid working model for better work-life balance