Technical Staff designing and optimizing distributed training systems for GPU clusters. Aiming to reduce convergence time through efficient coding and infrastructure optimization.
Responsibilities
Drive down wall-clock time to convergence by profiling and eliminating bottlenecks across the foundation model training stack, from data pipelines to GPU kernels
Design, build, and optimize distributed training systems (PyTorch) for multi-node GPU clusters, ensuring scalability, robustness, and high utilization
Implement efficient low-level code (CUDA, cuDNN, Triton, custom kernels) and integrate it seamlessly into high-level training frameworks
Optimize workloads for hardware efficiency: CPU/GPU compute balance, memory management, data throughput, and networking
Develop monitoring and debugging tools for large-scale runs, enabling rapid diagnosis of performance regressions and failures
Requirements
Deep experience in distributed systems, ML infrastructure, or high-performance computing (8+ years)
Production-grade expertise in Python
Low-level performance mastery: CUDA/cuDNN/Triton, CPU–GPU interactions, data movement, and kernel optimization
Scaling at the frontier: experience with PyTorch and training jobs using data, context, pipeline, and model parallelism
System-level mindset with a track record of tuning hardware–software interactions for maximum utilization
Member of Technical Staff focused on building LM/VLM-powered agents and generative simulation systems. Collaborating with internal teams and driving innovation in robotics and AI applications.
Member of Technical Staff responsible for data pipelines and infrastructure for robotics AI. Collaborating with a team to standardize and unify data processing workflows at scale.
Member of Technical Staff developing end-to-end Vision-Language-Action models for robotics. Collaborating with robotics teams to curate datasets and improve machine learning models.
Member of Technical Staff focused on building low-latency inference pipelines for robotics. Designing GPU inference systems and optimizing workloads for efficiency and performance.
Lead compiler development focusing on ML compilers for a robotics simulation platform. Collaborate with engineers to enhance performance and support for differentiable programming.
Member of Technical Staff developing and optimizing robotic manipulation control systems. Collaborating with a team focused on building general-purpose Physical AI in Paris and London.
Member of Technical Staff developing GPU-based simulation pipelines for robotics. Collaborating on essential features to bridge the sim-to-real gap in robotics engineering.
Member of Technical Staff developing rendering systems for robotics foundation models in Paris. Collaborating with a team to build general-purpose Physical AI.
Microsoft 365 Solutions Developer designing and enhancing business solutions for a global law firm. Collaborating with teams to deliver secure and scalable solutions using Microsoft 365 services.