Member of Technical Staff, Training

About the role

  • A Member of Technical Staff role designing and optimizing distributed training systems for GPU clusters, aiming to reduce time to convergence through efficient code and infrastructure optimization.

Responsibilities

  • Drive down wall-clock time to convergence by profiling and eliminating bottlenecks across the foundation model training stack, from data pipelines to GPU kernels
  • Design, build, and optimize distributed training systems (PyTorch) for multi-node GPU clusters, ensuring scalability, robustness, and high utilization
  • Implement efficient low-level code (CUDA, cuDNN, Triton, custom kernels) and integrate it seamlessly into high-level training frameworks (see the kernel sketch after this list)
  • Optimize workloads for hardware efficiency: CPU/GPU compute balance, memory management, data throughput, and networking
  • Develop monitoring and debugging tools for large-scale runs, enabling rapid diagnosis of performance regressions and failures
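
To make the kernel work above concrete: a minimal sketch of a custom Triton kernel wrapped as a PyTorch-callable function, in the spirit of the standard Triton vector-add tutorial. The kernel, block size, and shapes are illustrative assumptions, not code from this team.

    # Minimal sketch: a custom Triton kernel exposed as a PyTorch-callable op.
    # Kernel, block size, and shapes are illustrative, not from this posting.
    import torch
    import triton
    import triton.language as tl

    @triton.jit
    def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
        pid = tl.program_id(axis=0)                     # one program per block
        offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
        mask = offsets < n_elements                     # guard the ragged tail
        x = tl.load(x_ptr + offsets, mask=mask)
        y = tl.load(y_ptr + offsets, mask=mask)
        tl.store(out_ptr + offsets, x + y, mask=mask)

    def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        """Thin wrapper that launches the kernel on a 1-D grid."""
        out = torch.empty_like(x)
        n = out.numel()
        grid = (triton.cdiv(n, 1024),)
        add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
        return out

    x = torch.randn(4096, device="cuda")
    y = torch.randn(4096, device="cuda")
    assert torch.allclose(add(x, y), x + y)

The wrapper is the integration point: the rest of the training stack can call add() like any other PyTorch op.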

Requirements

  • Deep experience in distributed systems, ML infrastructure, or high-performance computing (8+ years)
  • Production-grade expertise in Python
  • Low-level performance mastery: CUDA/cuDNN/Triton, CPU–GPU interactions, data movement, and kernel optimization
  • Scaling at the frontier: experience with PyTorch and training jobs using data, context, pipeline, and model parallelism (a minimal data-parallel sketch follows this list)
  • System-level mindset with a track record of tuning hardware–software interactions for maximum utilization
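
As a minimal illustration of the parallelism experience above, the simplest point on that spectrum: single-axis data parallelism with PyTorch DistributedDataParallel, launched via torchrun. Model, batch shape, and step count are placeholders; a production stack here presumably layers context, pipeline, and model parallelism on top.

    # Minimal sketch: data-parallel training with PyTorch DDP. Launch with e.g.
    #   torchrun --nproc_per_node=8 ddp_sketch.py
    # Model, batch shape, and step count are illustrative placeholders.
    import os
    import torch
    import torch.distributed as dist
    import torch.nn as nn
    from torch.nn.parallel import DistributedDataParallel as DDP

    def main():
        dist.init_process_group(backend="nccl")        # one process per GPU
        local_rank = int(os.environ["LOCAL_RANK"])
        torch.cuda.set_device(local_rank)

        model = DDP(nn.Linear(1024, 1024).cuda(local_rank),
                    device_ids=[local_rank])
        optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

        for _ in range(10):
            x = torch.randn(32, 1024, device=local_rank)
            loss = model(x).square().mean()
            optimizer.zero_grad(set_to_none=True)
            loss.backward()                            # grads all-reduced across ranks
            optimizer.step()

        dist.destroy_process_group()

    if __name__ == "__main__":
        main()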

Job title

Member of Technical Staff, Training

Job type

Experience level

Lead

Salary

Not specified

Degree requirement

No education requirement

Location requirements

Hybrid