Hybrid AI Platform Systems Software Engineer

Posted 2 weeks ago

About the role

  • Design and scale services to orchestrate AI/ML clusters across cloud and on-prem environments
  • Develop and optimize intelligent scheduling and resource management systems for heterogeneous compute clusters
  • Integrate Ray Train and Ray Tune for large-scale distributed training and hyperparameter tuning workflows, and Ray Serve for low-latency, autoscaled inference
  • Build features to improve reliability, performance, observability, and cost-efficiency of AI workloads at scale
  • Enhance the control plane to support secure multi-tenancy and enterprise-grade governance
  • Implement systems for container management, dependency resolution, and large-scale model distribution
  • Collaborate with ML researchers, applied scientists, and distributed systems engineers to drive platform innovation
  • Provide production support and work closely with field teams to resolve infrastructure issues

Requirements

  • Bachelor’s or Master’s degree in Computer Science, Engineering, or related field (or equivalent experience)
  • 8-10 years of experience building and maintaining infrastructure for highly available, scalable, and performant distributed systems
  • Proven expertise with cloud-native technologies (AWS, GCP, Azure) and Kubernetes-based deployments
  • Hands-on experience running ML training and inference with Ray (ray.io)
  • Deep understanding of networking, security, authentication, and identity management in distributed/cloud environments
  • Hands-on experience with observability stacks (Prometheus, Grafana, OpenTelemetry, etc.)
  • Strong coding skills in Go and/or Python; familiarity with other systems-level languages is a plus
  • Knowledge of Linux internals, containers, and storage systems
  • Experience integrating and optimizing GPUs and other accelerators (e.g., NVIDIA, AMD, TPU) is highly desirable

Benefits

  • Full range of medical benefits
  • Financial benefits
  • Various paid leave benefits, such as PTO and parental leave

Job title

AI Platform Systems Software Engineer

Experience level

Senior, Lead

Salary

$132,000 - $222,100 per year

Degree requirement

Bachelor's Degree
