Senior/Staff Software Engineer – ML Infrastructure (Hybrid)

Posted 3 weeks ago

About the role

  • Own data & labeling pipelines – architect scalable labeling services (storage, query, retrieval), design ontologies, automate annotation workflows, and build quality-tiered datasets that stay within cost constraints.
  • Build and operate training infrastructure – create multi-GPU / multi-node training frameworks (Ray, Spark, Kubernetes), optimize distributed jobs, and integrate acceleration techniques (TensorRT, CUDA Graphs, FP8, etc.).
  • Manage the full model lifecycle – stand up model registries, version control, evaluation suites, and continuous-learning loops that push updates from dev → staging → prod with zero-downtime rollbacks.
  • Provide technical leadership, mentorship, and lightweight project management to a small infra + research squad.
  • Establish DevOps-for-ML best practices (IaC, CI/CD, observability, cost monitoring) so researchers can iterate quickly and safely.
  • Partner with ML engineers on architecture decisions, from data schemas to inference optimizations, ensuring infra and research roadmaps stay tightly aligned.

Requirements

  • Bachelor’s (or higher) in Computer Science, EE, or a related field.
  • 5+ years building and operating large-scale infrastructure, with at least 3 years focused on ML or data-intensive systems.
  • Proven record of designing highly available, distributed systems on Kubernetes (EKS, GKE, or on-prem).
  • Deep expertise with orchestration (K8s operators, Argo, Kubeflow) and cluster-scale storage / compute (S3, GCS, Ray, Spark, Dask).
  • Hands-on experience automating data-labeling or ground-truth workflows and maintaining dataset versioning.
  • Strong software-engineering fundamentals; familiar with best practices for testing, observability, and secure coding.
  • Demonstrated DevOps mindset — IaC (Terraform/CDK), CI/CD (GitHub Actions, ArgoCD), metrics & alerting (Prometheus/Grafana).

Benefits

  • Generous health, dental, and vision insurance.
  • Highly competitive paid parental leave and support system.
  • Ownership in the business through an Equity Incentive Plan.
  • Generous paid time off and/or flexible work arrangements.
  • Daily meals in-office, vibrant company events, team-building.
  • 401(k) retirement plan, HSA options, pre-tax Commuter Card.

Job title

Senior/Staff Software Engineer – ML Infrastructure

Job type

Experience level

Senior

Salary

$200,000 - $250,000 per year

Degree requirement

Bachelor's Degree

Location requirements
