Hybrid Principal ML Ops Engineer

Posted 35 minutes ago

About the role

  • Lead ML Ops engineering role at a fast-growing AI startup focused on scalable infrastructure. You will drive hands-on execution across the entire model lifecycle in a collaborative environment.

Responsibilities

  • Architect, build, and scale the end-to-end ML Ops pipeline, including training, fine-tuning, evaluation, rollout, and monitoring.
  • Design reliable infrastructure for model deployment, versioning, reproducibility, and orchestration across cloud and on-prem GPU clusters.
  • Optimize compute usage across distributed systems (Kubernetes, autoscaling, caching, GPU allocation, checkpointing workflows).
  • Lead the implementation of observability for ML systems, covering drift, performance, throughput, reliability, and cost.
  • Build automated workflows for dataset curation, labeling, feature pipelines, evaluation, and CI/CD for ML models.
  • Collaborate with researchers to productionize models and accelerate training/inference pipelines.
  • Establish ML Ops best practices, internal standards, and cross-team tooling.
  • Mentor engineers and influence architectural direction across the entire AI platform.

Requirements

  • Deep hands-on experience designing and operating production ML systems at scale (Staff/Principal-level expected).
  • Strong background in ML Ops, distributed systems, and cloud infrastructure (AWS, GCP, or Azure).
  • Proficiency with Python and familiarity with TypeScript or Go for platform integration.
  • Expertise in ML frameworks and tooling (PyTorch, Transformers, vLLM, LLaMA-Factory, Megatron-LM), plus a practical understanding of CUDA and GPU acceleration.
  • Strong experience with containerization and orchestration (Docker, Kubernetes, Helm, autoscaling).
  • Deep understanding of ML lifecycle workflows: training, fine-tuning, evaluation, inference, model registries.
  • Ability to lead technical strategy, collaborate cross-functionally, and operate in fast-paced environments.

Benefits

  • Competitive salary & equity options
  • Sign-on bonus
  • Health, Dental, and Vision
  • 401(k)

Job title

Principal ML Ops Engineer

Job type

Experience level

Lead

Salary

Not specified

Degree requirement

No Education Requirement

Location requirements
