About the role

  • Staff Research Engineer applying and optimizing AI/ML models to solve biomedical problems at Chan Zuckerberg Initiative. Collaborating with teams to develop and deploy ML models for research and insights.

Responsibilities

  • Define and execute the long-term vision and roadmap for AI, data, cloud, and security infrastructure, with clear metrics to measure progress and outcomes.
  • Oversee the design and operation of hybrid GPU compute clusters and ML platforms to support training, fine-tuning, and inference workloads.
  • Ensure robust, scalable, and efficient data infrastructure and cloud operations to power analytics, ML pipelines, and product needs.
  • Drive reliability, observability, and cost optimization across GPU-based workloads for development, training, and inference.
  • Implement modern AI/ML Ops practices (orchestration of model training workloads, reproducibility, automated monitoring) to accelerate research and production workflows, with a focus on continuous delivery and improvement.
  • Build, mentor, and scale high-performing, multi-disciplinary engineering teams.
  • Partner with product, research, and executive leadership to align infrastructure with organizational priorities, ensuring delivery is measured against agreed objectives and key results.
  • Establish policies for infrastructure usage, prioritization, and compliance with regulatory requirements.
  • Stay ahead of emerging technologies in AI infrastructure, cloud, and security; drive their strategic adoption.

Requirements

  • 15+ years in engineering, including at least 7 years in senior leadership roles managing multi-disciplinary teams and organizations of 30+ employees, with experience leading and developing managers.
  • Strong knowledge of AI/ML frameworks (e.g., PyTorch) and MLOps tools (e.g., Kubeflow, MLflow, Ray).
  • Experience managing both traditional cloud platforms (AWS, GCP, Azure) and AI cloud (HPC/GPU clusters).
  • Deep experience with large-scale data systems, pipelines, and storage technologies.
  • Track record of improving reliability, observability, and cost efficiency in large-scale systems.
  • Proven ability to define multi-year infrastructure strategies while delivering on immediate priorities.
  • Exceptional written and verbal communication skills, capable of engaging technical and non-technical audiences.
  • Ability to provide clear leadership and momentum in an ambiguous environment—setting direction, aligning teams, and turning uncertainty into forward progress.

Benefits

  • CZI provides a generous employer match on employee 401(k) contributions to support planning for the future.
  • An annual benefit that employees can use in whatever way is most meaningful for them and their families, such as housing, student loan repayment, childcare, commuter costs, or other life needs.
  • CZI Life of Service Gifts are awarded to employees to “live the mission” and support the causes closest to them.
  • Paid time off to volunteer at an organization of your choice.
  • Funding for select family-forming benefits.
  • Relocation support for employees who need assistance moving to the Bay Area.
  • And more!

Job title

Staff Research Engineer, AI Engineering

Job type

Experience level

Lead

Salary

$435,000 - $621,500 per year

Degree requirement

Postgraduate Degree

Location requirements
