Staff Research Engineer applying and optimizing AI/ML models to solve biomedical problems at Chan Zuckerberg Initiative. Collaborating with teams to develop and deploy ML models for research and insights.
Responsibilities
Define and execute the long-term vision and roadmap for AI, data, cloud, and security infrastructure, with clear metrics to measure progress and outcomes.
Oversee the design and operation of hybrid GPU compute clusters and ML platforms to support training, fine-tuning, and inference workloads.
Ensure robust, scalable, and efficient data infrastructure and cloud operations to power analytics, ML pipelines, and product needs.
Drive reliability, observability, and cost optimization across GPU-based workloads for development, training, and inference.
Implement modern AI/ML Ops practices (orchestration of model training workloads, reproducibility, automated monitoring) to accelerate research and production workflows, with a focus on continuous delivery and improvement.
Build, mentor, and scale high-performing, multi-disciplinary engineering teams.
Partner with product, research, and executive leadership to align infrastructure with organizational priorities, ensuring delivery is measured against agreed objectives and key results.
Establish policies for infrastructure usage, prioritization, and compliance with regulatory requirements.
Stay ahead of emerging technologies in AI infrastructure, cloud, and security; drive their strategic adoption.
Requirements
15+ years in engineering, including at least 7 years in senior leadership roles managing multi-disciplinary teams and organizations of 30+ employees, with experience leading and developing managers.
Strong knowledge of AI/ML frameworks (e.g., PyTorch) and MLOps tools (e.g., Kubeflow, MLflow, Ray).
Experience managing both traditional cloud platforms (AWS, GCP, Azure) and AI cloud (HPC/GPU clusters).
Deep experience with large-scale data systems, pipelines, and storage technologies.
Track record of improving reliability, observability, and cost efficiency in large-scale systems.
Proven ability to define multi-year infrastructure strategies while delivering on immediate priorities.
Exceptional written and verbal communication skills, capable of engaging technical and non-technical audiences.
Ability to provide clear leadership and momentum in an ambiguous environment—setting direction, aligning teams, and turning uncertainty into forward progress.
Benefits
CZI provides a generous employer match on employee 401(k) contributions to support planning for the future.
An annual benefit that employees can use in whatever way is most meaningful for them and their families, such as housing, student loan repayment, childcare, commuter costs, or other life needs.
CZI Life of Service Gifts are awarded to employees to “live the mission” and support the causes closest to them.
Paid time off to volunteer at an organization of your choice.
Funding for select family-forming benefits.
Relocation support for employees who need assistance moving to the Bay Area.
AI Engineer developing intelligent data pipelines and multi-agent workflows in Vietnam. Collaborating with the Singapore hub to deliver impactful stakeholder intelligence results.
AI Engineer developing innovative AI solutions for stakeholder intelligence at TSC. Designing data pipelines and optimizing ML models in a collaborative environment.
AI Platform/Model Developer at Mars leveraging AI to enhance North America Supply Chain efficiency and resilience. Collaborating with various teams to design and implement scalable AI capabilities.
Principal AI Engineer leading platform engineering and AI enablement initiatives at Humana, driving strategy for AI tools and products while collaborating with cross-functional partners.
AI Engineer at Worldia creating and deploying AI workflows for travel agencies. Collaborating cross-functionally with product, data, and engineering teams to solve concrete problems.
Design and build an agentic AI platform to manage renewable energy certificates and automate related workflows. Collaborate across teams to integrate the platform, optimize user onboarding, and iterate based on feedback and metrics.
AI Engineer - Consultant developing and integrating AI solutions for clients. Working in a hybrid model to support their journey towards data-driven processes.
Senior AI Engineer building agentic workflows and production-grade systems for an AI-powered platform. Join Altura to innovate and drive impactful AI initiatives in a startup environment.
Applied AI Engineer at WorkOS designing and shipping production AI systems. Collaborating with teams across the company to improve efficiency and workflows using AI.
Staff AI Engineer developing the first AI Engineering Co-Pilot for Black Semiconductor's process and device engineering. Utilizing complex datasets to produce insights and predictive models for improved processes.