Staff Research Engineer applying and optimizing AI/ML models to solve biomedical problems at Chan Zuckerberg Initiative. Collaborating with teams to develop and deploy ML models for research and insights.
Responsibilities
Define and execute the long-term vision and roadmap for AI, data, cloud, and security infrastructure, with clear metrics to measure progress and outcomes.
Oversee the design and operation of hybrid GPU compute clusters and ML platforms to support training, fine-tuning, and inference workloads.
Ensure robust, scalable, and efficient data infrastructure and cloud operations to power analytics, ML pipelines, and product needs.
Drive reliability, observability, and cost optimization across GPU-based workloads for development, training, and inference.
Implement modern AI/ML Ops practices (orchestration of model training workloads, reproducibility, automated monitoring) to accelerate research and production workflows, with a focus on continuous delivery and improvement.
Build, mentor, and scale high-performing, multi-disciplinary engineering teams.
Partner with product, research, and executive leadership to align infrastructure with organizational priorities, ensuring delivery is measured against agreed objectives and key results.
Establish policies for infrastructure usage, prioritization, and compliance with regulatory requirements.
Stay ahead of emerging technologies in AI infrastructure, cloud, and security; drive their strategic adoption.
Requirements
15+ years in engineering, including at least 7 years in senior leadership roles managing multi-disciplinary teams and organizations of 30+ employees, with experience leading and developing managers.
Strong knowledge of AI/ML frameworks (e.g., PyTorch) and MLOps tools (e.g., Kubeflow, MLflow, Ray).
Experience managing both traditional cloud platforms (AWS, GCP, Azure) and AI cloud (HPC/GPU clusters).
Deep experience with large-scale data systems, pipelines, and storage technologies.
Track record of improving reliability, observability, and cost efficiency in large-scale systems.
Proven ability to define multi-year infrastructure strategies while delivering on immediate priorities.
Exceptional written and verbal communication skills, capable of engaging technical and non-technical audiences.
Ability to provide clear leadership and momentum in an ambiguous environment: setting direction, aligning teams, and turning uncertainty into forward progress.
Benefits
CZI provides a generous employer match on employee 401(k) contributions to support planning for the future.
An annual benefit that employees can use in whatever way is most meaningful for them and their families, such as housing, student loan repayment, childcare, commuter costs, or other life needs.
CZI Life of Service Gifts are awarded to employees to “live the mission” and support the causes closest to them.
Paid time off to volunteer at an organization of your choice.
Funding for select family-forming benefits.
Relocation support for employees who need assistance moving to the Bay Area.
Senior Platform AI Engineer focused on AI infrastructure and frameworks at Nexxen in Tel Aviv. Building next-gen AI capabilities for diverse products.
AI Engineer developing Agentic AI applications for Ford's Customer Service Division. Leading intelligent system development to improve service advisor efficiency with complex data synthesis.
Senior AI Engineer responsible for data preparation in foundation model pre-training for various German-speaking industries. Collaborating on data quality and processing to enhance model capabilities.
AI Architect at Intelligen designing scalable AI and data solutions using Microsoft Fabric for APAC clients. Leading architecture for enterprise AI use cases and platform modernization initiatives.
(Senior) AI Engineer - Consultant helping clients implement AI solutions and integrate them into existing processes. Engaging in workshops to identify potential and analyze data for seamless integration.
Founding AI Engineer position at Grand focused on building and managing AI systems and ensuring their safe operations. Collaborate closely with founders and other teams in a hybrid work environment.
AI Engineer building real-world AI systems for Calibrax AI. Delivering end-to-end AI solutions, including model development and infrastructure management.
AI Engineer designing AI-powered document processing systems for real estate transactions at myAbode. Leading architecture strategy and mentoring engineers in a fast-paced environment.
AI Engineer at Ledge developing AI-driven solutions for automated financial processes. Responsible for AI system ownership, evolution, and collaboration with finance teams.
Senior Prompt Engineer designing AI systems and LLM pipelines to integrate into products at HyperFi. Collaborating with engineers to construct RAG systems and improve workflows.