Senior Director of AI Infrastructure, Engineering at Chan Zuckerberg Initiative | Hybrid Hired

About the role

Work with AI research scientists to iterate on, optimize, deploy, and maintain innovative machine learning models, systems, and software tools that enable the analysis and interpretation of AI models for biology.
Work with cross-functional team members to quickly iterate on system performance to meet and stay ahead of users’ needs. For example, when we get feedback that a model doesn't scale to X million, work with our user researchers, scientists, and product team to iterate on a solution.
Partner with research scientists to build robust data loader pipelines for scalable distributed training and evaluation.
Serve as an interface to product and engineering teams to understand how models may need to evolve to support multiple use cases.
Develop model evaluation and interpretability frameworks that help biologists understand which data features drive model predictions.
Build reusable engineering utilities that can unlock experimentation velocity across research initiatives in the organization.
Optimize model architectures to enhance performance, fine-tune accuracy, and efficiently manage infrastructure resources.

Requirements

Experience working in a highly interactive, cross-functional, collaborative environment with a diverse team of colleagues and partners, solving complex problems through applied deep learning.
A track record of, and expertise in, developing deep learning models on large-scale GPU clusters using distributed training techniques such as DDP, FSDP, model parallelism, and low-precision training, as well as profiling and optimizing AI/ML code and fine-tuning models.
Expertise in leading end-to-end experimentation pipelines for training and evaluating deep learning models, with particular focus on experiment tracking and reproducibility.
A good working knowledge of Python-based ML libraries and frameworks such as PyTorch, JAX, TensorFlow, NumPy, Pandas, and Scikit-learn.
Experience using modern frameworks for distributed computing and infrastructure management, particularly those used with ML models, such as PyTorch Lightning, DeepSpeed, TransformerEngine, RayScale, etc.
Ability to effectively balance exploratory research with robust engineering practices.
A good working knowledge of general software engineering practices in a production environment.
The ability to work independently and as part of a team, with excellent communication and interpersonal skills.
A Master's degree in computer science with a focus on machine learning and data analytics, or equivalent industry experience, and at least 6-8 years of experience developing and applying machine learning methods.

Benefits

CZI provides a generous employer match on employee 401(k) contributions to support planning for the future.
An annual benefit that employees can use in whatever way is most meaningful for them and their families, such as housing, student loan repayment, childcare, commuter costs, or other life needs.
CZI Life of Service Gifts are awarded to employees to “live the mission” and support the causes closest to them.
Paid time off to volunteer at an organization of your choice.
Funding for select family-forming benefits.
Relocation support for employees who need assistance moving to the Bay Area.
And more!
