Senior Director of AI Infrastructure & Engineering at CZI driving AI initiatives for biomedical sciences. Leading the AI/ML Infrastructure team optimizing resources for impactful research.
Responsibilities
Work with AI research scientists to iterate on, optimize, deploy, and maintain machine learning models, systems, and software tools that enable the analysis and interpretation of AI models for biology.
Work with cross-functional team members to iterate quickly on system performance and meet or stay ahead of users’ needs. For example, when feedback shows that a model doesn't scale to X million, work with our user research, science, and product teams to iterate on a solution.
Partner with research scientists to build robust data loader pipelines for scalable distributed training and evaluation.
Serve as an interface to product and engineering teams to understand how models may need to evolve to support multiple use cases.
Develop model evaluation and interpretability frameworks that help biologists understand which data features drive model predictions.
Build reusable engineering utilities that can unlock experimentation velocity across research initiatives in the organization.
Optimize model architectures to enhance performance, fine-tune accuracy, and efficiently manage infrastructure resources.
Requirements
Experience working in a highly interactive, cross-functional, collaborative environment with a diverse team of colleagues and partners, solving complex problems through applied deep learning.
A track record of, and expertise in, developing deep learning models on large-scale GPU clusters, using distributed training techniques such as DDP, FSDP, and model parallelism, as well as low-precision training, profiling and optimizing AI/ML code, and fine-tuning models.
Expertise in leading end-to-end experimentation pipelines for training and evaluating deep learning models, with particular focus on experiment tracking and reproducibility.
A good working knowledge of Python-based ML libraries and frameworks such as PyTorch, JAX, TensorFlow, NumPy, Pandas, and Scikit-learn.
Experience using modern frameworks for distributed computing and infrastructure management, particularly those related to ML models, such as PyTorch Lightning, DeepSpeed, TransformerEngine, RayScale, etc.
Ability to effectively balance exploratory research with robust engineering practices.
A good working knowledge of general software engineering practices in a production environment.
The ability to work both independently and as part of a team, with excellent communication and interpersonal skills.
A Master's degree in computer science with a focus on machine learning and data analytics, or equivalent industry experience, and at least 6-8 years of experience developing and applying machine learning methods.
Benefits
CZI provides a generous employer match on employee 401(k) contributions to support planning for the future.
An annual benefit that employees can use in whatever way is most meaningful for them and their families, such as housing, student loan repayment, childcare, commuter costs, or other life needs.
CZI Life of Service Gifts are awarded to employees to “live the mission” and support the causes closest to them.
Paid time off to volunteer at an organization of your choice.
Funding for select family-forming benefits.
Relocation support for employees who need assistance moving to the Bay Area.