Senior ML Platform Engineer building and scaling machine learning infrastructure for AI applications. Responsible for LLM deployment, Kubernetes management, and mentoring engineering teams.
Responsibilities
Build and scale machine learning infrastructure focused on Large Language Models (LLMs) and AI applications
Design and implement scalable infrastructure for training, fine-tuning, and serving open source LLMs
Architect and manage Kubernetes clusters for ML workloads
Ensure 99.9%+ uptime for ML platforms through robust monitoring
Mentor junior engineers and data scientists on platform best practices
Collaborate with data scientists and product engineering teams
Present technical solutions and platform roadmaps to leadership
Requirements
Bachelor’s degree in Computer Science, Engineering, or a related technical field (or equivalent experience)
5+ years of software engineering experience with focus on infrastructure, platform engineering, or MLOps
2+ years of hands-on experience with machine learning infrastructure and deployment at scale
1+ years of experience working with Large Language Models and transformer architectures
Proficient in Python; strong skills in Go, Rust, or Java preferred
Proven experience working with open source LLMs (Llama 2/3, Qwen, Mistral, Gemma, Code Llama, etc.)
Proficient in Kubernetes, including custom operators, Helm charts, and GPU scheduling
Deep expertise in Azure services (AKS, Azure ML, Container Registry, Storage, Networking)
Benefits
Comprehensive Total Rewards program that offers personalized coverage
Health insurance
401(k) savings plan vested from day one that offers a 6% match
Performance and recognition-based incentives
Tuition assistance
Workplace flexibility, including the GEICO Flex program, which allows work from anywhere in the US for up to four weeks per year
Machine Learning Engineer designing and implementing AI systems focused on Japanese language challenges at Woven by Toyota. Involves technical R&D, system design, and collaboration with cross-functional teams.
Principal Software Engineer leading MLOps within Analytics Platform at Sun Life. Focused on AWS and machine learning operations, collaborating across technical and business teams.
Machine Learning Engineer designing and optimizing deep learning models for safety-critical environments at Destinus. Shaping the future of high-speed, autonomous flight technologies.
Machine Learning Engineer optimizing personalization systems for Spotify's audio streaming service. Collaborating with cross-functional teams to enhance user experience and deliver recommendations.
Principal Machine Learning Engineer developing ML and GenAI solutions in a cloud-native environment at Flexera. Leading a high-impact team and driving operational excellence for ML infrastructure.
Senior ML Platform/Ops Engineer building AI-powered ML pipelines for a dynamic Ed-Tech company. Collaborating with ML scientists and engineers to ensure reliable deployment and observability.
Senior ML Platform/Ops Engineer building ML systems for AI-powered learning at Preply. Productionizing machine learning with high reliability, performance, and observability in a hybrid environment.
Machine Learning Engineer developing advanced Deep Learning models for autonomous driving technology at Mobileye. Collaborating in a high-end algorithmic engineering team on critical computer vision challenges.
Machine Learning Engineer focusing on vulnerabilities and security of AI systems at Carnegie Mellon University. Collaborating with a team to build robust prototypes and provide solutions for government sponsors.