Design, build and optimise ML pipelines and production systems that train, evaluate and serve recommendation models efficiently and at scale.
Work in a cross-functional team alongside data scientists, machine learning scientists, software engineers and both technical and non-technical stakeholders.
Partner with ML Scientists to translate research models into efficient, maintainable, and well-tested production systems.
Implement monitoring, observability, and retraining strategies to ensure continuous model performance in a dynamic, global environment.
Contribute to the evolution of our ML infrastructure, including CI/CD, model registries, and feature stores.
Diagnose and resolve production ML issues, such as data inconsistencies and model drift, and identify and remove infrastructure bottlenecks.
Champion engineering best practices for scalability, reliability, and reproducibility across the ML lifecycle.
Requirements
2+ years of relevant industry experience.
An advanced degree in Computer Science, Mathematics or a similar quantitative discipline.
Strong software engineering background. You write clean, scalable, and maintainable code in Python or similar languages.
Proven experience deploying and operating ML systems in production environments.
Deep understanding of MLOps and infrastructure concepts: CI/CD for ML, feature stores, model serving, observability, and versioning.
Experience with modern ML frameworks (e.g. PyTorch, TensorFlow) and orchestration tools (e.g. Airflow, Kubeflow, SageMaker, Ray).
Familiarity with containerisation and cloud-native environments (e.g. Docker, Kubernetes, GCP/AWS).
Skilled at debugging complex, distributed ML systems and optimising for performance at scale.
Excellent communicator and collaborator. You work effectively with scientists, engineers, and non-technical stakeholders.
Interested in contributing to the responsible development of ML and AI, with a focus on building systems that are fair, equitable and accountable.
Benefits
Medical
Dental
Vision
401(k) match
Unlimited Paid Time Off Policy
Maven Fertility: $10,000 lifetime benefit for fertility, adoption, abortion care, and more.
26 Weeks Parental Leave: For both primary and secondary caregivers.
Family & Compassionate Leave: Inclusive of domestic violence recovery.
Company-wide Week Off: Annual collective rest for the entire company.
Focus Fridays: No meetings, emails, or deadlines—just deep work.