MLOps Engineer building and operating scalable ML training & serving infrastructure for Epidemic Sound’s music search, recommendation, and audio ML systems.
Responsibilities
Design, build, and maintain the core infrastructure that powers machine learning applications.
Streamline the entire ML lifecycle and implement next-generation technologies.
Build scalable infrastructure for training and serving machine learning models using Kubernetes (GKE).
Develop and optimize CI/CD pipelines to streamline the ML application lifecycle from development to production.
Implement and manage robust ML monitoring and observability solutions to ensure production model reliability.
Collaborate with Machine Learning Engineers, Data Engineers, and product teams to integrate data pipelines and tools like Vertex AI and feature stores.
Work within a team of MLOps engineers inside a larger cross-functional group.
Requirements
Proven experience in MLOps, with a deep understanding of best practices like ML monitoring and CI/CD for machine learning.
Proficiency with Kubernetes in a production environment.
Hands-on experience with pipeline orchestration tools such as Vertex AI Pipelines, Kubeflow Pipelines, Flyte, or Metaflow.
Infrastructure as Code skills, particularly with Terraform.
Experience with cloud-native data processing and workflow orchestration tools such as Dataflow or Airflow.
Nice to have: Experience with Google Cloud Platform services like BigQuery and Google Cloud Storage.
Nice to have: Knowledge of advanced data engineering practices.
Nice to have: Familiarity with observability tools for production infrastructure (e.g., Grafana, Prometheus, OpenTelemetry).
Nice to have: Experience with serverless inference frameworks such as Seldon Core.
Nice to have: Familiarity with Music Information Retrieval.
Machine Learning Engineer designing and implementing AI systems focused on Japanese language challenges at Woven by Toyota. Involves technical R&D, system design, and collaboration with cross-functional teams.
Principal Software Engineer leading MLOps within Analytics Platform at Sun Life. Focused on AWS and machine learning operations, collaborating across technical and business teams.
Machine Learning Engineer designing and optimizing deep learning models for safety-critical environments at Destinus. Shaping the future of high-speed, autonomous flight technologies.
Machine Learning Engineer optimizing personalization systems for Spotify's audio streaming service. Collaborating with cross-functional teams to enhance user experience and deliver recommendations.
Principal Machine Learning Engineer developing ML and GenAI solutions in a cloud-native environment at Flexera. Leading a high-impact team and driving operational excellence for ML infrastructure.
Senior ML Platform/Ops Engineer building AI-powered ML pipelines for a dynamic Ed-Tech company. Collaborating with ML scientists and engineers to ensure reliable deployment and observability.
Senior ML Platform/Ops Engineer building ML systems for AI-powered learning at Preply. Productionizing machine learning with high reliability, performance, and observability in a hybrid environment.
Machine Learning Engineer developing advanced Deep Learning models for autonomous driving technology at Mobileye. Collaborating in a high-end algorithmic engineering team on critical computer vision challenges.
Machine Learning Engineer focusing on vulnerabilities and security of AI systems at Carnegie Mellon University. Collaborating with a team to build robust prototypes and provide solutions for government sponsors.