MLOps Engineer responsible for managing PyTorch-based training and inference workloads at Menlo HQ. The role involves building and maintaining robust infrastructure for AI models and optimization processes.
Responsibilities
Own and evolve the infrastructure behind PyTorch-based training and inference workloads
Build and maintain training and inference pipelines using PyTorch
Own and evolve inference serving infrastructure
Write and maintain robust tooling in Python and C++
Optimize compute workloads for bare-metal environments
Troubleshoot low-level networking issues
Set up and manage ML environments
Establish CI/CD patterns for AI workloads
Integrate monitoring, alerting, and incident response
Requirements
Deep expertise in PyTorch internals
Strong programming skills in Python and C++
Solid computer science fundamentals
Hands-on experience with vLLM and SGLang
Experience with RLHF and PPO training pipelines
Strong understanding of distributed training setups
Experience debugging and tuning bare-metal Linux servers
Familiarity with job schedulers such as Airflow
Strong grasp of containerized and cloud-native environments