MLOps Engineer responsible for managing PyTorch-based training and inference workloads at Menlo HQ. Building and maintaining robust infrastructure for AI models and optimization processes.
Responsibilities
Own and evolve the infrastructure behind PyTorch-based training and inference workloads
Build and maintain training and inference pipelines using PyTorch
Own and evolve inference serving infrastructure
Write and maintain robust tooling in Python and C++
Optimize compute workloads for bare-metal environments
Troubleshoot low-level networking issues
Set up and manage ML environments
Establish CI/CD patterns for AI workloads
Integrate monitoring, alerting, and incident response
Requirements
Deep expertise in PyTorch internals
Strong programming skills in Python and C++
Solid computer science fundamentals
Hands-on experience with vLLM and SGLang
Experience with RLHF and PPO training pipelines
Strong understanding of distributed training setups
Experience debugging and tuning bare-metal Linux servers
Familiarity with job schedulers such as Airflow
Strong grasp of containerized and cloud-native environments
Join Visium as a Junior Machine Learning Engineer intern, focusing on applied research projects that use Machine Learning to transform business operations in Switzerland.
AI Engineer at Trunk Tools revolutionizing construction with intelligent automation and production-ready AI agents. Leading design and implementation of multi-agent systems for document and data processing.
Audio Machine Learning Co-op developing real-time AI-powered audio processing algorithms for Bose. Collaborating with experts to prototype and implement novel ML algorithms for various applications.
AI Center of Excellence Engineer at F5 supporting applied AI research, prototyping, and engineering initiatives. Evaluating AI techniques and creating integration recommendations for production systems.
Senior ML Engineer at Centra developing forecasting and AI-driven decision support for fashion brands. Collaborating to enhance ecommerce through machine learning and insights.
Staff ML/AI Engineer for healthcare communication solutions at Accurx. Leading AI/ML initiatives to enhance patient communication and healthcare efficiency.
Senior Machine Learning Engineer developing ML systems for healthcare communication technology at Accurx. Join our mission-driven team to solve real-world problems in healthcare.
Senior Developer building and evolving ML/AI applications on AWS for Valorem Reply. Collaborating closely with product, architecture, and engineering teams for quality solutions.
Senior Developer at Valorem Reply delivering ML/AI applications on AWS. Collaborating with product and engineering teams to provide high-quality tech solutions.
Senior Software Engineer designing and operating ML infrastructure for Plaid's AI initiatives. Collaborating with product teams to accelerate AI-powered financial experiences and ensure scalable ML systems.