Manager of Machine Learning Operations at Zefr, leading a team to build scalable ML infrastructure and optimize ML model performance. Collaborate closely with ML Engineers and Data Scientists for robust pipeline management.
Responsibilities
Lead, mentor, and grow a team of Machine Learning Engineers, fostering a culture of innovation and continuous improvement
Design and implement scalable ML infrastructure for model training, deployment, and serving
Establish and enforce best practices for ML model lifecycle management, including versioning, testing, and monitoring
Develop and maintain CI/CD pipelines for machine learning workflows
Optimize model inference performance and reduce latency/cost across production systems
Collaborate with ML Engineers and Data Scientists to productionize models efficiently
Implement robust monitoring, alerting, and observability solutions for ML systems
Drive technical decisions on MLOps tooling, infrastructure, and architecture
Ensure high availability and reliability of ML services at scale
Manage project timelines, priorities, and resource allocation for the MLOps team
Requirements
Bachelor's or Master's degree in Computer Science or related field with 5+ years of professional experience in ML Engineering or MLOps
2+ years of experience managing or leading engineering teams
Deep expertise in ML model deployment, serving infrastructure, and production ML systems
Hands-on experience with transformer architectures (e.g., BERT, ViT) for natural language and vision tasks
Strong understanding of multimodal embedding techniques for integrating text, image, audio, and structured data
Experience with LLMs such as Gemini, GPT, Claude, and Qwen
Experience with ML experiment tracking, model versioning, and feature stores
Strong understanding of CI/CD principles applied to ML workflows
Experience optimizing model inference performance (ONNX, TensorRT, or similar)
Excellent leadership, communication, and stakeholder management skills
Track record of building and scaling high-performing engineering teams
Openness to new technologies and creative solutions
Benefits
Flexible PTO
Medical, dental, and vision insurance with FSA options
Company-paid life insurance
Paid parental leave
401(k) with company match
Professional development opportunities
14 paid holidays
Flexible hybrid work schedule
"Summer Fridays" (shorter work days on select Fridays during the summertime)
In-office lunches and lots of free food
Optional in-person and virtual events (we like to celebrate!)