MLOps Support Engineer in charge of operational support for AI/ML solutions, ensuring system stability in production. Responsible for monitoring and incident management of ML models and pipelines.
Responsibilities
Provide Tier 1 / Tier 2 operational support for AI/ML solutions.
Identify failed jobs, degraded pipelines, or performance anomalies.
Triage incidents, investigate issues, and coordinate escalation to Tier 3 Engineering.
Participate in on-call rotas once established.
Validate that pipelines and jobs complete successfully.
Monitor data pipeline health, model execution, and basic performance metrics.
Identify operational issues before they impact customers.
Respond to and alert customers when there has been an outage or issue with one of their models.
Support incident management, rollback, and recovery activities.
Use and maintain runbooks and operational documentation.
Work with Engineering to improve supportability and observability.
Contribute to knowledge sharing to reduce single points of failure.
Work within defined SLAs and support processes as the service matures.
Prepare quarterly business reviews that report on the health of the ML models.
Evaluate champion/challenger models to see if a new model should be promoted.
Monitor for model drift and performance degradation, while validating that updates (new champion models or added data) do not introduce bias.
Requirements
Experience in operations, DevOps, SRE, or platform support roles.
Strong troubleshooting skills in production environments.
Proficiency in SQL and scripting (Python, Bash) for developing and automating ML workflows.
Familiarity with cloud platforms (AWS, GCP, Azure) and their cloud-based ML services.
Git: Solid understanding of version control, particularly in collaborative development environments.
Comfortable working from runbooks and structured processes.
Exposure to AI/ML systems in production.
Familiarity with monitoring and observability tools (Grafana, Power BI, New Relic).
Knowledge of MLOps tooling and data platforms (MLflow, Databricks).
Experience supporting customer-facing platforms.
Knowledge of container orchestration (Kubernetes) is a plus.
Experience with LLM prompt engineering and troubleshooting.
Someone who is eager to learn about complex predictive models.
Background in computer science, informatics, or related fields.
Passion for Machine Learning and AI: An eager learner who is excited about working with cutting-edge ML technologies and is passionate about optimizing and maintaining ML models in production environments.
Early Career in MLOps or ML Engineering: Ideally a Junior ML Engineer with a strong desire to grow in the field of MLOps and AI operations.
A Collaborative Mindset: You thrive in a team setting and are ready to contribute to model improvement, A/B testing, and iterative development.
Attention to Detail: A focus on model performance, bias prevention, and ensuring optimal model behavior as new data and models are introduced.