AI Support Engineer ensuring rapid triage, root cause analysis, and resolution for production AI incidents. Monitoring system health and collaborating with engineers to implement observability best practices.
Responsibilities
Serve as the first line of defense for production AI incidents, ensuring rapid triage, root cause analysis, and resolution.
Monitor system health and performance of deployed AI applications, agentic and RAG-based solutions, MCPs, and orchestration platforms.
Track and investigate issues related to latency, failures, model drift, hallucination, prompt misbehavior, or broken integrations, escalating to the AI engineering group where appropriate.
Collaborate with AI and platform engineers to implement observability, logging, and alerting best practices for all AI services.
Build diagnostic tools, runbooks, and automated workflows to improve incident response time and reduce manual intervention.
Maintain knowledge bases and playbooks for repeatable troubleshooting and knowledge transfer.
Partner with governance and compliance teams to ensure incidents are documented and remediated in line with internal policy.
Contribute to postmortems and continuous improvement efforts to harden production systems.
Requirements
4+ years of experience in production support, software engineering, site reliability engineering (SRE), or DevOps, preferably supporting GenAI and/or ML systems.
Strong understanding of cloud infrastructure (AWS, GCP) and AI observability tools (e.g., Fiddler AI, Arize AI, IBM watsonx.governance).
Experience with LLM and GenAI systems (OpenAI, Azure OpenAI, Bedrock, Vertex AI, or similar).
Familiarity with modern orchestration and agentic frameworks such as LangChain, LangGraph, AutoGen, or CrewAI.
Proficiency in Python or shell scripting for automation and troubleshooting.
Strong analytical, communication, and incident management skills.
Bachelor’s degree in Computer Science, Engineering, or a related field.
1+ years of experience in AI/ML engineering, with a focus on Generative AI.
Proficiency in programming languages such as Python.
Strong understanding of Generative AI models (e.g., GPT, Transformer architectures) and experience in distilling, tuning and training them.
Familiarity with Retrieval Augmented Generation (RAG) techniques and their implementation.
Experience with agentic AI concepts and developing autonomous AI workflows.
Hands-on experience with GCP Vertex AI, AWS Bedrock and SageMaker, and Snowflake Cortex platforms and their AI/ML capabilities.
Experience building production-grade AI/ML systems at scale.
Knowledge of MLOps practices, including model deployment and lifecycle management.