Architect designing and implementing MLOps strategy for the EVOKE Phase-2 programme at Quantiphi. Leading enterprise-grade ML pipelines and collaborating across teams for production-ready ML solutions.
Responsibilities
Architect and implement the MLOps strategy for the EVOKE Phase-2 programme, ensuring alignment with the project proposal and delivery roadmap.
Design and own enterprise-grade ML/LLM pipelines covering model training, validation, deployment, versioning, monitoring, and CI/CD automation.
Build container-oriented ML platforms (EKS-first) while evaluating alternative orchestration tools with similar capabilities (Kubeflow, SageMaker, MLflow, Airflow, etc.).
Implement hybrid MLOps + LLMOps workflows, including prompt/version governance, evaluation frameworks, and monitoring for LLM-based systems.
Serve as a technical authority across multiple internal and customer projects, not limited to EVOKE, contributing architectural patterns, best practices, and reusable frameworks.
Enable observability, monitoring, drift detection, lineage tracking, and auditability across ML/LLM systems.
Collaborate with cross-functional teams — data engineering, platform, DevOps, and client stakeholders — to deliver production-ready ML solutions.
Ensure all solutions adhere to security, governance, and compliance expectations, particularly in the handling of cloud services, Kubernetes workloads, and MLOps tools.
Conduct architecture reviews, troubleshoot complex ML system issues, and guide teams through implementation across cloud-native ML platforms.
Mentor engineers and provide guidance on modern MLOps tools, platform capabilities, and best practices.
Requirements
7-14 years of experience in ML/AI engineering or MLOps roles with strong architecture exposure.
Strong expertise in the AWS cloud-native ML stack, including EKS (primary), ECS, Lambda, API Gateway, and CI/CD (CodeBuild/CodePipeline or equivalent).
Hands-on experience with at least one major MLOps toolset and awareness of alternatives: MLflow, Kubeflow, SageMaker Pipelines, Airflow, BentoML, KServe, Seldon.
Deep understanding of model lifecycle management (training → registry → deployment → monitoring).
Deep understanding of the ML lifecycle: data ingestion, feature engineering, training, evaluation, model packaging, CI/CD, drift detection, monitoring, and governance.
Strong experience with AWS SageMaker (Training, Processing, Batch Transform, Pipelines, Feature Store, Model Registry, Model Monitor).
Experience implementing ML CI/CD pipelines including automated training, testing, validation, model promotion, and endpoint deployment.
Ability to build dynamic and versioned pipelines using SageMaker Pipelines, Step Functions, or Kubeflow.
Strong SQL and data transformation experience using Snowflake, Databricks, Spark, or EMR.
Experience with feature engineering pipelines and Feature Store management (SageMaker or Feast).
Understanding of lineage tracking: training data snapshots, feature versions, code versioning, metadata tracking, and reproducibility.
Hands-on experience with Bedrock, OpenAI, Anthropic, or Llama models.
Experience with CloudWatch, SageMaker Model Monitor, Prometheus/Grafana, or Datadog.
Strong foundation in Python and cloud-native development patterns.
Solid understanding of security best practices, IAM, secrets management, and artifact governance.
Good-to-have skills
Experience with vector databases, RAG pipelines, or multi-agent AI systems.
Exposure to DevOps and infrastructure-as-code (Terraform, Helm, CDK).
Hands-on understanding of model drift detection, A/B testing, canary rollouts, and blue-green deployments.
Familiarity with observability stacks (Prometheus, Grafana, CloudWatch, OpenTelemetry).
Knowledge of lakehouse architecture (Delta/Iceberg/Hudi).
Ability to translate business goals into scalable AI/ML platform designs.
Strong communication and cross-team collaboration skills.
Ability to guide engineering teams through technical uncertainty and design choices.