Hybrid Senior AI/ML Ops Engineer

Posted 4 hours ago

About the role

  • Senior AI/ML Ops Engineer at Smartsheet, responsible for building scalable AI/ML platforms and collaborating with cross-functional teams to enhance data infrastructure and operational efficiency.

Responsibilities

  • Design, develop, and oversee the strategy and architecture of scalable, reliable AI/ML Ops platforms and pipelines
  • Model Deployment: Package and deploy AI/ML services to production, ensuring they are reproducible and interpretable
  • CI/CD Pipeline Development: Design and implement automated CI/CD (Continuous Integration/Continuous Deployment) pipelines to accelerate model deployment
  • Infrastructure Management: Provision and optimize infrastructure for training and serving, utilizing Docker, Kubernetes, or serverless platforms
  • Monitoring & Observability: Implement post-deployment monitoring for model performance, data drift, and latency. Experience with Monte Carlo is preferred
  • Automation: Automate retraining and data pipeline workflows to ensure models stay accurate over time.
  • Manage the deployment of foundation models, fine-tuning workflows, and Retrieval-Augmented Generation (RAG) stacks (vector DBs, knowledge graphs). Experience with AWS Bedrock is preferred
  • Resource Optimization: Manage GPU/CPU utilization to minimize cloud costs while maintaining low-latency inference for users
  • Collaboration: Work closely with data scientists, data engineers, and software engineers to bridge the gap between model development and production.
  • Version Control & Governance: Manage versioning for data, code, and models using tools like MLflow.
  • Security & Compliance: Implement data security measures, ensure compliance with data governance policies, and protect sensitive data
  • Technology Evaluation and Innovation: Stay abreast of emerging data technologies and explore opportunities for innovation to improve the organisation’s data infrastructure
  • Troubleshooting and Problem Solving: Diagnose and resolve complex data-related issues, ensuring the stability and reliability of the data platform
  • Perform other duties as assigned

Requirements

  • Experience with enterprise SaaS software solutions with high availability and scalability
  • Experience with solutions handling large-scale structured and unstructured data from varied data sources
  • Experience in building and maintaining AI/ML Ops platform systems ensuring scalability, reliability, efficiency and security
  • Working with the product engineering team to influence designs with data, AI, and analytics use cases in mind
  • In-depth experience in system design and in AI/ML frameworks and tools involving petabytes of data within the Databricks Lakehouse ecosystem
  • AI/MLOps workflows on Databricks, MLflow, Mosaic AI Agent Framework, Unity Catalog, Vector Search, Knowledge Graph
  • Knowledge of AI/ML frameworks like LangChain, LangGraph for AI/ML Ops pipeline integration
  • Cloud Platforms: Hands-on experience with at least one major cloud provider (AWS, Azure, or GCP). Experience with an AWS-hosted data platform is preferred
  • Programming languages like Python and SQL
  • Modern software engineering practices and tooling such as Kubernetes, CI/CD, IaC tools (preferably Terraform), observability, monitoring, and alerting
  • Solution cost optimisation and design-to-cost
  • Legally eligible to work in India on an ongoing basis.

Benefits

  • Health insurance
  • Flexible working hours
  • Professional development opportunities

Job title

Senior AI/ML Ops Engineer

Job type

Experience level

Senior

Salary

Not specified

Degree requirement

No Education Requirement

Location requirements
