Senior AI/ML Ops Engineer at Smartsheet responsible for building scalable AI/ML platforms. Collaborating with cross-functional teams to enhance data infrastructure and operational efficiency.
Responsibilities
Design, develop, and oversee the strategy and architecture of scalable, reliable AI/ML Ops platforms and pipelines
Model Deployment: Package and deploy AI/ML services to production, ensuring they are reproducible and interpretable
CI/CD Pipeline Development: Design and implement automated CI/CD (Continuous Integration/Continuous Deployment) pipelines to accelerate model deployment using tools
Infrastructure Management: Provision and optimize infrastructure for training and serving, utilizing Docker, Kubernetes, or serverless platforms
Monitoring & Observability: Implement post-deployment monitoring for model performance, data drift, and latency using tools. Experience with Monte Carlo is preferred
Automation: Automate retraining and data pipeline workflows to ensure models stay accurate over time.
Manage the deployment of foundation models, fine-tuning workflows, and Retrieval-Augmented Generation (RAG) stacks (vector DBs, knowledge graphs). Experience with AWS Bedrock is preferred
Resource Optimization: Manage GPU/CPU utilization to minimize cloud costs while maintaining low-latency inference for users
Collaboration: Work closely with data scientists, data engineers, and software engineers to bridge the gap between model development and production.
Version Control & Governance: Manage versioning for data, code, and models using tools like MLflow.
Security & Compliance: Implement data security measures, ensure compliance with data governance policies, and protect sensitive data
Technology Evaluation and Innovation: Stay abreast of emerging data technologies and explore opportunities for innovation to improve the organisation’s data infrastructure
Troubleshooting and Problem Solving: Diagnose and resolve complex data-related issues, ensuring the stability and reliability of the data platform
Perform other duties as assigned
Requirements
Enterprise SaaS software solutions with high availability and scalability
Solutions handling large-scale structured and unstructured data from varied data sources
Experience in building and maintaining AI/ML Ops platform systems ensuring scalability, reliability, efficiency and security
Working with the product engineering team to influence designs with data, AI, and analytics use cases in mind
In-depth experience in system design and in AI/ML frameworks and tools involving petabytes of data in the Databricks Lakehouse ecosystem
AI/MLOps workflows on Databricks, MLflow, Mosaic AI Agent Framework, Unity Catalog, Vector Search, and Knowledge Graph
Knowledge of AI/ML frameworks such as LangChain and LangGraph for AI/ML Ops pipeline integration
Cloud Platforms: Hands-on experience with at least one major cloud provider (AWS, Azure, or GCP). Experience with an AWS-hosted data platform is preferred
Programming languages such as Python and SQL
Modern software engineering practices and tooling such as Kubernetes, CI/CD, IaC tools (preferably Terraform), observability, monitoring, and alerting
Solution cost optimisation and design-to-cost
Legally eligible to work in India on an ongoing basis.