AWS Cloud Engineer optimizing AWS environments for data science workloads. Collaborating with data and ML teams on cloud architecture and ML infrastructure.
Responsibilities
Manage, scale, and optimize cloud environments used for data science workloads (primarily AWS, Databricks, dbt).
Provision, maintain, and optimize compute clusters for ML workloads (e.g., Kubernetes, ECS/EKS, Databricks, SageMaker).
Implement and maintain high-availability solutions for mission-critical analytics platforms.
Apply deep expertise in AWS resource management and provisioning, including IAM roles and permissions.
Develop CI/CD pipelines for model deployment, infrastructure-as-code (IaC), and automated testing using industry-standard toolchains.
Build monitoring, alerting, and logging systems for cloud and ML infrastructure (e.g., Datadog, CloudWatch, Prometheus, Grafana, ELK).
Automate provisioning, configuration, and deployments using tools such as Terraform, CloudFormation, and GitHub Actions.
Implement and operationalize security and compliance controls for data science platforms in alignment with enterprise cloud standards.
Conduct periodic risk assessments, best practice reviews, and remediation efforts to strengthen security and resiliency.
Support secure handling of sensitive financial data.
Partner with data scientists, machine learning engineers, and data engineers to deeply understand and support their needs and workflows within data-driven initiatives.
Serve as a technical advisor on cloud architecture, performance optimization, and production readiness for data and ML platforms.
Adopt and champion Agile, DevOps, and Platform Engineering practices (kanban, scrum, continuous improvement, automation, Everything-as-a-Service).
Demonstrate a strong, proactive focus on serving internal customers by prioritizing user experience and identifying opportunities to use automation and self-service to reduce toil and cognitive load for developers and researchers.
Requirements
Bachelor’s degree or higher in a STEM field required.
5+ years of experience in cloud operations, DevOps, platform engineering, SRE, sysadmin or related roles.
Strong proficiency with at least one major cloud provider (AWS preferred).
Hands-on experience with IaC tools (Terraform, CloudFormation, or similar).
Strong scripting skills (Python, Bash, or PowerShell).
Strong understanding of modern authentication and authorization technologies and secrets management (IAM, OIDC, OAuth2, RBAC, ABAC, privileged access management, JIT authorization, PKI).
Experience with common CI/CD systems (GitHub Actions, Jenkins, GitLab CI, ArgoCD, or similar).
Familiarity with container orchestration (Docker Compose, EKS/Kubernetes, ECS).
Experience supporting data-intensive or ML workloads.
Experience in financial services, investment management, or other highly regulated industries preferred.
Knowledge of ML/AI platform tools (Databricks, SageMaker, MLflow, Airflow) preferred.
Hands-on experience with AI Engineering and LLMOps tools (LLM observability, eval pipelines, building/supporting agentic workflows) is a huge plus.
Understanding of networking, VPC architectures, and cloud security best practices preferred.
Familiarity with distributed compute frameworks (Spark, Ray, Dask) preferred.
Benefits
EXL never requires or asks for fees, payments, or credit card or bank details during any phase of the recruitment or hiring process.
EXL will only extend a job offer after a candidate has gone through a formal interview process with members of EXL’s Human Resources team, as well as our hiring managers.