Senior SRE supporting AWS and Azure platform modernization and driving reliability initiatives in Mumbai. Focus on Kubernetes migration, IaC, and observability practices for enhanced service delivery.
Responsibilities
Support customer AWS/Azure platform modernization and reliability initiatives.
Migrate legacy worker processes to Kubernetes.
Strengthen Infrastructure as Code (IaC) and CI/CD pipelines.
Drive strong observability and operational excellence.
Embed reliability, automation, and monitoring into the platform.
Ensure high availability, scalability, and predictable deployments.
Define, implement, and maintain Service Level Indicators (SLIs), Service Level Objectives (SLOs), and error budgets for critical services.
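For illustration, the SLO and error-budget arithmetic mentioned above can be sketched as follows. This is a minimal sketch, not the team's actual tooling: the function names, the 30-day window, and the request-based SLI are all assumptions made for the example.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Allowed downtime, in minutes, for an availability SLO over a window.

    slo_target is a fraction (e.g. 0.999 for "three nines").
    The 30-day default window is an assumption for this sketch.
    """
    total_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * total_minutes


def budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of the error budget still unspent for a request-based SLI.

    Returns 1.0 when nothing has been spent, 0.0 when the budget is exactly
    used up, and a negative number when the SLO has been breached.
    """
    allowed_failures = (1.0 - slo_target) * total_requests
    if allowed_failures == 0:
        # A 100% target (or zero traffic) leaves no budget to spend.
        return 0.0
    return 1.0 - failed_requests / allowed_failures


if __name__ == "__main__":
    # Hypothetical numbers: a 99.9% SLO, 1,000,000 requests, 250 failures.
    print(round(error_budget_minutes(0.999), 1))
    print(round(budget_remaining(0.999, 1_000_000, 250), 2))
```

A 99.9% target over 30 days leaves roughly 43 minutes of allowable downtime, which is the kind of figure an error-budget policy is typically built around.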
Requirements
8–10 years of experience in SRE, DevOps, or Cloud Engineering roles.
Strong Linux fundamentals with scripting (Bash, Python, or equivalent).
Hands-on experience with:
- Kubernetes and containerized workloads
- Terraform / CloudFormation / AWS CDK
- CI/CD pipelines and deployment automation
- Observability tools: New Relic, Datadog, Prometheus, Grafana, ELK/Graylog
Strong understanding of distributed systems and production operations.