Manager leading a team of DevOps engineers and shaping cloud infrastructure strategy at a technology company in India.
Responsibilities
Lead, manage, and grow a team of DevOps engineers (FTEs and contractors), overseeing day-to-day delivery, performance reviews, and career development.
Establish clear ownership, accountability, and a high-performance culture within the DevOps function.
Drive training and up-skilling initiatives across key areas such as Kubernetes, Terraform, and GCP to keep the team current and effective.
Mentor senior engineers and support their growth into technical leadership roles.
Own and evolve the organization’s cloud infrastructure strategy across AWS and GCP, ensuring platforms are scalable, secure, and cost-effective.
Oversee and architect large-scale migrations and infrastructure modernization programs, including cloud platform transitions and GitHub Enterprise adoption.
Set strategic priorities and roadmaps for reliability, automation, observability, and infrastructure improvements aligned with business objectives.
Collaborate with engineering and product leadership to define infrastructure requirements for new platforms and product initiatives.
Establish and lead a dedicated SRE function within the DevOps team, driving ownership of uptime, incidents, and on-call practices.
Oversee the full incident management lifecycle, including on-call processes, RCA sign-off, corrective actions, and preventive measures, with the goal of reducing MTTR.
Define and enforce SLOs, SLIs, and error budgets to maintain high service availability.
Standardize DevOps workflows and tooling across planning, alerting, and incident management platforms.
Define and govern CI/CD standards and pipeline architecture across the organization, ensuring reliable and consistent deployments.
Champion the use of AI-assisted development tools and automation to reduce toil and accelerate delivery velocity.
Oversee container orchestration strategy using Kubernetes (EKS, OpenShift) and ensure best practices for containerized workloads.
Drive Infrastructure as Code (IaC) adoption using Terraform and Ansible to maintain consistent, auditable environments.
Own the organization’s observability strategy, driving adoption of monitoring, logging, and alerting solutions across all platforms.
Lead technology audit and compliance programs aligned to ISO certification standards.
Partner with security teams to embed DevSecOps practices into pipelines and infrastructure provisioning.
Work closely with leadership to communicate risks, trade-offs, and timelines in a clear, actionable manner.
Requirements
10+ years of hands-on experience in DevOps, platform engineering, or site reliability engineering, with at least 2 years in a people management role.
Proven experience managing cross-functional teams including full-time engineers and contractors.
Deep expertise in AWS (20+ services) and hands-on experience with GCP; familiarity with Azure or other cloud platforms is advantageous.
Strong proficiency in container orchestration using Kubernetes (AWS EKS, GKE) and Docker.
Hands-on expertise with Infrastructure as Code tools, particularly Terraform and Ansible.
Demonstrated experience designing and managing CI/CD pipelines using tools such as Jenkins, Argo CD, GitHub Actions, or GitLab CI.
Experience establishing and running SRE functions, including on-call frameworks, incident management, and RCA processes.
Proficiency in observability tooling, including the Grafana stack (Grafana, Loki, Mimir), ELK/OpenSearch, and AWS CloudWatch.
Strong scripting and automation skills in Python and Shell.
Experience leading or contributing to technology audits and compliance initiatives (e.g., ISO certifications).
Excellent communication skills with the ability to explain technical concepts and risks to non-technical stakeholders and senior leadership.
Experience with project and service management tooling such as Jira, PagerDuty, or equivalent platforms.
Learning Content Engineer developing and enhancing training content for cloud and DevOps. Creating practical learning materials that range from basic to advanced topics.
AWS DevOps Microservices Engineer at Solventum designing secure and scalable AWS infrastructures. Collaborating with diverse teams for innovative healthcare solutions using cloud technology.
DevOps Engineer building and maintaining Catena’s scalable platform infrastructure. Collaborating with engineers to enhance CI/CD pipelines and support cloud-native workloads on AWS.
Platform System Reliability Engineer focused on operating the EKS Kubernetes environment for GE Vernova’s SaaS grid products. Responsible for the full lifecycle of production clusters, from performance tuning to securing infrastructure.
SRE Observability SLO Engineer for GE Vernova’s GridOS Platform Engineering team. Building the telemetry stack that supports SaaS reliability for critical energy infrastructure.
DevOps Engineer responsible for building and operating automation services using Ansible for Rabobank. Collaborating with teams to ensure stable, secure, and auditable infrastructure across multiple servers.
Engineer collaborating with AI startups to enhance their systems and contribute to OpenAI's products. Engaging in technical problem-solving and building relationships within the startup ecosystem.
Senior Software Engineer designing and developing software applications for space technologies. Leading technical decisions and collaborating on innovative solutions to enhance national security.
DevOps Engineer responsible for web application operations and developer experience at Nitrado, a global game server hosting provider. Collaborating with developers on automation, Kubernetes, and Docker management.