Senior Site Reliability Engineer improving the reliability and performance of business-critical services across multi-cloud environments (AWS, Azure, and GCP). Collaborates with engineering teams to drive automation and measurable outcomes.
Responsibilities
Reliability Engineering & SRE Practices
Define, implement, and maintain Service Level Indicators (SLIs), Service Level Objectives (SLOs), and error budgets for critical services.
Continuously monitor SLO compliance and drive improvements based on error budget consumption.
Participate in architecture reviews focused on high availability, disaster recovery, scalability, and fault tolerance.
Lead incident response, acting as the Tier-3 escalation point for SRE and operations teams.
Drive blameless postmortems and Root Cause Analysis (RCA), and ensure corrective and preventive actions are implemented.
Define and maintain incident response runbooks, escalation paths, and on-call processes.
Track and improve key reliability metrics including MTTR, incident frequency, and change failure rate.
Automate infrastructure provisioning and operational workflows using Terraform, CloudFormation, and AWS CDK.
Build and maintain CI/CD pipelines supporting canary deployments, blue/green strategies, and automated rollbacks.
Implement event-driven automation and auto-remediation using AWS Lambda, Step Functions, or Azure Functions.
Continually identify and eliminate operational toil through automation and self-healing systems.
Design, implement, and operate end-to-end observability platforms covering metrics, logs, and traces.
Ensure alerts are SLO-driven, actionable, and noise-free.
Provision and manage cloud infrastructure across AWS, Azure, and/or GCP.
Operate compute, storage, networking, load balancers, VPNs, and private connectivity.
Manage patching, backups, encryption, IAM/RBAC, and disaster recovery readiness.
Optimize performance and cost through rightsizing, autoscaling, and capacity planning.
Requirements
8–10 years of experience in SRE, Cloud Engineering, or Production Operations roles.
Strong OS fundamentals in Linux and Windows, including scripting (Bash, PowerShell).
Strong programming skills in Python, Go, or equivalent.
Proven hands-on experience with:
Infrastructure as Code (Terraform, CloudFormation, CDK)
CI/CD pipelines and deployment automation
Observability tools (New Relic, Datadog, Prometheus, Grafana, Graylog, ELK)
Distributed systems at production scale
Cloud certifications (one or more):
AWS (Associate or Professional)
Azure (AZ-104 / Architect Expert)
GCP (Professional Cloud Architect)
Cloud-agnostic certification such as Terraform Associate, CKA, or SRE Foundation.