Senior Specialist – Cloud SRE at Datavail | Hybrid Hired

About the role

As a Senior Site Reliability Engineer, you will improve the reliability and performance of business-critical services across multi-cloud environments (AWS, Azure, and GCP), collaborating with engineering teams to drive automation and measurable outcomes.

Responsibilities

Reliability Engineering & SRE Practices
Define, implement, and maintain Service Level Indicators (SLIs), Service Level Objectives (SLOs), and error budgets for critical services.
Continuously monitor SLO compliance and drive improvements based on error budget consumption.
Participate in architecture reviews focused on high availability, disaster recovery, scalability, and fault tolerance.
Lead incident response, acting as the Tier-3 escalation point for SRE and operations teams.
Drive blameless postmortems and Root Cause Analysis (RCA), and ensure that corrective and preventive actions are implemented.
Define and maintain incident response runbooks, escalation paths, and on-call processes.
Track and improve key reliability metrics including MTTR, incident frequency, and change failure rate.
Automation & Infrastructure as Code
Automate infrastructure provisioning and operational workflows using Terraform, CloudFormation, and AWS CDK.
Build and maintain CI/CD pipelines supporting canary deployments, blue/green strategies, and automated rollbacks.
Implement event-driven automation and auto-remediation using AWS Lambda, Step Functions, or Azure Functions.
Continually identify and eliminate operational toil through automation and self-healing systems.
Observability
Design, implement, and operate end-to-end observability platforms covering metrics, logs, and traces.
Ensure alerts are SLO-driven, actionable, and noise-free.
Cloud Infrastructure & Operations
Provision and manage cloud infrastructure across AWS, Azure, and/or GCP.
Operate compute, storage, networking, load balancers, VPNs, and private connectivity.
Manage patching, backups, encryption, IAM/RBAC, and disaster recovery readiness.
Optimize performance and cost through rightsizing, autoscaling, and capacity planning.

Requirements

8–10 years of experience in SRE, Cloud Engineering, or Production Operations roles.
Strong operating-system fundamentals in Linux and Windows, including scripting in Bash and PowerShell.
Strong programming skills in Python, Go, or an equivalent language.
Proven hands-on experience with:
Infrastructure as Code (Terraform, CloudFormation, CDK)
CI/CD pipelines and deployment automation
Observability tools (New Relic, Datadog, Prometheus, Grafana, Graylog, ELK)
Distributed systems at production scale
Cloud certifications (one or more):
AWS (Associate or Professional)
Azure (AZ-104 / Architect Expert)
GCP (Professional Cloud Architect)
Cloud-agnostic certification such as Terraform Associate, CKA, or SRE Foundation.

Similar roles

Maintenance Mechanical/Reliability Engineer

SABIC

Mechanical/Reliability Engineer responsible for mechanical installations in Bergen op Zoom. Analyzing maintenance strategies and leading projects to enhance reliability.

Onsite Role

Bergen op Zoom, Netherlands | DevOps Engineer

Senior DevOps Engineer

Verizon

Senior DevOps Engineer responsible for cloud infrastructure and deployments. Optimizing AWS services and ensuring system security and reliability for Verizon.

Hybrid Role

Irving, United States | DevOps Engineer

$120,500 - $231,000 per year

Senior DevOps Engineer

Jobs2web

Senior DevOps Engineer responsible for automating infrastructure and building CI/CD pipelines for a collaborative robotics company. Collaborating with global engineering teams from the Bangalore office.

Onsite Role

Bangalore, India | DevOps Engineer

Site Reliability Engineer Intern

Tencent

Site Reliability Engineer Intern at Tencent working on gaming services and cloud-native solutions. Collaborating with global teams to eliminate toil and enhance reliability.

Hybrid Role

Los Angeles, United States | DevOps Engineer

$27 - $52 per hour

Cloud/DevOps Specialist – Pre-Trade Squad

N5X

Cloud/DevOps Specialist at N5X managing and optimizing critical cloud infrastructures for Brazilian energy trading. Collaborating with a multidisciplinary team to ensure high availability and performance.

Hybrid Role

São Paulo, Brazil | DevOps Engineer

Cloud/DevOps Specialist – Trade Squad

N5X

Cloud/DevOps Specialist responsible for designing a hybrid architecture combining cloud and on-premises infrastructure for energy trading systems. Collaborating with a multidisciplinary team in a dynamic environment.

Hybrid Role

São Paulo, Brazil | DevOps Engineer

Reliability Engineering Specialist

Enbridge

Reliability Engineering Specialist utilizing reliability tools and models to improve asset performance at Enbridge. Collaborating across teams to guide investment decisions for safe operations.

Hybrid Role

Edmonton, Canada | DevOps Engineer

Senior DevOps Specialist

Magnum Tires

DevOps Engineer responsible for structuring and supporting cloud DevOps architecture in Brazil. Working strategically on automation and CI/CD practices with development teams in Pernambuco.

Hybrid Role

Recife, Brazil | DevOps Engineer

DevSecOps Software Engineer – Experienced/Senior

Boeing

DevSecOps Software Engineer developing secure CI/CD pipelines for Boeing's military software systems. Collaborate with cross-functional teams and implement automation and security best practices.

Onsite Role

Hazelwood, United States | DevOps Engineer

$112,200 - $185,150 per year

DevOps Manager – USAF Cloud One

Leidos

DevOps Manager responsible for managing a team delivering multi-cloud solutions for the USAF Cloud One project. Focus on scalable cloud-native solutions and CI/CD practices.

Hybrid Role

United States | DevOps Engineer

$131,300 - $237,350 per year