Hybrid Senior Cloud Site Reliability Engineer

Posted last month

About the role

  • Define, measure, and maintain SLOs and SLIs for cloud services and infrastructure.
  • Lead efforts to improve system availability, fault tolerance, and disaster recovery.
  • Ensure proactive incident detection, root cause analysis, and timely resolution.
  • Participate in a 24x7 on-call rotation.
  • Drive automation to reduce manual intervention in cloud infrastructure management.
  • Implement Infrastructure as Code using Terraform, AWS CloudFormation, and Ansible.
  • Automate deployment, scaling, and monitoring processes.
  • Design and implement monitoring, logging, and alerting solutions using tools such as Prometheus, Grafana, and CloudWatch.
  • Identify and resolve performance bottlenecks.
  • Build cloud infrastructure that follows security best practices, and collaborate with security teams to implement controls and audits.
  • Partner with development, DevOps, and operations teams to align infrastructure with business needs.
  • Mentor junior engineers and serve as the technical point of contact for infrastructure-related issues.
  • Lead incident response, conduct post-incident reviews, and implement preventive measures.
  • Continuously improve incident management and operational processes.

Requirements

  • 5–9 years of hands-on experience with cloud automation and configuration tools (e.g., Terraform, CloudFormation, Ansible) in a hybrid cloud setup.
  • 4+ years in SRE, Infrastructure Engineering, or DevOps roles.
  • Deep expertise in AWS services (e.g., EC2, S3, Lambda) and Kubernetes.
  • Proficiency in scripting/programming (e.g., Python, Go, Bash).
  • Experience with observability tools (e.g., Prometheus, Grafana, Datadog, ELK).
  • Familiarity with CI/CD pipelines and cloud-native development practices.
  • Strong experience managing production environments in AWS, GCP, or Azure.
  • Knowledge of cloud-native architectures, microservices, and containerization (Kubernetes, Docker).
  • Proven ability to build scalable, fault-tolerant systems.
  • Solid understanding of cloud networking, storage, compute, and security best practices.
  • Experience supporting both private and public cloud environments.
  • Bonus: Experience with AI tools such as Windsurf, GitHub Copilot, or similar.

Benefits

  • Health and financial benefits
  • Commuter support
  • Employee assistance programs
  • Tuition assistance
  • Employee resource groups
  • Collaborative workspaces
  • Dog-friendly offices at some locations
  • Flexibility and work-life balance
  • Company-sponsored events (book clubs, external speakers, hackathons)
  • Perks specific to each location

Job title

Senior Cloud Site Reliability Engineer

Experience level

Senior

Salary

Not specified

Degree requirement

No Education Requirement
