Hybrid Senior Associate Cloud SRE

Posted 2 weeks ago


About the role

  • As a Site Reliability Engineer, you will deliver AWS managed services support with a focus on system stability and automation, collaborating with teams to resolve complex operational challenges in a hybrid role.

Responsibilities

  • Deliver tier two cloud operations managed services support for AWS environments
  • Provide 24x7x365 tier two support and escalation handling for AWS environments
  • Execute complex operational tasks including:
      ◦ Patching and managing Amazon Machine Images (AMIs)
      ◦ Creating and configuring EC2 instances and RDS databases
      ◦ Managing IAM roles, users, and policies
      ◦ Configuring S3 bucket policies and Access Control Lists (ACLs)
      ◦ Opening and managing network routes
      ◦ Restoring snapshots and database backups to lower environments
      ◦ Increasing disk sizes and managing storage optimization
      ◦ Implementing proper tagging for environment identification and cost allocation
      ◦ Managing log archiving and retention policies
  • Handle escalations from tier one support with deep technical analysis
  • Provide root cause analysis for complex incidents and recurring issues
  • Implement and maintain Service Level Indicators (SLIs) and Service Level Objectives (SLOs)
  • Lead tier two incident response, performing advanced troubleshooting and resolution
  • Conduct thorough post-incident analysis with actionable remediation plans
  • Reduce reactive work by improving runbooks, alert configurations, and standard operating procedures
  • Apply reliability engineering best practices with oversight and review
  • Mentor tier one engineers during incident response
  • Build and maintain CI/CD pipelines for infrastructure and application deployments
  • Automate complex operational tasks including patching, backups, and environment provisioning
  • Develop infrastructure automation using Terraform or equivalent IaC tools
  • Create sophisticated scripts and tooling to eliminate manual toil and improve operational efficiency
  • Follow established patterns and contribute continuous improvements
  • Document automation processes for knowledge sharing
  • Deploy and operate containerized workloads using Docker on AWS services (ECS, EKS, or other managed container platforms)
  • Support container reliability through proper health checks, autoscaling configurations, and resource management
  • Implement safe deployment patterns (canary deployments, blue/green deployments)
  • Troubleshoot complex containerization and orchestration issues
  • Configure and maintain comprehensive monitoring, logging, and alerting systems
  • Leverage observability data to identify issues and lead root cause analysis
  • Contribute to performance tuning and cost optimization initiatives
  • Ensure proper instrumentation and telemetry across AWS environments
  • Identify patterns and trends to prevent future incidents
  • Build custom dashboards and reports for operational insights
  • Work closely with customer development and operations teams
  • Participate in design reviews and reliability assessments
  • Communicate technical concepts, tradeoffs, and recommendations clearly to stakeholders
  • Provide regular operational updates and service reports
  • Act as technical liaison between customers and internal engineering teams

Requirements

  • 4 to 8 years of experience in DevOps, SRE, or production operations roles
  • Proven experience operating production systems in AWS environments
  • Demonstrated experience managing containerized applications in production
  • Experience delivering managed services or supporting customer-facing infrastructure
  • Track record of handling complex technical escalations
  • Strong working knowledge of EC2, RDS, S3, IAM, VPC, CloudWatch, and related services
  • Hands-on experience with Docker and container orchestration platforms (ECS, EKS, or managed Kubernetes)
  • Proficiency with Terraform or equivalent IaC tools
  • Experience building and maintaining automated deployment pipelines
  • Proficiency in Python, Go, Bash, or similar languages
  • Experience with observability tools (CloudWatch, Datadog, Splunk, ELK, or similar)
  • Proficiency with Git and collaborative development workflows
  • Advanced diagnostic and problem-solving capabilities
  • Experience with 24x7 operations and tier two escalation support

Benefits

  • Health insurance
  • Paid time off
  • Flexible work arrangements

Job title

Senior Associate Cloud SRE

Job type

Experience level

Senior

Salary

Not specified

Degree requirement

Bachelor's Degree

Location requirements
