Hybrid Site Reliability Engineer II

Posted 3 hours ago

About the role

  • The Site Reliability Engineer II enhances operational excellence within Cox Automotive's SRE team, focusing on improving reliability and observability across multiple teams using advanced technologies.

Responsibilities

  • Define and drive adoption of SLIs, SLOs, error budgets, and high-quality alerting standards across the organization
  • Architect end-to-end observability strategies (metrics, logs, traces, business signals) with consistent taxonomy and discoverability
  • Build centralized dashboards, reliability scorecards, and runbooks used by engineering teams and leadership
  • Establish engineering practice maturity baselines and partner with teams on measurable improvement plans
  • Create golden paths—standardized pipelines, infrastructure modules, and service templates—that enable rapid, consistent delivery
  • Pioneer the use of AI and agentic solutions to automate toil, accelerate incident response, and enhance operational workflows
  • Lead internal workshops, game days, and learning programs to spread operational excellence
  • Act as a trusted advisor to product and engineering leadership, providing data-driven insights on reliability risk and trade-offs
  • Guide post-incident reviews toward systemic remediation (guardrails, automation, design changes) rather than superficial fixes
  • Design and extend self-service platforms for deployment, progressive delivery, and automated recovery
  • Reduce MTTR through better telemetry, automation, AI-assisted diagnostics, and resilience patterns
  • Mentor engineers across teams to become local reliability champions, scaling SRE impact without adding headcount

Requirements

  • Experience programming in at least one of the following languages: Python, TypeScript, or Java
  • Bachelor’s degree in a related discipline and 4 years’ experience in a related field
  • The right candidate could also have a different combination, such as a master’s degree and 2 years’ experience; a Ph.D. and up to 1 year of experience; or 16 years’ experience in a related field
  • Applicants must currently be authorized to work in the United States for any employer without current or future sponsorship
  • Expertise in designing, analyzing, and troubleshooting large-scale distributed systems
  • Deep hands-on experience with modern observability tools (CloudWatch and New Relic)
  • Proven ability to assess engineering practices and drive measurable improvements across multiple teams
  • Experience establishing SLIs/SLOs, managing error budgets, and improving alert signal-to-noise ratios
  • Strong background in release engineering, CI/CD, and progressive deployment strategies
  • Deep expertise in AWS, Terraform, AWS CDK, and GitHub/GitHub Actions
  • Enthusiasm for applying AI, LLMs, and agentic automation to operational and reliability challenges
  • Track record reducing MTTR and improving availability through automation and architectural improvements
  • Excellent written and verbal communication skills tailored to both engineers and executives
  • Systematic problem-solving approach with a sense of drive and ownership
  • Understanding of Linux operating systems, networking, and performance fundamentals
  • Ability to build trust and influence decisions through data-driven insights
  • Experience facilitating effective post-incident analysis and driving systemic remediation
  • Desire to work in a fast-paced, evolving, growing, dynamic environment

Benefits

  • The Company offers eligible employees the flexibility to take as much vacation with pay as they deem consistent with their duties, the company’s needs, and its obligations
  • Seven paid holidays throughout the calendar year
  • Up to 160 hours of paid wellness annually for their own wellness or that of family members
  • Additional paid time off in the form of bereavement leave, time off to vote, jury duty leave, volunteer time off, military leave, and parental leave
  • Health care insurance (medical, dental, vision)
  • Retirement planning (401(k))
  • Paid days off (sick leave, parental leave, flexible vacation/wellness days, and/or PTO)

Job title

Site Reliability Engineer II

Job type

Experience level

Mid level, Senior

Salary

$89,400 - $134,000 per year

Degree requirement

Bachelor's Degree

Location requirements
