Site Reliability Engineer II enhancing operational excellence within Cox Automotive's SRE team. Focused on improving reliability and observability across multiple teams using advanced technologies.
Responsibilities
Define and drive adoption of SLIs, SLOs, error budgets, and high-quality alerting standards across the organization
Architect end-to-end observability strategies (metrics, logs, traces, business signals) with consistent taxonomy and discoverability
Build centralized dashboards, reliability scorecards, and runbooks used by engineering teams and leadership
Establish engineering practice maturity baselines and partner with teams on measurable improvement plans
Create golden paths—standardized pipelines, infrastructure modules, and service templates—that enable rapid, consistent delivery
Pioneer the use of AI and agentic solutions to automate toil, accelerate incident response, and enhance operational workflows
Lead internal workshops, game days, and learning programs to spread operational excellence
Act as a trusted advisor to product and engineering leadership, providing data-driven insights on reliability risk and trade-offs
Guide post-incident reviews toward systemic remediation (guardrails, automation, design changes) rather than superficial fixes
Design and extend self-service platforms for deployment, progressive delivery, and automated recovery
Reduce MTTR through better telemetry, automation, AI-assisted diagnostics, and resilience patterns
Mentor engineers across teams to become local reliability champions, scaling SRE impact without adding headcount
Requirements
Experience programming in at least one of the following languages: Python, TypeScript, or Java
Bachelor’s degree in a related discipline and 4 years’ experience in a related field
The right candidate could also have a different combination, such as a master’s degree and 2 years’ experience; a Ph.D. and up to 1 year of experience; or 16 years’ experience in a related field
Applicants must currently be authorized to work in the United States for any employer without current or future sponsorship
Expertise in designing, analyzing, and troubleshooting large-scale distributed systems
Deep hands-on experience with modern observability tools (CloudWatch and New Relic)
Proven ability to assess engineering practices and drive measurable improvements across multiple teams
Strong background in release engineering, CI/CD, and progressive deployment strategies
Deep expertise in AWS, Terraform, AWS CDK, and GitHub/GitHub Actions
Enthusiasm for applying AI, LLMs, and agentic automation to operational and reliability challenges
Track record reducing MTTR and improving availability through automation and architectural improvements
Excellent written and verbal communication skills tailored to both engineers and executives
Systematic problem-solving approach with a sense of drive and ownership
Understanding of Linux operating systems, networking, and performance fundamentals
Ability to build trust and influence decisions through data-driven insights
Experience facilitating effective post-incident analysis and driving systemic remediation
Desire to work in a fast-paced, evolving, growing, dynamic environment
Benefits
The Company offers eligible employees the flexibility to take as much vacation with pay as they deem consistent with their duties, the company’s needs, and its obligations
Seven paid holidays throughout the calendar year
Up to 160 hours of paid wellness annually for their own wellness or that of family members
Additional paid time off in the form of bereavement leave, time off to vote, jury duty leave, volunteer time off, military leave, and parental leave
Health care insurance (medical, dental, vision)
Retirement planning (401(k))
Paid days off (sick leave, parental leave, flexible vacation/wellness days, and/or PTO)