Senior Software Engineer building automation platforms for incident response at Cox Automotive. Focusing on AI-driven reliability solutions and engineering collaboration within the team.
Responsibilities
Build automation that reduces toil and empowers engineering teams
Create tools and platforms that help teams understand and improve their system reliability
Reimagine how we learn from incidents and turn insights into preventive measures
Experiment with new approaches to observability, monitoring, and alerting
Bring your engineering expertise to complex production challenges
Explore how AI can transform incident detection, triage, and response
Partner with teams across the organization to review and analyze incidents and solve reliability problems at scale
Drive technical conversations that shape how Cox Automotive builds resilient systems
Turn operational pain points into engineering opportunities
Define what modern incident response engineering looks like for our organization
Requirements
Professional experience with statically typed languages (Java, C#, Go) and dynamically typed languages (Python, Ruby, JavaScript), and an understanding of the tradeoffs of each
Distributed systems expertise and understanding of failure modes
Experience building internal platforms, developer tools, or automation that scales
Git/version control and CI/CD pipeline experience
Infrastructure as code and API design experience
Track record of eliminating toil through intelligent automation
Production ownership experience (on-call, incident response, observability)
Systems thinking mindset: understanding how components interact at scale
Eager to dig into problems and bring proposed solutions to group discussion
Open to feedback and able to creatively adapt multiple ideas into solutions
Strong technical writing, including high- and low-level diagramming techniques
Analytical skills and careful attention to detail
Bachelor’s degree in a related discipline and 4 years’ experience in a related field
The right candidate could also have a different combination, such as a master’s degree and 2 years’ experience; a Ph.D. and up to 1 year of experience; or 16 years’ experience in a related field.
Benefits
The Company offers eligible employees the flexibility to take as much vacation with pay as they deem consistent with their duties, the company’s needs, and its obligations
Seven paid holidays throughout the calendar year
Up to 160 hours of paid wellness annually for their own wellness or that of family members
Additional paid time off in the form of bereavement leave, time off to vote, jury duty leave, volunteer time off, military leave, and parental leave
Health care insurance (medical, dental, vision)
Retirement planning (401(k))
Paid days off (sick leave, parental leave, flexible vacation/wellness days, and/or PTO)
Quality Engineer supporting new product launches and reliability testing for SSD at Micron in Malaysia. Responsible for coordinating test activities and conducting failure analysis.
Manager of Mechanical Engineering ensuring high-availability mechanical systems in data centers. Collaborating on lifecycle management and performance evaluation across mission-critical facilities in a hybrid role.
Reliability Engineer ensuring operational readiness of data centers at Rowan Digital Infrastructure. Overseeing commissioning, operational standards, and transitioning facilities into live operations.
DevOps Engineer developing reusable Ansible and Puppet modules and managing CI/CD for project teams. Join PLATH in Hamburg, focusing on crisis detection software development.
Senior DevOps Engineer designing and maintaining CI/CD pipelines for a leading connectivity firm. Collaborating with cross-functional teams to optimize cloud infrastructure and enhance operational excellence.
Mechanical Reliability Engineer at Cargill ensuring asset reliability through advanced maintenance practices. Collaborating with teams and overseeing projects in heavy industrial processes.
Sr. DevOps Engineer at AllTrails focused on enhancing infrastructure reliability and security. Collaborating with engineering teams and contributing to projects for system optimization.
Senior IT Analyst focusing on SRE for Itaú, the largest bank in Latin America. Ensuring reliability and performance of critical systems through automation and incident resolution.