Site Reliability Engineer focused on incident management and system resiliency at Enova International. Leads incident response and uses analytics to drive continuous improvement in system reliability.
Responsibilities
Lead production incidents as part of our PI PIC (or Incident Commander) rotation after completing training, ensuring clear communication and resolution.
Capture and maintain detailed documentation of incidents, contributing factors, and learnings in formal incident reports.
Deliver documentation that is clear, comprehensive, and accessible to different audiences within the established SLAs.
Facilitate and document blameless post-incident reviews that promote learning and continuous improvement.
Collect and analyze incident data to identify systemic issues, risks, and trends.
Work on improvements to how we collect, analyze, and learn from system failures.
Requirements
2+ years experience in a technology or analyst role (e.g., Software Engineering, Systems, Operations, SRE, or Product).
A strong interest in how complex distributed systems operate and how to make them more reliable.
Analytical and problem-solving skills with a systems-thinking mindset.
Strong communication skills, both verbal and written, with the ability to tailor messaging to technical and non-technical audiences.
Comfort with ambiguity, and the ability to turn vague problems into actionable insights.
Demonstrated maturity, sound judgment, and organizational awareness.
Ability to coordinate the resolution of major incidents and post-incident reviews following Enova’s Incident Management Process.
Ability to seamlessly shift between high-urgency incident response and structured project work, with strong organizational skills and the capacity to manage projects independently.
Benefits
Health, dental, and vision insurance including mental health benefits
401(k) matching plus a Roth option (U.S.-based employees only)
PTO and paid holidays
Sabbatical program (for eligible roles)
Summer hours (for eligible roles)
Paid parental leave
DEI groups (B.L.A.C.K. @ Enova, HOLA @ Enova, Women @ Enova, Pride @ Enova, South Asians @ Enova, APEX @ Enova, and Parents @ Enova)
Employee recognition and rewards program
Charitable matching and a paid volunteer day… Plus so much more!
Job title
Site Reliability Engineer – Incident Management, Resiliency