Hybrid Staff Software Engineer, Incident Management

Posted last month

About the role

  • Staff Software Engineer focused on incident management to improve system reliability at Insulet. The role collaborates with Incident Managers and other teams to automate detection and response processes.

Responsibilities

  • Drive the incident management process and coordinate efforts with all teams involved, including SRE, R&D, IT, vendors, and stakeholders, to resolve incidents
  • Respond to incidents and initiate the incident management process
  • Prioritize incidents according to their urgency and business impact
  • Coordinate response efforts and collaborate with the incident response team to ensure that all protocols are diligently followed
  • Communicate with internal stakeholders about major incidents and their impacts
  • Produce documents that outline incident timelines and the actions taken during each incident
  • Coordinate post-incident root cause analyses (RCAs) with responders and SMEs, and communicate the findings to stakeholders
  • Design and implement automation for incident detection, triage, and resolution
  • Develop and maintain runbooks, playbooks, and tooling to streamline incident response
  • Collaborate with Incident Managers to improve processes and reduce Mean Time to Recovery (MTTR)
  • Participate in major incident response efforts, providing technical leadership during high-severity events
  • Lead post-incident reviews and implement preventive measures to avoid recurrence

Requirements

  • Bachelor’s degree required (preferred field of study: Computer Science, Engineering, or a related field)
  • 7+ years of experience in software engineering, operations, or reliability roles
  • At least 3 years focused on incident management or operational resilience
  • Proven track record of improving incident response processes and reducing MTTR
  • Proven experience architecting and managing highly available, scalable, and fault-tolerant systems
  • Strong understanding of cloud computing platforms (e.g., AWS, Azure, GCP) and container orchestration technologies (e.g., Kubernetes)
  • Strong understanding of incident management principles and frameworks (e.g., ITIL)
  • Hands-on experience with incident response in complex, distributed systems
  • Proficiency in scripting or automation (Python, Bash, or similar) for operational tasks
  • Familiarity with monitoring and alerting tools (e.g., Datadog, Prometheus, Grafana)

Benefits

  • Medical, dental, and vision insurance
  • 401(k) with company match
  • Paid time off (PTO)
  • Additional employee wellness programs

Job title

Staff Software Engineer, Incident Management

Job type

Experience level

Lead

Salary

$148,200 - $222,300 per year

Degree requirement

Bachelor's Degree

Location requirements
