Staff Software Engineer focused on incident management to improve system reliability at Insulet. Collaborating with Incident Managers and teams to automate detection and response processes.
Responsibilities
Driving the incident management process and coordinating with all teams involved in resolving an incident, including SRE, R&D, IT, vendors, and stakeholders
Responding to incidents and initiating the incident management process
Prioritizing incidents according to their urgency and business impact
Coordinating response efforts and collaborating with the incident response team to ensure that all protocols are diligently followed
Communicating with internal stakeholders on major incidents and impacts
Producing documents that outline incident timelines and actions taken during the incident
Coordinating post-incident RCAs with responders and SMEs and communicating to stakeholders
Designing and implementing automation for incident detection, triage, and resolution
Developing and maintaining runbooks, playbooks, and tooling to streamline incident response
Collaborating with Incident Managers to improve processes and reduce Mean Time to Recovery (MTTR)
Participating in major incident response efforts, providing technical leadership during high-severity events
Leading post-incident reviews and implementing preventive measures to avoid recurrence
Requirements
Bachelor’s degree required (preferred field of study: Computer Science, Engineering, or related field)
7+ years of experience in software engineering, operations, or reliability roles
At least 3 years focused on incident management or operational resilience
Proven track record of improving incident response processes and reducing MTTR
Proven experience architecting and managing highly available, scalable, and fault-tolerant systems
Strong understanding of cloud computing platforms (e.g., AWS, Azure, GCP) and container orchestration technologies (e.g., Kubernetes)
Strong understanding of incident management principles and frameworks (e.g., ITIL)
Hands-on experience with incident response in complex, distributed systems
Proficiency in scripting or automation (Python, Bash, or similar) for operational tasks
Familiarity with monitoring and alerting tools (e.g., Datadog, Prometheus, Grafana)
Full Stack Engineer at Doxel developing project management tools using computer vision and AI for the construction industry. Collaborating with teams to build full-stack applications for massive data handling.
Senior Advanced Software Engineer at Quantinuum focused on documentation platform and data engineering for quantum computing users. Collaborating with multi-disciplinary teams to enhance user engagement and analytics.
Lead Full Stack Engineer developing generative UI capabilities for AI-driven experiences at Salesforce. Drive innovation across the enterprise ecosystem while mentoring engineering teams.
Product Engineer responsible for delivering high-quality solutions on NPD projects in a hybrid setup. Engaging across teams to ensure manufacturability and compliance in engineering processes.
Responsible for training coordination and product technical training at GROHE France. Engaging with team members and partners to enhance technical knowledge and product usage.
Senior Software Engineer developing backend systems for Bastion's stablecoin infrastructure. Leading projects and collaborating with cross-functional teams in a fast-paced startup environment.
Senior Fullstack Engineer at Bastion developing regulated stablecoin solutions for financial institutions. Responsible for end-to-end feature delivery and leading technical directions in a fast-paced environment.
Software Engineer 2 designing and building ingestion pipelines at WEX. Responsible for integrating data from various internal and external systems into scalable solutions.
Software Engineer developing software supporting integrated applications for Navy Combat Weapon System. Responsible for quality assurance, GUI development, and applying technical expertise in computer programming.