Hybrid Site Reliability Engineer, GovCloud Incident Response

Posted last week

About the role

  • Site Reliability Engineer at Salesforce responsible for maintaining high uptime for cloud services, collaborating with teams to automate issue resolution and improve operational excellence.

Responsibilities

  • Ensure 99.99% uptime for customer-facing services by proactively monitoring and maintaining the health of supporting systems, contributing directly to customer satisfaction and trust.
  • Act in key support roles during major incidents (e.g., Sev0, Sev1) and participate in technical incident reviews for problem management.
  • Contribute to Problem Management by populating and participating in Root Cause Analyses (RCAs) and handing them off to the Global Solutions team.
  • Ensure all work carried out by the Site Reliability team aligns with the company’s internal compliance policies and directives.
  • Collaborate with technical staff to solve complex technical issues and customer concerns.
  • Lead and mentor other team members in staying abreast of industry innovations and technologies, and assist in the team's development and growth.
  • Thrive in a fast-paced environment, solving sophisticated issues quickly and successfully balancing multiple priorities.
  • Automate the detection and resolution of recurring issues in the production environment.
  • Help create and improve current processes to reduce operational and engineering toil, including the implementation of AI-driven automation for routine tasks.

Requirements

  • Citizenship: U.S. citizen (U.S. born or naturalized) who does not hold dual citizenship.
  • Education: Bachelor’s degree in Computer Science, Engineering, Information Technology, or a related technical field.
  • Experience: Systems engineering experience in an enterprise-scale internet service engineering or support role.
  • Technical Skills: Expertise in TCP/IP related technologies (networking protocols, network programming, etc.).
  • Expertise in CLI enterprise support of Unix variants (Linux/Solaris/BSD), with significant exposure to Red Hat Enterprise Linux and Solaris.
  • Strong understanding of monitoring security systems and administration.
  • Experience provisioning, operating, and running AWS/C2S based infrastructure and systems.
  • Proficiency in scripting with Python, Go, or other languages.
  • Communication: Strong written and oral communication skills.
  • Incident Management: Past experience in Incident Management and a good understanding of ITIL service operations.
  • Availability: Ability to participate in a 24/7 on-call rotation supporting large data center operations and be available for shift work.

Benefits

  • time off programs
  • medical
  • dental
  • vision
  • mental health support
  • paid parental leave
  • life and disability insurance
  • 401(k)
  • employee stock purchasing program

Job title

Site Reliability Engineer, GovCloud Incident Response

Job type

Experience level

Mid level, Senior

Salary

$117,200 - $176,700 per year

Degree requirement

Bachelor's Degree

Location requirements
