About the role

  • Site Reliability Engineer supporting and maintaining essential service applications for national security missions, with the aim of enhancing service availability and automating tasks.

Responsibilities

  • Supporting and maintaining essential services that support core mission applications, proactively enhancing their availability, performance and stability.
  • Being part of the 24/7 on-call rota, supporting critical production systems outside business hours.
  • Finding innovative solutions to problems rather than undertaking repetitive work, automating everything you can.
  • Designing and deploying monitoring products, creating bespoke tools where required.

Requirements

  • Software development in web technologies and object-oriented programming
  • Database technologies such as Oracle SQL, Mongo, Postgres
  • Familiarity with Linux and Windows command lines, e.g. Bash and PowerShell
  • Monitoring large systems using technologies such as Grafana, Prometheus, ELK, Splunk
  • Experience of working in Agile teams, and the tooling that supports it, e.g. Atlassian
  • Diagnosing and troubleshooting application issues resulting in service outages
  • Troubleshooting skills across different levels of the stack
  • Understanding of ITIL
  • Microservices architectures, Docker and container platforms such as OpenShift, Kubernetes

Benefits

  • 24/7 on-call support for critical production systems
  • Additional on-call allowances and overtime benefits

Job title

Site Reliability Engineer

Job type

Experience level

Mid level, Senior

Salary

Not specified

Degree requirement

Bachelor's Degree

Location requirements
