Hybrid Staff Site Reliability Engineer – d/f/m

Posted yesterday

About the role

  • As a Staff Site Reliability Engineer at Personio, you will focus on automated infrastructure and collaboration across engineering teams, shaping the future of HR technology with meaningful impact and ownership.

Responsibilities

  • Engage in and improve the full service lifecycle from initial design through deployment, operation, and continuous improvement.
  • Prepare services for production by engaging in system design reviews, developing shared frameworks and platforms, planning capacity and conducting launch assessments.
  • Operate, monitor, and maintain live services, designing observability stacks and dashboards to track key metrics and improve operational insight.
  • Ensure sustainable scalability through automation, driving continuous evolution to increase reliability and delivery speed.
  • Collaborate with product and engineering teams to define SLOs and error budgets and to ensure services are reliable, scalable, and observable.
  • Lead incident management processes, including on-call rotations, managing outages, driving post-mortems and conducting root cause analysis.
  • Identify and reduce toil through process automation, creating playbooks and automated runbooks to reduce MTTR.
  • Define resilience strategies and implement chaos testing to proactively uncover weaknesses and validate recovery strategies.
  • Mentor, train and grow the community. Guide engineers across teams in reliability best practices and tooling.

Requirements

  • Bachelor’s degree in Computer Science, a related field, or equivalent practical experience.
  • 8+ years of experience with SaaS software development in distributed systems using languages such as Kotlin/Java, TypeScript, and Python, and technologies like IaC, Docker, and Kubernetes.
  • 2+ years of experience as an SRE or in a similar role designing, operating, analyzing, and troubleshooting distributed systems in agile environments.
  • Strong knowledge of modern application and infrastructure monitoring concepts (Datadog and/or AWS experience advantageous).
  • Systematic problem-solving and debugging skills with a strong sense of ownership and a bias toward establishing mechanisms that can scale across the entire company.
  • Excellent written, verbal, and documentation skills.
  • Collaborative team player, able to communicate effectively across disciplines.

Benefits

  • Receive a competitive reward package – reevaluated each year – that includes salary, benefits, and pre-IPO equity.
  • Enjoy 28 days of paid vacation, plus an additional day after 2 and 4 years.
  • Make an impact on the environment and society with 1 (fully paid) Impact Day.
  • Receive generous family leave, child support, mental health support, and sabbatical opportunities.
  • Gather for meals, cultural initiatives, and events like local Summer Sessions and year-end celebrations. There are also healthy snacks, drinks, and a weekly catered lunch.

Job title

Staff Site Reliability Engineer – d/f/m

Job type

Experience level

Lead

Salary

Not specified

Degree requirement

Bachelor's Degree

Location requirements
