Hybrid Lead Site Reliability Engineer

Posted 2 hours ago

About the role

  • Serve as Lead Site Reliability Engineer at Personio, shaping the future of HR technology through reliable infrastructure and collaborative engineering.

Responsibilities

  • Engage in and improve the full service lifecycle from initial design through deployment, operation, and continuous improvement.
  • Prepare services for production by taking part in system design reviews, developing shared frameworks and platforms, planning capacity and conducting launch assessments.
  • Operate, monitor, and maintain live services, designing observability stacks and dashboards to track key metrics and improve operational insight.
  • Ensure sustainable scalability through automation, actively contributing to continuous improvement for reliability and delivery speed.
  • Collaborate with product and engineering teams to define SLOs and error budgets and to ensure services are reliable, scalable, and observable.
  • Support incident management processes, including participating in on-call rotations, assisting with outage response, and contributing to post-mortems and root cause analysis.
  • Identify and reduce toil through process automation, creating playbooks and automated runbooks to reduce MTTR.
  • Support resilience strategies and help implement chaos testing to proactively uncover weaknesses and validate recovery strategies.
  • Mentor and train peers on reliability best practices and tooling, contributing to community growth.

Requirements

  • Bachelor’s degree in Computer Science, a related field, or equivalent practical experience.
  • 6+ years of experience with SaaS software development in distributed systems using languages such as Kotlin/Java, TypeScript, and Python, and technologies like IaC, Docker, and Kubernetes.
  • 2+ years of experience as an SRE or in a similar role designing, operating, analyzing, and troubleshooting distributed systems in agile environments.
  • Ability to act as a Datadog subject matter expert, assisting with observability stack design, dashboard creation, and training peers in best practices.
  • Systematic problem-solving and debugging skills, with a strong sense of ownership and a bias towards establishing mechanisms that can scale across the entire company.
  • Excellent written, verbal, and documentation skills.
  • Collaborative team player, able to communicate effectively across disciplines.

Benefits

  • Receive a competitive reward package – reevaluated each year – that includes salary, benefits, and pre-IPO equity.
  • Enjoy 28 days of paid vacation, plus an additional day after 2 years and another after 4 years.
  • Make an impact on the environment and society with one fully paid Impact Day.
  • Receive generous family leave, child support, mental health support, and sabbatical opportunities.
  • Gather for meals, cultural initiatives, and events like local Summer Sessions and year-end celebrations. There are also healthy snacks, drinks, and a weekly catered lunch.

Job title

Lead Site Reliability Engineer

Experience level

Senior

Salary

Not specified

Degree requirement

Bachelor's Degree
