Hybrid Staff Site Reliability Engineer – Observability

Posted 2 weeks ago

About the role

  • Lead the design, implementation, and optimization of observability systems
  • Collaborate with cross-functional teams to build robust monitoring, alerting, and telemetry solutions
  • Drive best practices, mentor others, and shape the strategic evolution of our observability ecosystem
  • Design and implement comprehensive observability solutions tailored for edge computing environments
  • Define and maintain Service Level Indicators (SLIs), Service Level Objectives (SLOs), and business KPIs
  • Build and optimize dashboards, visualizations, and alerting systems
  • Implement distributed tracing and log aggregation systems
  • Collaborate with engineering teams to ensure applications and infrastructure at edge locations are designed with observability in mind
  • Drive proactive identification of issues in edge facilities
  • Lead incident postmortems and implement observability-driven improvements
  • Develop and maintain tools, scripts, and automation to enhance observability pipelines
  • Evaluate and integrate industry-standard observability tools

Requirements

  • 7+ years of experience in Site Reliability Engineering, Observability Engineering, or a related field
  • 5+ years of experience with observability tools and platforms such as Prometheus, Grafana, Splunk, ELK, OpenTelemetry, or similar
  • 3+ years of experience with microservices, containerized environments (e.g., Kubernetes, Docker), and distributed systems, particularly in edge deployments
  • Experience implementing AIOps
  • Strong proficiency in programming/scripting languages (e.g., Python, Java) for automation and tooling in distributed environments
  • Certifications in cloud platforms (e.g., Google Cloud Professional certification) or Kubernetes
  • Knowledge of incident management processes and tools (e.g., ServiceNow, xMatters, Opsgenie) tailored for distributed systems

Benefits

  • Affordable medical plan options
  • 401(k) plan (including matching company contributions)
  • Employee stock purchase plan
  • No-cost programs, including wellness screenings, tobacco cessation and weight management programs, confidential counseling, and financial coaching
  • Paid time off
  • Flexible work schedules
  • Family leave
  • Dependent care resources
  • Colleague assistance programs
  • Tuition assistance
  • Retiree medical access

Job title

Staff Site Reliability Engineer – Observability

Job type

Experience level

Lead

Salary

$118,450 - $284,280 per year

Degree requirement

Bachelor's Degree

Location requirements
