Hybrid Site Reliability Engineer – Staff

Posted last month

About the role

  • Deliver projects on time: Plan, delegate, execute, and oversee key projects;
  • Collaborate: Work closely with stakeholders and other teams. Mentor colleagues and lead knowledge transfer;
  • Ensure quality and reduce technical debt: Deliver well-designed solutions, and address blockers, toil, and technical debt to keep systems healthy;
  • Drive engineering excellence: Aim for quality and choose the right solution for the problems we face;
  • Protect solution quality: Ensure designs are implemented with proper quality and minimal tech debt;
  • Make data‑backed decisions: Help teams and stakeholders navigate data and act on insights;
  • Build reliable infrastructure: Design and maintain highly available, scalable infrastructure with monitoring, alerting, and anomaly detection;
  • Automate everything: Create and optimize automation to streamline deployments, improve speed, and cut manual work;
  • Solve complex issues: Troubleshoot, debug, and resolve critical issues in complex systems;
  • Use AI: Integrate AI into workflows and processes to speed up delivery and reduce toil.

Requirements

  • Observability: Experience with monitoring tools and frameworks to ensure system observability (OpenSearch, VictoriaMetrics, Prometheus, Thanos, Mimir, OpenTelemetry, Nagios);
  • Databases and storage systems: Experience operating highly available SQL and NoSQL databases and object stores at scale (MySQL, Percona, PostgreSQL, Cassandra, ClickHouse, Timescale, Druid, MinIO);
  • Data visualization: Ability to build meaningful dashboards that show the right insights (Grafana, OpenSearch Dashboards);
  • Alerting and anomaly detection: Ability to build anomaly detection and alerting pipelines;
  • Programming: Proficiency in one or more programming languages for automation scripts and integrations (Python, Go, Rust, C);
  • Linux: Strong knowledge of Linux systems, especially Debian‑based distributions;
  • Workflow automation: Ability to use workflow automation frameworks (Airflow, Prefect, n8n);
  • Configuration management: Ability to design and develop configuration management codebases and deployment pipelines (SaltStack, Ansible, Rundeck);
  • Networking: Strong understanding of networking protocols and concepts (overlay networks, VPN, proxies, DNS, HTTP, SSL/TLS, TCP, UDP);
  • Security: Ability to design secure systems, with working knowledge of security concepts and tools (Vault, PKI, mTLS).

Benefits

  • Innovate with industry leaders
  • Learn & grow
  • Hybrid work
  • Work from anywhere
  • Physical well-being
  • Mental & emotional health
  • Joyful moments – special treats
  • Company events & team-building
  • Workation

Job title

Site Reliability Engineer – Staff

Job type

Experience level

Lead

Salary

Not specified

Degree requirement

No Education Requirement

Location requirements
