Hybrid Site Reliability Engineer

Posted 3 hours ago

About the role

  • As a Site Reliability Engineer at WRITER, you will improve platform reliability for AI workflows, overseeing the automation and cloud infrastructure that support high-traffic AI systems.

Responsibilities

  • Automate operational tasks and infrastructure management by developing robust tools and platforms using Python, Go, or similar languages, significantly reducing manual toil across our production environment
  • Design and implement scalable, fault-tolerant infrastructure solutions on public cloud providers (AWS, GCP, Azure) to support WRITER's rapidly expanding, high-traffic AI platform
  • Own the reliability, performance, and efficiency of WRITER’s core services, defining and upholding stringent Service Level Objectives (SLOs) and Error Budgets
  • Own the observability stack (monitoring, logging, and alerting) to ensure rapid detection of issues across our complex distributed systems
  • Lead incident response, post-mortems, and root cause analyses, applying learnings to proactively prevent future outages and build a more resilient system architecture
  • Collaborate closely with product and engineering teams, providing expert guidance on system design for reliability, performance, and scalability from conception through launch

Requirements

  • 7+ years of experience in site reliability engineering, DevOps, or a similar role focused on building and operating large-scale, high-availability production systems
  • Deep expertise with cloud platforms (AWS strongly preferred), containerization technologies like Docker and Kubernetes, and Infrastructure-as-Code tools such as Terraform
  • Strong proficiency in programming languages such as Python, Java, or Go for automation and monitoring
  • Knowledge of monitoring and logging tools (e.g., Prometheus, Grafana, ELK Stack) to maintain system health and performance
  • Demonstrated ability to Challenge the status quo, proactively identify systemic weaknesses, and propose innovative solutions to complex reliability problems
  • Excellent communication, collaboration, and problem-solving skills, with a talent for building strong relationships and Connecting with cross-functional teams
  • A strong sense of ownership and accountability, eager to Own mission-critical systems and drive them toward peak performance and unparalleled reliability

Benefits

  • Generous PTO, plus company holidays
  • Medical, dental, and vision coverage for you and your family
  • Paid parental leave for all parents (12 weeks)
  • Fertility and family planning support
  • Early-detection cancer testing through Galleri
  • Flexible spending account and dependent FSA options
  • Health savings account for eligible plans with company contribution
  • Annual work-life stipends:
      • Wellness stipend for gym, massage/chiropractor, personal training, etc.
      • Learning and development stipend
  • Company-wide off-sites and team off-sites
  • Competitive compensation, company stock options, and 401(k)

Job title

Site Reliability Engineer

Job type

Experience level

Senior, Lead

Salary

$157,700 - $277,800 per year

Degree requirement

Bachelor's Degree

Location requirements
