Hybrid Senior Site Reliability Engineer

Posted last month

About the role

  • Senior Site Reliability Engineer at PayPal, responsible for the high availability, performance, and scalability of critical systems. The role leads reliability initiatives, incident management, and system performance optimization.

Responsibilities

  • Take ownership of system performance monitoring, identify inefficiencies, and lead initiatives to improve the overall availability and reliability of digital platforms and applications
  • Lead and manage the response to complex, high-priority incidents, ensuring prompt resolution and a thorough root cause analysis to prevent future occurrences
  • Design and implement advanced automation frameworks to improve operational efficiency, streamline processes, and reduce human error
  • Lead reliability-focused initiatives, ensuring systems are highly available, resilient, and scalable, and promote best practices across engineering teams
  • Enhance the monitoring infrastructure by identifying key metrics, optimizing alerting, and improving system observability to ensure the reliability of large-scale systems
  • Forecast resource requirements and lead capacity planning activities to ensure systems can scale effectively to meet growing user demand
  • Maintain robust disaster recovery strategies and conduct regular testing to confirm systems can recover quickly from failures
  • Partner with engineering and product teams to identify opportunities for improving system architecture, focusing on scalability, reliability, and fault tolerance
  • Provide mentorship and technical guidance to junior site reliability engineers, fostering skill development and knowledge sharing
  • Drive continuous improvement across operational workflows, identifying areas for optimization, cost reduction, and performance enhancement

Requirements

  • 3+ years in Cloud Infrastructure, Site Reliability Engineering (SRE), DevOps Engineering, or related fields
  • B.S. or M.S. degree in Computer Science, Engineering, or a related technical field; equivalent experience may be considered in lieu of a degree
  • 2+ years of hands-on experience deploying, managing, and optimizing containerized applications using GKE and Harness in both public and private cloud environments (AWS, GCP, Azure, etc.), preferably Google Cloud Platform (GCP)
  • 2+ years of hands-on experience with infrastructure as code (Terraform, CloudFormation) and CI/CD pipelines (CircleCI, Harness, Jenkins, Argo CD), plus experience in Node.js, Python, or Go
  • Strong working knowledge of Google Cloud Logging, Datadog, or other monitoring and observability tools
  • Ability to effectively diagnose and resolve performance bottlenecks within GCP at the infrastructure and application layers
  • Strong leadership abilities, with customer focus and a commitment to quality
  • Strong interpersonal skills and solid written and verbal communication skills
  • Ability to remain composed, methodical, and think fast in a high-pressure environment
  • Experience in managing, collaborating, and influencing global teams
  • Must be organized, detail-oriented, and able to manage and appropriately prioritize multiple tasks simultaneously

Benefits

  • Flexible working environment
  • Employee share options
  • Health and life insurance
  • Paid time off

Job title

Senior Site Reliability Engineer

Experience level

Senior

Salary

$111,500 - $191,950 per year

Degree requirement

Bachelor's Degree
