Onsite Principal Site Reliability Engineer, SRE

Posted yesterday

About the role

  • AT&T is seeking a Principal Site Reliability Engineer to design scalable solutions that keep critical operations running with minimal downtime. The role collaborates with other teams to monitor and improve system performance in cloud environments.

Responsibilities

  • Translate core business requirements into robust, scalable, and reliable technical solutions.
  • Design and implement applications, platforms, and services that power critical business operations.
  • Ensure applications and systems are highly reliable, scalable, and performant.
  • Serve as triage lead alongside the T1 team during outages and other critical incidents.
  • Conduct detailed After Action Reviews with all stakeholders and outline short-term and long-term resiliency options.
  • Define and implement monitoring and alerting strategies tailored to each launch.
  • Collaborate with product development teams to gain deep insight into the application architecture, flows, and critical dependencies.
  • Monitor and evaluate key performance metrics such as latency, throughput, and error rates, and update alerts accordingly.
  • Propose architectural or operational changes to prevent recurrence.

Requirements

  • Bachelor’s degree in Computer Science, Information Systems, or a related discipline.
  • 10+ years of hands-on experience architecting and building scalable platforms and applications in cloud/data environments.
  • Expert-level experience building Python-, Java-, JavaScript-, and Perl-based solutions in an SRE role.
  • Practical understanding of AI/ML concepts and their integration in enterprise platforms.

Benefits

  • Medical/Dental/Vision coverage
  • 401(k) plan
  • Tuition reimbursement program
  • Paid Time Off and Holidays (based on date of hire, at least 23 days of vacation each year and 9 company-designated holidays)
  • Paid Parental Leave
  • Paid Caregiver Leave
  • Additional sick leave beyond what state and local laws require may be available, but it is unprotected
  • Adoption Reimbursement
  • Disability Benefits (short term and long term)
  • Life and Accidental Death Insurance
  • Supplemental benefit programs: critical illness, accident, hospital indemnity, and group legal
  • Employee Assistance Programs (EAP)
  • Extensive employee wellness programs
  • Employee discounts of up to 50% off eligible AT&T mobility plans and accessories, AT&T internet (and fiber where available), and AT&T phone.

Job title

Principal Site Reliability Engineer, SRE

Experience level

Lead

Salary

$174,100 - $261,100 per year

Degree requirement

Bachelor's Degree