Hybrid Site Reliability Engineer

Posted 4 days ago

About the role

  • As a Senior Site Reliability Engineer at Granicus, you will ensure the reliability, scalability, and performance of services, lead automation efforts, and implement site reliability engineering best practices.

Responsibilities

  • Provide production support in shifts according to the team on-call roster.
  • Work on SRE projects and on tickets escalated by Tech Support or raised by internal engineering and implementation teams.
  • Monitor and maintain systems.
  • Respond to alerts and incidents promptly to ensure high availability.
  • Implement effective alerting & notifications, minimizing false alerts.
  • Create and manage effective SRE dashboards to report key business metrics, SLAs, SLOs, SLIs, and error budgets.
  • Proactively plan capacity to handle growth in scale and traffic load.
  • Actively participate in troubleshooting and resolving incidents, performing root cause analysis, conducting incident postmortems, and implementing long-term fixes to prevent recurrence.
  • Develop and maintain automation scripts and tools to streamline operations and reduce manual intervention.
  • Partner closely with DevOps and Software Engineering teams to enhance system reliability.
  • Create and maintain documentation for technology, architecture, processes, procedures, and troubleshooting guides.

Requirements

  • At least 8 years of relevant experience in site reliability engineering, with a proven track record of managing complex, medium- to large-scale high-availability systems.
  • Expertise in monitoring/observability - Elastic and CloudWatch/Azure Monitor
  • Expertise in Linux/Windows OS & networking
  • Advanced knowledge of Cloud services (AWS & Azure)
  • Advanced knowledge of container technologies - Docker & Kubernetes (K8s)
  • Proficiency in databases and queries - MSSQL, PostgreSQL, MongoDB, MySQL
  • Proficiency in scripting - Python/PowerShell/Bash
  • Working experience with CI/CD tools - GitLab/Azure DevOps or similar tools
  • Working experience with IaC tools - Terraform/Ansible
  • Working experience with configuration management - Chef
  • Working experience with incident response tools - PagerDuty, Jira
  • Relevant certifications such as Elastic Certified Observability Engineer, AWS Certified Solutions Architect, or Certified Kubernetes Administrator, or equivalent hands-on experience, are highly valued.

Benefits

  • Employee Resource Groups to encourage diverse voices
  • Coffee with Mark sessions
  • Microsoft Teams communities focused on wellness, art, furbabies, family, parenting, and more

Job title

Site Reliability Engineer

Job type

Experience level

Senior, Lead

Salary

Not specified

Degree requirement

Bachelor's Degree

Location requirements
