Senior Site Reliability Engineer ensuring reliability, scalability, and performance of services at Granicus. Leading automation processes and implementing best practices in site reliability engineering.
Responsibilities
Provide production support in shifts according to the team on-call roster.
Work on SRE projects and on tickets escalated by Tech Support or raised by internal engineering and implementation teams.
Monitor and maintain systems.
Respond to alerts and incidents promptly to ensure high availability.
Actively participate in troubleshooting and resolving incidents, performing root cause analysis, conducting incident postmortems, and implementing long-term fixes to prevent recurrence.
Develop and maintain automation scripts and tools to streamline operations and reduce manual intervention.
Partner closely with DevOps and Software Engineering teams to enhance system reliability.
Create and maintain documentation for technology, architecture, processes, procedures, and troubleshooting guides.
Requirements
At least 8 years of relevant experience in site reliability engineering, with a proven track record of managing complex, medium- to large-scale high-availability systems.
Expertise in Monitoring/Observability - Elastic & CloudWatch/Azure Monitor
Expertise in Linux/Windows OS & networking
Advanced knowledge of Cloud services (AWS & Azure)
Advanced knowledge of Container Technologies - Docker & Kubernetes (K8s)
Proficiency in Databases/Queries - MSSQL, PostgreSQL, MongoDB, MySQL
Proficiency in Scripting - Python/PowerShell/Bash
Working experience with CI/CD Tools - GitLab/Azure DevOps or similar tools
Working experience with IaC Tools - Terraform/Ansible
Working experience with Configuration Management - Chef
Working experience with Incident Response - PagerDuty, Jira
Relevant certifications such as Elastic Certified Observability Engineer, AWS Certified Solutions Architect, or Certified Kubernetes Administrator, or equivalent hands-on experience, are highly valued.
Benefits
Employee Resource Groups to encourage diverse voices
Coffee with Mark sessions
Microsoft Teams communities focused on wellness, art, furbabies, family, parenting, and more