Senior Site Reliability Engineer ensuring reliability, scalability, and performance of services at Granicus. Leading automation processes and implementing best practices in site reliability engineering.
Responsibilities
Provide production support in shifts according to the team's on-call roster.
Work on SRE projects and on tickets escalated by Tech Support or raised by internal engineering and implementation teams.
Monitor and maintain systems.
Respond to alerts and incidents promptly to ensure high availability.
Actively participate in troubleshooting and resolving incidents, performing root cause analysis, conducting incident postmortems, and implementing long-term fixes to prevent recurrence.
Develop and maintain automation scripts and tools to streamline operations and reduce manual intervention.
Partner closely with DevOps and Software Engineering teams to enhance system reliability.
Create and maintain documentation for technology, architecture, processes, procedures, and troubleshooting guides.
Requirements
At least 8 years of relevant experience in site reliability engineering with a proven track record of managing complex, medium- to large-scale high-availability systems.
Expertise in Monitoring/Observability - Elastic and CloudWatch/Azure Monitor
Expertise in Linux/Windows OS & networking
Advanced knowledge of Cloud services (AWS & Azure)
Advanced knowledge of Container Technologies - Docker and Kubernetes (K8s)
Proficiency in Databases/Queries - MSSQL, Postgres, MongoDB, MySQL
Proficiency in Scripting - Python/PowerShell/Bash
Working experience with CI/CD Tools - GitLab/Azure DevOps or similar
Working experience with IaC Tools - Terraform/Ansible
Working experience with Configuration Management - Chef
Working experience with Incident Response - PagerDuty, Jira
Relevant certifications such as Elastic Certified Observability Engineer, AWS Certified Solutions Architect, or Certified Kubernetes Administrator, or equivalent hands-on experience, are highly valued.
Benefits
Employee Resource Groups to encourage diverse voices
Coffee with Mark sessions
Microsoft Teams communities focused on wellness, art, furbabies, family, parenting, and more