Senior Site Reliability Engineer ensuring reliability, scalability, and performance of services at Granicus. Leading automation processes and implementing best practices in site reliability engineering.
Responsibilities
Provide production support in shifts according to the team's on-call roster.
Work on SRE projects, as well as on tickets escalated by Tech Support or raised by internal engineering and implementation teams.
Monitor and maintain systems.
Respond to alerts and incidents promptly to ensure high availability.
Actively participate in troubleshooting and resolving incidents, performing root cause analyses and incident postmortems, and implementing long-term fixes to prevent recurrence.
Develop and maintain automation scripts and tools to streamline operations and reduce manual intervention.
Partner closely with DevOps and Software Engineering teams to enhance system reliability.
Create and maintain documentation for technology, architecture, processes, procedures, and troubleshooting guides.
Requirements
At least 8 years of relevant experience in site reliability engineering, with a proven track record of managing complex, medium- to large-scale, high-availability systems.
Expertise in monitoring/observability - Elastic and Amazon CloudWatch/Azure Monitor
Expertise in Linux/Windows operating systems and networking
Advanced knowledge of cloud services (AWS and Azure)
Advanced knowledge of container technologies - Docker and Kubernetes (K8s)
Proficiency in databases/queries - MSSQL, PostgreSQL, MongoDB, MySQL
Proficiency in scripting - Python/PowerShell/Bash
Working experience with CI/CD tools - GitLab/Azure DevOps or similar tools
Working experience with IaC tools - Terraform/Ansible
Working experience with configuration management - Chef
Working experience with incident response - PagerDuty, Jira
Relevant certifications such as Elastic Certified Observability Engineer, AWS Certified Solutions Architect, or Certified Kubernetes Administrator, or equivalent hands-on experience, are highly valued.
Benefits
Employee Resource Groups to encourage diverse voices
Coffee with Mark sessions
Microsoft Teams communities focused on wellness, art, furbabies, family, parenting, and more