Senior Site Reliability Engineer ensuring reliability, scalability, and performance of services at Granicus. Leading automation processes and implementing best practices in site reliability engineering.
Responsibilities
Provide production support in shifts according to the team on-call roster.
Work on SRE projects, tickets escalated by Tech Support, and tickets raised by internal engineering and implementation teams.
Monitor and maintain systems.
Respond to alerts and incidents promptly to ensure high availability.
Actively participate in troubleshooting and resolving incidents, performing root cause analysis, conducting incident postmortems, and implementing long-term fixes to prevent recurrence.
Develop and maintain automation scripts and tools to streamline operations and reduce manual intervention.
Partner closely with DevOps and Software Engineering teams to enhance system reliability.
Create and maintain documentation for technology, architecture, processes, procedures, and troubleshooting guides.
Requirements
At least 8 years of relevant experience in site reliability engineering, with a proven track record of managing complex, medium- to large-scale high-availability systems.
Expertise in Monitoring/Observability - Elastic and CloudWatch/Azure Monitor
Expertise in Linux/Windows OS & networking
Advanced knowledge of Cloud services (AWS & Azure)
Advanced knowledge of container technologies - Docker and Kubernetes (K8s)
Proficiency in databases and queries - MSSQL, PostgreSQL, MongoDB, MySQL
Proficiency in scripting - Python/PowerShell/Bash
Working experience with CI/CD tools - GitLab/Azure DevOps or similar
Working experience with IaC tools - Terraform/Ansible
Working experience with configuration management - Chef
Working experience with incident response tools - PagerDuty, Jira
Relevant certifications such as Elastic Certified Observability Engineer, AWS Certified Solutions Architect, or Certified Kubernetes Administrator, or equivalent hands-on experience, are highly valued.
Benefits
Employee Resource Groups to encourage diverse voices
Coffee with Mark sessions
Microsoft Teams communities focused on wellness, art, furbabies, family, parenting, and more