Senior Site Reliability Engineer at Uniphore developing cloud infrastructure and Go services. Collaborating with teams to ensure operational excellence and reliability.
Responsibilities
Review RFCs and PRDs to prevent downstream issues; provide architectural guidance during planning phases, including API design and service contract review
Design and build internal Go services, CLIs, and automation pipelines that replace manual processes and eliminate support dependencies
Design incident response frameworks, escalation procedures, and comprehensive playbooks; build tooling that automates runbook steps and reduces MTTR; participate in our on-call program
Define technical standards and operational frameworks, then codify them as enforced policy through admission controllers, operator logic, pipeline gates, and similar mechanisms; automate policy enforcement rather than only authoring documentation
Guide teams through ownership maturity, scorecard compliance, and operational best practices
Requirements
5+ years in DevOps/SRE roles with a track record of transforming operational models
Production Go experience: you write Go regularly, understand its concurrency model, and are comfortable owning Go services in production
Kubernetes depth: operational expertise plus the ability to extend Kubernetes; you understand the controller-runtime model and could write or maintain a Kubernetes Operator
Production Excellence: deep experience with incident management, RCA processes, and on-call system design
Software engineering fundamentals: API design, testing, observability instrumentation, and service lifecycle ownership; you treat internal tooling with the same rigor as customer-facing software
Standards & Documentation: strong technical writing; able to create operational procedures that teams can self-execute
Architecture & Planning: RFC/PRD review experience; you catch operational problems at design time
Collaboration & Coaching: track record of enabling team capabilities through tooling and knowledge transfer, not just doing work for teams
SRE role at BT Group focusing on cloud reliability and operational excellence across engineering teams. Collaborating with product owners to apply SRE principles and improve service performance.
Learning Content Engineer developing and enhancing training content for Cloud and DevOps. Creating practical learning materials that range from basic to advanced topics.
AWS DevOps Microservices Engineer at Solventum designing secure and scalable AWS infrastructure. Collaborating with diverse teams to deliver innovative healthcare solutions using cloud technology.
DevOps Engineer building and maintaining Catena’s scalable platform infrastructure. Collaborating with engineers to enhance CI/CD pipelines and support cloud-native workloads on AWS.
Platform System Reliability Engineer focused on operating the EKS Kubernetes environment for GE Vernova's SaaS grid products. Responsible for the full lifecycle of production clusters, from performance tuning to securing infrastructure.
SRE Observability SLO Engineer for GE Vernova’s GridOS Platform Engineering team. Building the telemetry stack that supports SaaS reliability for critical energy infrastructure.
DevOps Engineer responsible for building and operating automation services using Ansible for Rabobank. Collaborating with teams to ensure stable, secure, and auditable infrastructure across multiple servers.