Site Reliability Engineer ensuring smooth operations for banking systems at GFT. Working on production system access, deployment, and observability in AWS and Kubernetes environments.
Responsibilities
Participate in on-call rotations to provide support for critical systems.
Engineers are required to work on a rotating 2-2-2 schedule: 2 morning shifts followed by 2 days off, 2 afternoon shifts followed by 2 days off, and 2 night shifts followed by 2 days off.
Morning: 09:00 AM - 06:00 PM
Afternoon: 05:00 PM - 02:00 AM
Night: 01:00 AM - 10:00 AM
Resolve system incidents when they occur.
Deploy changes to staging and production environments.
Work with Platform Engineers to understand the changes.
Develop deployment pipelines for changes.
Understand the changes and develop matching observability (monitoring and alerting).
Develop resiliency testing solutions and conduct resiliency tests.
Continuously enhance the monitoring solution.
Create and update operation runbooks.
Automate operation runbooks.
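The rotating 2-2-2 schedule described above works out to a 12-day cycle. The sketch below is only an illustration of that arithmetic, not GFT's actual rostering tool: the names `CYCLE`, `SHIFT_HOURS`, `shift_on`, and `roster` are hypothetical, the start date is made up, and it assumes a shift is attributed to the calendar day on which it starts.

```python
from datetime import date, timedelta

# One full rotation as described in the posting:
# 2 morning, 2 off, 2 afternoon, 2 off, 2 night, 2 off (12 days).
CYCLE = (
    "morning", "morning", "off", "off",
    "afternoon", "afternoon", "off", "off",
    "night", "night", "off", "off",
)

# Shift windows from the posting, written in 24-hour time.
# Assumption: the afternoon shift ends at 02:00 on the following day.
SHIFT_HOURS = {
    "morning": ("09:00", "18:00"),
    "afternoon": ("17:00", "02:00"),
    "night": ("01:00", "10:00"),
}


def shift_on(day: date, cycle_start: date) -> str:
    """Return the shift for `day`, where `cycle_start` is a first morning-shift day."""
    return CYCLE[(day - cycle_start).days % len(CYCLE)]


def roster(cycle_start: date, days: int) -> list[tuple[date, str]]:
    """List (date, shift) pairs for `days` consecutive days from `cycle_start`."""
    return [
        (cycle_start + timedelta(days=i),
         shift_on(cycle_start + timedelta(days=i), cycle_start))
        for i in range(days)
    ]


if __name__ == "__main__":
    start = date(2024, 1, 1)  # hypothetical start of a rotation
    for day, shift in roster(start, len(CYCLE)):
        hours = SHIFT_HOURS.get(shift)
        label = f"{hours[0]}-{hours[1]}" if hours else "day off"
        print(day.isoformat(), shift, label)
```

Each shift window is nine hours long and overlaps the next by one hour (17:00 to 18:00, 01:00 to 02:00, 09:00 to 10:00), which presumably leaves room for handover between engineers.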
Requirements
Strong experience with Amazon Web Services
Strong experience with and understanding of Kubernetes
Scripting skills with Python or Bash
Experience with the continuous deployment tool Harness (good to have)
Experience with infrastructure as code (IaC) tooling (Terraform)
Experience with observability solutions: Prometheus & Grafana, Sumo Logic (good to have)
DevOps Engineer ensuring stability, scalability, and reliability of justtrack's SaaS platform. Collaborate with development teams, manage cloud infrastructure, and enhance CI/CD processes.
Cloud DevOps Engineer designing and optimizing secure cloud infrastructure on Azure. Collaborating closely with developers for reliable CI/CD processes on cloud-based products.
Staff Site Reliability Engineer responsible for cloud infrastructure implementation and reliability improvements at Auror. Collaborating with engineering teams to enhance production code understanding.
Own availability and strive for operational excellence of Sumo Logic’s observability. Collaborate with global SRE team to optimize operations and improve developer velocity.
Senior Executive supporting technology initiatives in Pune, India. Collaborating globally to connect people and solve complex challenges in a sustainable manner.
DevOps Engineer leading the design, implementation, and optimisation of Kubernetes platforms for Vodafone. Collaborating with product teams to streamline operational processes and enhance developer experience.
Senior Site Reliability Engineer developing scalable systems and automation for high-scale projects at Euna Solutions. Collaborating closely with software developers and mentoring junior engineers.
Senior Site Reliability Engineer responsible for designing scalable systems at Euna Solutions. Collaborating with developers and mentoring juniors while driving automation and reliability.
Senior Site Reliability DevOps Specialist at Boeing overseeing GCP cloud environment and infrastructure. Ensuring reliability, scalability, and automation while collaborating with distributed teams.