Work as part of the 24x7 operations group (in shifts), managing one or more applications.
Monitor and remediate alerts and maintain uptime.
Develop and maintain automated systems to improve operational efficiency and ensure compliance with security policies.
Execute automation and debug issues as required.
Leverage CI/CD and GitOps to manage the application platform.
Patch security vulnerabilities.
Manage public cloud infrastructure.
Share and review innovative technical ideas with peers, high-level technical contributors, and managers.
Analyse incidents and problems to develop and implement solutions to complex application problems, system administration issues, or network concerns.
Requirements
Bachelor's degree in computer science, engineering, information systems, or a closely related quantitative discipline.
Master’s degree desirable.
Typically 3-5 years’ experience.
Strong experience with Ubuntu and Kubernetes (K8s) platforms.
Programming and scripting experience (Python, Golang, Ansible, Terraform).
Strong experience in DevOps practices such as continuous integration/continuous deployment (CI/CD).
Knowledge of the GitOps model.
Working experience with cloud platforms, especially AWS.
Ability to quickly learn new skills and technologies, and work well with other team members.
Strong system debugging skills.
Knowledge of security-related activities such as patching and CVEs.
Good written and verbal communication skills.
Benefits
Health & Wellbeing: We strive to provide our team members and their loved ones with a comprehensive suite of benefits that supports their physical, financial and emotional wellbeing.
Personal & Professional Development: We also invest in your career because the better you are, the better we all are. We have specific programs geared toward helping you reach your career goals, whether you want to become a knowledge expert in your field or apply your skills to another division.
Unconditional Inclusion: We are unconditionally inclusive in the way we work and celebrate individual uniqueness. We know varied backgrounds are valued and succeed here. We have the flexibility to manage our work and personal needs.