DevOps Engineer managing complex incidents and automations in L3 support for Everseen. Driving best practices and collaborating across teams on cutting-edge AI solutions.
Responsibilities
You will be part of the L3 support team for Operations across Edge/on‑prem and cloud, owning complex incidents end‑to‑end: triage, deep‑dive debugging, root‑cause analysis, remediation, and follow‑ups.
To reduce Ops toil, you will build targeted automations (Python, Bash, Ansible) and automate new and existing SOPs used by Operations.
You will execute safe deployments and upgrades via GitOps and IaC pipelines (Flux, Ansible, Terraform) on AKS and GKE—coordinating validation and rollback plans—and contribute to the maintenance of existing GitLab CI/CD pipelines together with the DevOps engineering teams.
You will design and continuously refine Alertmanager rules and standardize actionable Grafana dashboards with Operations, ensuring effective use of Prometheus metrics and logs (Grafana Alloy, Thanos).
Beyond day‑to‑day operations, you’ll apply deep DevOps, CI/CD, and infrastructure automation expertise, drive best practices, share knowledge through workshops and mentoring, write and maintain documentation and SOPs (Standard Operating Procedures), test infrastructure, and collaborate across teams to optimize systems and workflows.
Requirements
4+ years in DevOps-related roles with a strong focus on automation.
Proficient in DNS, routing, container communication, firewalls, reverse proxying, load balancing, and edge-to-cloud communication, including troubleshooting of each.
Strong system administration skills, required for troubleshooting OS-level outages and for deploying and troubleshooting Everseen’s containerized Edge application in customer networks.
Extensive experience with Azure (or GCP), including fully automated infrastructure and deployment.
Experience with monitoring and optimizing cloud costs.
Proven experience in implementing and managing CI/CD pipelines (GitLab CI/CD preferred) and excellent knowledge of Git and associated workflows (e.g., Gitflow).
Proven experience with monitoring, logging, and alerting tools and stacks.
Excellent scripting skills in Bash and Python.
Advanced knowledge of Kubernetes and OpenShift, including cluster management, orchestration, auto-scaling, and deployments using Helm charts and GitOps.
Proven experience with microservices architecture and related deployment strategies.
Expertise with Terraform modules.
Deep experience with Ansible, including writing complex playbooks, roles, and using Ansible Vault for secrets management.
Strong understanding of DevSecOps principles and experience implementing security best practices within CI/CD pipelines.
Excellent presentation, oral, and written communication skills. Fluent business English is a requirement.
A passionate advocate for determining and delivering solutions with a high level of customer satisfaction.
Demonstrated interest in learning and a strong desire to expand knowledge in your field.
Capable of engaging in technical discussions with stakeholders, leading DevOps projects, and mentoring and coaching team members.
Benefits
Everseen is committed to creating a safe environment for all employees and has a zero-tolerance policy for bias and discrimination of any kind.
Our work environment is one without offensive, hostile, or intimidating conduct, whether verbal, written, or physical in nature.
Everseen will not tolerate prejudice or discrimination of any kind, including, without limitation, where based on race, colour, sex, gender, religion, age, family status, disability of any kind, or sexual orientation.