Site Reliability Engineer maintaining cloud infrastructure for Tricentis SaaS Products. Collaborating closely with engineers, focusing on observability and performance.
Responsibilities
Design, build, and maintain the product cloud infrastructure that enables seamless scaling to support hundreds of thousands of concurrent users
Develop advanced monitoring systems that proactively alert on symptoms, ensuring rapid response to potential issues
Leverage tools like Terraform, GitHub Actions, and Kubernetes to efficiently manage our AWS or Azure infrastructure
Continuously enhance operational processes, making deployments, upgrades, and other tasks as boring and automated as possible
Collaborate with product engineers on a daily basis and influence product architecture designs
Be part of an on-call (PagerDuty) rotation to respond swiftly to incidents affecting availability, offering support to product engineers during customer incidents
Requirements
Proficiency in Terraform syntax and GitHub Actions configuration, including pipelines and job management using GitOps
Working knowledge of SaaS architecture concepts and designs
Understanding of Kubernetes, including CLI usage and service re-provisioning
Ability to provision and set up metrics along with managing alerts and silences
Ability to identify Service Level Indicators (SLIs) that align the team with availability and latency objectives
Experience with Linux operating system configuration, package management, and troubleshooting
Working experience with cloud environments like Azure or AWS and provisioning infrastructure there
Good cultural fit: clear communication, empathy, curiosity and continuous learning, and a supportive, no-blame attitude
Benefits
Flexible working schedule (no core hours)
Learning and career growth opportunities
25 days of paid time off
3 Sick Days
2 days of paid Volunteering Leave per year to get involved in your local community or in a cause that matters to you
Hybrid work environment, with home-office allowance
Meal allowance
Pension Contribution
Life & Disability Insurance
Paid Sickness Leave
A team of passionate professionals who are experts in their fields
Events for employees to learn, celebrate and socialise (training sessions, hackathons, parties, sports events, board game gatherings, BBQs) and much more