Site Reliability Engineer maintaining cloud infrastructure for Tricentis SaaS Products. Collaborating closely with engineers, focusing on observability and performance.
Responsibilities
Design, build, and maintain the product cloud infrastructure that enables seamless scaling to support hundreds of thousands of concurrent users
Develop advanced monitoring systems that proactively alert on symptoms, ensuring rapid response to potential issues
Leverage tools like Terraform, GitHub Actions, and Kubernetes to efficiently manage our AWS or Azure infrastructure
Continuously enhance operational processes, making deployments, upgrades, and other tasks as boring and automated as possible
Collaborate with product engineers on a daily basis and influence product architecture designs
Be part of an on-call (PagerDuty) rotation to respond swiftly to incidents affecting availability, offering support to product engineers during customer incidents
Requirements
Proficiency in Terraform syntax and GitHub Actions configuration, including pipelines and job management using GitOps
Working knowledge of SaaS architecture concepts and designs
Understanding of Kubernetes, including CLI usage and service re-provisioning
Ability to provision and set up metrics along with managing alerts and silences
Ability to identify Service Level Indicators (SLIs) that align the team with availability and latency objectives
Experience with Linux operating system configuration, package management, and troubleshooting
Working experience with cloud environments like Azure or AWS and provisioning infrastructure there
Good cultural fit: clear communication, empathy, curiosity and continuous learning, and a supportive, no-blame attitude
Benefits
Flexible working schedule (no core hours)
Learning and career growth opportunities
25 days of paid time off
3 Sick Days
2 days of paid Volunteering Leave per year to get involved in your local community or in a cause that matters to you
Hybrid work environment, with home-office allowance
Meal allowance
Pension Contribution
Life & Disability Insurance
Paid Sickness Leave
A team of passionate professionals who are experts in their fields
Events for employees to learn, celebrate and socialise (training sessions, hackathons, parties, sports events, board game gatherings, BBQs) and much more
Site Reliability Engineer enhancing platform reliability for AI workflows at WRITER. Overseeing automated solutions and cloud infrastructure supporting high-traffic AI systems.
Site Reliability Engineer ensuring 24/7 availability of AI-powered workflows at WRITER. Developing and automating robust platforms for high-traffic AI demands.
DevOps Engineer at DATAGROUP in Rostock managing IT applications and cloud technologies. Collaborating with teams to support client IT transformations in a flexible work environment.
SRE Technical Manager leading reliability engineering teams ensuring performance for Navy IT services. Manage teams, collaborate on automation, and drive continuous improvement in a critical systems environment.
DevOps Engineer responsible for optimizing and securing cloud deployment processes at Axi. Collaborating across technology teams to promote best practices in DevOps methodologies.
Azure Cloud Engineer ensuring a safe and scalable cloud environment at Schoologica while contributing to innovative educational solutions with modern cloud technologies.
DevSecOps Engineer responsible for enhancing Thales' secure hosting platforms in public and private clouds. Collaborating with teams to apply modern practices and build resilient infrastructures.
DevOps Engineer specializing in AWS Cloud Infrastructure in a hybrid position. Collaborating within a supportive team to build modern infrastructure for VM-based applications.