DevOps SRE focusing on reliability and performance of critical systems in a hybrid setup. Collaborating with development and infrastructure teams to enhance application observability and efficiency.
Responsibilities
Ensure the reliability, availability and performance of applications in production;
Define, monitor and evolve SLAs, SLOs and SLIs;
Implement and maintain observability practices (metrics, logs, tracing and alerts);
Develop automations to reduce toil and increase operational efficiency;
Lead incident management, perform root cause analysis and produce post‑mortems;
Collaborate with development, DevOps and infrastructure teams;
Contribute to security, resilience and compliance improvements;
Support FinOps initiatives to optimize cloud costs;
Promote SRE and DevOps best practices across squads.
Requirements
Experience with on‑premises and cloud environments (preferably AWS);
Strong knowledge of observability (Prometheus, Grafana, Dynatrace, Datadog, OpenTelemetry);
Experience with automation and scripting (Python, Go, Bash and/or PowerShell);
Knowledge of Linux and Windows;
Experience with Docker and Kubernetes;
Experience with SRE practices (error budgets, toil reduction, post‑mortems);
Experience with monitoring, alerting and dashboards;
Knowledge of networking, security and advanced troubleshooting;
Bachelor’s degree in Computer Science, Engineering or related fields;
Desirable: AWS, Observability or Kubernetes certifications;
Experience with CI/CD (GitLab, GitHub Actions, Jenkins);
Experience with IaC (Terraform, CloudFormation);
Knowledge of distributed architectures and microservices;
Experience with FinOps;
Familiarity with advanced SRE practices (Chaos Engineering, fault injection).
Benefits
Multi‑benefit card — choose how and where to use it.
Study grants for undergraduate, graduate, MBA and language courses.
Certification incentive programs.
Flexible working hours.
Competitive salaries.
Annual performance reviews with a structured career plan.