DevOps SRE focusing on reliability and performance of critical systems in a hybrid setup. Collaborating with development and infrastructure teams to enhance application observability and efficiency.
Responsibilities
Ensure the reliability, availability and performance of applications in production;
Define, monitor and evolve SLAs, SLOs and SLIs;
Implement and maintain observability practices (metrics, logs, tracing and alerts);
Develop automations to reduce toil and increase operational efficiency;
Lead incident management, perform root cause analysis and produce post‑mortems;
Collaborate with development, DevOps and infrastructure teams;
Contribute to security, resilience and compliance improvements;
Support FinOps initiatives to optimize cloud costs;
Promote SRE and DevOps best practices across squads.
Requirements
Experience with on‑premises and cloud environments (preferably AWS);
Strong knowledge of observability (Prometheus, Grafana, Dynatrace, Datadog, OpenTelemetry);
Experience with automation and scripting (Python, Go, Bash and/or PowerShell);
Knowledge of Linux and Windows;
Experience with Docker and Kubernetes;
Experience with SRE practices (error budgets, toil reduction, post‑mortems);
Experience with monitoring, alerting and dashboards;
Knowledge of networking, security and advanced troubleshooting;
Bachelor’s degree in Computer Science, Engineering or related fields;
Desirable
AWS, observability or Kubernetes certifications;
Experience with CI/CD (GitLab, GitHub Actions, Jenkins);
Experience with IaC (Terraform, CloudFormation);
Knowledge of distributed architectures and microservices;
Experience with FinOps;
Familiarity with advanced SRE practices (Chaos Engineering, fault injection).
Benefits
Multi‑benefit card — choose how and where to use it.
Study grants for undergraduate, graduate, MBA and language courses.
Certification incentive programs.
Flexible working hours.
Competitive salaries.
Annual performance reviews with a structured career plan.