Site Reliability Engineer focusing on AWS cloud services and Site Reliability Engineering practices. Collaborating on performance, availability, and observability within a hybrid work environment.
Responsibilities
Work on SRE initiatives and activities in an AWS cloud environment;
Define and monitor Service Level Agreements (SLAs), Service Level Indicators (SLIs) and performance metrics;
Expand and consolidate Site Reliability Engineering (SRE) practices;
Assess service maturity and define optimization strategies and process adjustments;
Monitor technical and business metrics, ensuring availability, resilience and performance of IT services;
Participate in modernization and cloud migration projects;
Work on projects and design architectures focused on Observability.
Requirements
Experience with Observability and APM tools such as Grafana, AppDynamics, Dynatrace, Prometheus, DataDog, ELK and Zabbix;
Experience in log analysis and troubleshooting connectivity and integrations between applications and partners;
Experience optimizing cost and performance of cloud services on AWS;
Focused on reliability, availability and security of services.
Benefits
Multi-benefits card – you choose how and where to use it.
Scholarships for Undergraduate, Postgraduate, MBA and language courses.
Certification incentive programs.
Flexible working hours.
Competitive salaries.
Annual performance review with a structured career plan.
Principal Site Reliability Engineer at Early Warning designing performance and resiliency patterns for applications and infrastructure. Collaborating with development teams to improve systems and data integrity.
DevOps Engineer contributing to CI/CD setup and Azure services management. Collaborates with teams to ensure efficient project delivery in a hybrid environment.
IT DevOps Specialist at BMW responsible for analyzing requirements and implementing software solutions in AWS cloud environments. Collaborating internationally within agile teams for digital transformation projects.
DevOps Engineer at Vistra designing, implementing, and maintaining robust CI/CD pipelines and cloud infrastructure. Enabling software delivery across multiple technology stacks with a focus on AWS.
Manage complex customer rollouts and initial system deployments at Talex.ai. Bridging technical development with real-world application in robotics and AI systems.
Cloud Operations Engineer designing and implementing highly reliable cloud solutions. Leading cloud infrastructure initiatives for production operations and customer success in a growing team.
Quality Engineer supporting new product launches and reliability testing for SSD at Micron in Malaysia. Responsible for coordinating test activities and conducting failure analysis.
Reliability Engineer ensuring operational readiness of data centers at Rowan Digital Infrastructure. Overseeing commissioning, operational standards, and transitioning facilities into live operations.
Manager of Mechanical Engineering ensuring high-availability mechanical systems in data centers. Collaborating on lifecycle management and performance evaluation across mission-critical facilities in a hybrid role.
DevOps Engineer developing reusable Ansible and Puppet modules and managing CI/CD for project teams. Join PLATH in Hamburg, focusing on crisis detection software development.