Senior Site Reliability Engineer at Getrak responsible for platform reliability and monitoring in critical environments. Collaborating with engineering teams on performance and observability across a major SaaS platform.
Responsibilities
Define, implement, and monitor SLIs/SLOs for availability, latency, and reliability.
Design and optimize CI/CD pipelines for microservices in high-availability environments.
Manage and evolve infrastructure on AWS (EC2, ECS/EKS, S3, RDS, CloudFront, VPC, IAM, CloudWatch, etc.).
Manage distributed databases and critical systems: Astra DB / Cassandra (DataStax), Redis, and RabbitMQ.
Automate provisioning, configuration, and scalability with Terraform, Ansible, or similar tools.
Develop and maintain observability practices (metrics, logs, tracing) using DataDog and related tools.
Lead investigations into critical incidents and propose definitive solutions through blameless postmortems.
Work on cloud cost optimization, balancing reliability and budget.
Ensure infrastructure security and compliance, with access policies, backups, and continuous auditing.
Collaborate with engineering and product teams, bringing a reliability mindset to the development cycle.
Requirements
6+ years of SRE/DevOps experience in high-scale, mission-critical environments.
Strong expertise in AWS and cloud-native architecture.
Advanced knowledge of Cassandra (Astra DB / DataStax), Redis, and RabbitMQ.
Experience with microservices and containerization (Docker, Kubernetes, ECS/EKS).
Site Reliability Engineer responsible for infrastructure supporting an AI platform. Safeguarding US customer data and ensuring compliance in the Aerospace and Defense sector.
Senior Infrastructure Engineer managing the Azure platform for a SaaS product at Rillion. Focused on automation, security, reliability, and scalability in a hybrid work environment.
Statistician/Reliability Engineer applying statistical analysis for satellite systems at Aerospace Corporation. Leading projects on system reliability and working closely with interdisciplinary teams in a full-time, on-site role.
DevOps Engineer designing and implementing solutions to optimize operations in media technology at Mediagenix. Collaborating with cross-functional teams to enhance user experiences.
Senior DevOps Engineer at SimCorp managing cloud environments and automating builds using Azure. Collaborating with cross-functional teams to ensure high service availability and compliance.
DevOps Senior Software Engineer at SimCorp developing high-quality software solutions for financial technology. Responsible for mentoring junior engineers and solving complex technical challenges.
DevOps Engineer designing, building, and operating software development infrastructure for CodeMettle. Leading automation and best practices to enhance value delivery across teams.
DevOps Engineer maintaining scalable infrastructure for VOX's telecom services. Implementing automation and CI/CD pipelines in a fast-paced environment with significant growth potential.
DevOps Engineer focused on designing and managing CI/CD pipelines using Azure DevOps. Collaborating with teams for application deployment and ensuring DevSecOps practices.
DevOps Engineer working closely with engineering and security teams to optimize CI/CD pipelines and manage infrastructure. Ensuring security and compliance for mission-critical financial applications.