Site Reliability Engineer handling the design, deployment, and operation of customer-facing SaaS platforms. Collaborating with various teams to ensure high availability and performance in cloud environments.
Responsibilities
Design, deploy, and operate SaaS platforms on AWS.
Work with Kubernetes, Terraform, Crossplane, and GitOps practices to automate infrastructure.
Develop and maintain ArgoCD pipelines and reusable automation assets.
Manage monitoring and observability using tools like Prometheus, Grafana, Loki, OpenTelemetry, and Datadog.
Investigate and resolve system, application, and network issues.
Ensure platforms adhere to security and compliance standards.
Requirements
3–7+ years in SRE, DevOps, CloudOps, or cloud engineering roles.
Strong background working with AWS services and SaaS architectures.
Experience managing reliability metrics and applying SRE principles in production environments.
Proficiency with AWS (networking, compute, storage, IAM, multi-account environments).
Strong understanding of containers and Kubernetes (EKS preferred).
Experience with Terraform, Git, CI/CD, ArgoCD, and Infrastructure-as-Code practices.
Scripting skills (Python, Bash/PowerShell), fluency with YAML, and experience with tools like Crossplane or Ansible.
Solid experience with observability stacks (Grafana, Prometheus, Loki, Datadog, OpenTelemetry).
Good knowledge of system design, troubleshooting, and performance analysis.
Clear communicator with strong organizational skills.
Working student (Werkstudent) supporting Revenue Operations at Damovo, improving processes between Marketing and Sales through analysis and collaboration. Involved in Salesforce, marketing automation, and data-driven decision-making.
Working student (Werkstudent) in Controlling & Operations at Plan.Net Group in Munich. Supporting project management and reporting, and collaborating with various units on data interpretation and analysis.
Service Operations Manager leading service operations and team management at Hitachi Energy. Aligning team strategies with business objectives while promoting customer satisfaction and operational excellence.
Director of Operations overseeing manufacturing and repair operations at Mitsubishi Power. Leading strategy planning and implementation for turbine and component management.
Operations Manager leading plant operations at Mitsubishi Power to meet customer demand. Collaborating with maintenance, engineering, human resources, accounting, and quality assurance for continuous improvement.
Operations Improvement Manager at Walmart, overseeing multiple projects and leading a team in a dynamic retail environment. The role focuses on project management and operational improvement initiatives.
Operations Officer managing government initiatives and providing executive support. Driving mission success through effective communication and tactical planning with key stakeholders.
Supply Chain Operations Specialist leading logistics service delivery at CEVA Logistics. Engaging with clients, ensuring operational excellence, and training staff to deliver superior service.
Field Operations Technician responsible for deploying and maintaining IT assets for healthcare. Collaborating with a team and providing technical support across various facilities nationwide.