Senior Site Reliability Engineer leveraging modern Kubernetes and cloud-native technologies for high reliability and scalability. Solving platform challenges while contributing to improved managed services.
Responsibilities
Design and implement observability solutions using Prometheus, Loki and Mimir, including defining meaningful alerts
Analyze, troubleshoot and further develop custom Kubernetes controllers to ensure reliability and stability
Develop and maintain production applications with a focus on code quality, scalability and operational readiness
Operate, automate and continuously evolve the MKA platform with a focus on efficiency and maintainability
Enhance internal tooling to drive automation and reduce manual effort
Requirements
Experience operating highly available, business-critical applications in cloud and on-premises environments, including incident leadership
Strong Kubernetes knowledge and experience in cluster management
Experience with GitOps principles and ArgoCD for deployment and delivery workflows
Experience with Infrastructure as Code, particularly Terraform and Ansible
Proficient in Bash and/or Python for automation and tooling
Understanding of CI/CD pipelines, ideally with Tekton-based workflows
Very good German skills and good English skills (B2+) for technical collaboration
Nice to have
Experience programming in Go
Experience with Nix for development tooling and automation
Experience with Helm, Make and Git
Additional experience with cloud-native platforms, observability or platform automation
Benefits
Deep hands-on Kubernetes experience
Freedom to solve challenges
Opportunities to share knowledge and continuously learn
Collaborative team environment
Internal show-and-tell sessions
Attendance at conferences such as KubeCon or Container Days
Job title
Senior Site Reliability Engineer – Kubernetes Platform
Mid-Level DevOps Analyst (Analista DevOps Pleno) at Finnet managing cloud and infrastructure projects for client solutions. Involves architecture design, systems management, and team collaboration.
DevOps - Cloud Infrastructure Specialist designing, building and maintaining Azure and AWS infrastructure for Morgan Stanley. Requires strong expertise in cloud technologies and hands-on experience with Terraform and Kubernetes.
Lead DevOps Engineer at Incogni evolving infrastructure during monolith-to-microservices transitions. Building self-service platforms and ensuring observability in a fast-growing consumer privacy-tech product.
Senior Site Reliability Engineer maintaining reliability and user experience of AI services for Woven by Toyota. Collaborating with engineering teams to ensure service availability and performance.
GitHub Enterprise Specialist managing KONE's GitHub ecosystem, ensuring secure and scalable workflows. Collaborating with teams to enhance developer productivity through AI-powered capabilities.
DevOps Specialist supporting the engineering and operational enablement of next-gen data center platforms at KONE. Involves Infrastructure-as-Code deployments and daily DevOps workflows.
Senior Software Engineer responsible for designing microservices and enhancing LLM performance for Fortanix's Generative AI platform. Collaborating with data science and ML Infrastructure teams for security and optimization.
Reliability Engineering Technician conducting various verification tests and collaborating with reliability engineers. Preparing technical documentation in a well-equipped laboratory environment in Poland.
Reliability Engineer ensuring quality and reliability of products. Conducting various verification tests in a well-equipped laboratory in Mierzyn, Poland.
Senior SRE driving incident management and operational excellence in financial software solutions. Working with innovation and technology on the team of Brazil's leading software company.