Design, deployment, and management of scalable and secure Kubernetes clusters on OVHcloud.
Ownership and advancement of our CI/CD pipelines for automated, reliable application and infrastructure deployments.
Implementation and management of our GitOps workflows using tools like ArgoCD or Flux.
Management and scaling of GPU workloads in Kubernetes, ensuring optimal performance and resource utilization for our ML teams.
Development and maintenance of our observability stack (VictoriaMetrics, VictoriaLogs, Grafana, tracing) to ensure deep visibility into system health.
Management of our cloud infrastructure on OVHcloud, focusing on automation (Infrastructure as Code), cost optimization, and security.
Lifecycle management of core platform services, including message brokers (RabbitMQ), databases (PostgreSQL, Redis), and authentication systems (Okta, OIDC, OAuth2).
Acting as a key responder for infrastructure incidents; debugging and troubleshooting complex production issues across distributed systems.
Supporting and empowering development teams by providing robust self-service tools, clear documentation, and collaborative support.
Requirements
3 to 5 or more years of professional experience in a Platform Engineering, DevOps, or SRE role
Deep, hands-on experience with Kubernetes in a production environment (cluster management, networking, security, scheduling)
Proven experience managing infrastructure on a cloud provider (OVHcloud is a strong plus; AWS, GCP, or Azure experience is also valued)
Strong practical knowledge of CI/CD systems (e.g. GitHub Actions) and GitOps principles (ArgoCD, Flux)
Proficiency with Infrastructure as Code (IaC) tools like Terraform or Pulumi
Solid understanding of observability principles and tools (e.g. VictoriaMetrics, VictoriaLogs, OpenTelemetry/Tracing, Grafana)
Experience managing stateful services in production (e.g. PostgreSQL, Redis, RabbitMQ)
Solid scripting skills in Python
Benefits
Full ownership of a mission-critical platform
A team that values curiosity, learning, and experimentation
Remote-first setup with the option to work in our Berlin office