Site Reliability Engineer automating infrastructure deployment for Scaleway's sovereign cloud products. Collaborating with product teams to enhance observability and reliability of the platform.
Responsibilities
Build and standardize the infrastructure that hosts Scaleway’s product catalog.
Manage onboarding of product teams onto the platform.
Implement and optimize observability stacks (monitoring, alerting).
Automate infrastructure using GitOps processes and Infrastructure as Code.
Deploy product stacks across new geographic regions.
Handle operational maintenance (MCO, maintien en condition opérationnelle) and participate in an on-call rotation with week-long shifts (approximately one week per month, including weekends).
Improve CI/CD pipelines and maintain technical documentation.
Collaborate closely with product teams to bridge the gap between development and infrastructure.
Ensure security compliance, including secret management (e.g., Vault).
Drive continuous improvement of existing systems and deployment workflows.
Requirements
Senior-level Systems Administration experience (ideally 7+ years).
Strong expertise in Kubernetes (K8s) and GitOps workflows (ArgoCD, FluxCD).
Proficiency with observability tools (Grafana, Thanos, Prometheus).
Solid knowledge of networking and security principles.
Experience with Infrastructure as Code.
Comfortable with on-call rotations and SRE objectives (SLA/SLO concepts).
Benefits
Hybrid work: Up to 3 days of remote work per week.
Offices: Spacious, dynamic offices with bold design, conveniently located near public transport. Most offices include outdoor spaces (terraces) and bike parking.
Dining: A chef provides healthy meals at headquarters, and breakfast is available at all sites year-round. Employees working from regional sites receive a Swile card for lunches.
Well-being support: Access to gym facilities, daycare spots, and discounts on caregiving services to help maintain work-life balance.
International environment: Dozens of nationalities; English is as widely spoken as French.
Career & mobility: Managers encourage internal mobility, and all employees can move to other entities within the Iliad Group.