Join a tech company to manage Linux server infrastructure as a Sysadmin/SRE. Focus on automation, reliability, and technical support within a hybrid work setting.
Responsibilities
Configure, maintain, and scale Kubernetes clusters in on-premises environments, ensuring high availability and performance.
Manage and optimize the infrastructure of physical and virtual servers, emphasizing automation and environment reliability.
Automate repetitive provisioning, configuration, and monitoring tasks for servers and applications using tools such as Ansible, Terraform, or Puppet.
Implement and manage API Gateway solutions to control traffic and optimize communication between microservices and systems.
Create and maintain monitoring and alerting systems to provide real-time visibility into the health of infrastructure and services using tools such as Dynatrace, Datadog, Prometheus, Grafana, ELK Stack, or similar.
Provide real-time support for infrastructure issues and collaborate with development teams to diagnose and resolve incidents.
Propose and implement infrastructure improvements focused on automation, security, performance, and reduction of operational costs.
Create and maintain detailed technical documentation of procedures, processes, and configurations.
Requirements
Strong expertise in Kubernetes, including installation, configuration, maintenance, and cluster scalability (Certified Kubernetes Administrator, CKA).
Expertise in Linux systems administration in on-premises environments, including installation, configuration, and maintenance of physical and virtual servers (Red Hat Certified Engineer, RHCE).
Experience with infrastructure automation using tools such as Ansible, Terraform, Puppet, or similar.
Strong knowledge of API Gateways (e.g., Kong, Apigee), with experience configuring and managing API traffic in on-premises environments.