Join a tech company to manage Linux server infrastructure as a Sysadmin/SRE. Focus on automation, reliability, and technical support within a hybrid work setting.
Responsibilities
Configure, maintain, and scale Kubernetes clusters in on-premises environments, ensuring high availability and performance.
Manage and optimize the infrastructure of physical and virtual servers, emphasizing automation and environment reliability.
Automate repetitive provisioning, configuration, and monitoring tasks for servers and applications using tools such as Ansible, Terraform, and Puppet.
Implement and manage API Gateway solutions to control traffic and optimize communication between microservices and systems.
Create and maintain monitoring and alerting systems that provide real-time visibility into the health of infrastructure and services, using Dynatrace, Datadog, Prometheus, Grafana, the ELK Stack, or similar tools.
Provide real-time support for infrastructure issues and collaborate with development teams to diagnose and resolve incidents.
Propose and implement infrastructure improvements focused on automation, security, performance, and reduction of operational costs.
Create and maintain detailed technical documentation of procedures, processes, and configurations.
Requirements
Strong expertise in Kubernetes, including installation, configuration, maintenance, and cluster scalability. (Certified Kubernetes Administrator - CKA)
Expertise in Linux systems administration in on-premises environments, including installation, configuration, and maintenance of physical and virtual servers. (Red Hat Certified Engineer - RHCE)
Experience with infrastructure automation using tools such as Ansible, Terraform, Puppet, or similar.
Strong knowledge of API Gateways (e.g., Kong, Apigee), with experience configuring and managing API traffic in on-premises environments.