DevOps & Kubernetes Engineer at an AI software startup near Porto, managing Kubernetes clusters for ML workloads and collaborating on infrastructure solutions.
Responsibilities
Design, deploy, and manage production-grade Kubernetes clusters for ML and microservice workloads
Maintain and optimize container orchestration, including service mesh, network policies, and resource allocation
Oversee CI/CD pipelines using tools like GitHub Actions, GitLab CI, and Terraform
Manage Docker image lifecycle and enforce security best practices
Monitor infrastructure health using Prometheus, Grafana, and centralized logging solutions
Collaborate on Infrastructure as Code (IaC) and ensure scalable, reproducible deployments
Support GPU-based workloads and optimize GPU resource utilization for LLM agents
Maintain Linux-based cloud servers, implement security protocols, and manage DNS, VPNs, and firewalls
Troubleshoot Python microservices and contribute to automation and monitoring setups
Implement model serving and orchestration pipelines (MLflow, Kubeflow, etc.)
Ensure high availability and maintain disaster recovery strategies across systems
Requirements
3 to 7+ years of advanced hands-on experience with Kubernetes administration, including networking, storage, and security
Proficient in Docker, multi-stage builds, and image lifecycle management
Strong Linux system administration skills (Ubuntu or RHEL-based systems)
Experience with cloud platforms, ideally Google Cloud Platform (GCP)
Solid understanding of CI/CD pipelines using tools like GitHub Actions, GitLab CI, or Jenkins
Familiarity with Infrastructure as Code (Terraform, Ansible) and GitOps workflows
Confident in managing GPU workloads and ML/LLM-serving infrastructure
Experience with monitoring and observability tools such as Prometheus, Grafana, ELK/EFK stack
Comfortable with Python microservices and ML workflow troubleshooting
Fluent English (C1 level or above)
Benefits
Competitive Salary: Commensurate with your experience and contributions.
Flexible Work Setup: On-site collaboration in Porto, with the option of fully remote work after onboarding, subject to strong performance.
Relocation Support: For your on-site onboarding, or if you decide to move to Porto, you receive support with logistics, housing, and onboarding connections to make the move smooth.
Training & Growth Budget: Set aside for conferences, courses, and certifications.
Daily Meal Subsidy: Enjoy lunch on the company when working from the office.
Team Events: From BBQs to game nights and a Christmas party, with the first drinks on the house.
Onboarding Buddy: You won’t be left alone; you are paired with someone who helps you ramp up quickly.