Senior DevOps Engineer collaborating with clients on Kubernetes deployment strategies in AI-focused environments. Managing the deployment lifecycle across AWS, Azure, and on-premise systems.
Responsibilities
Partner directly with clients to assess infrastructure requirements, security constraints, and deployment preferences.
Design and implement deployment strategies for Kubernetes clusters across AWS, Azure, and on-premise environments.
Serve as the primary technical point of contact throughout the deployment lifecycle.
Troubleshoot complex deployment issues, distinguishing between infrastructure and application-level concerns.
Act as a system validator to ensure our solutions function seamlessly within client environments.
Gather client feedback to inform internal development priorities.
Continuously apply and improve deployment best practices and coach peers in their adoption.
Requirements
Proven experience deploying and managing **Kubernetes** clusters (AWS, Azure, on-premise).
Deep expertise with Azure and **Infrastructure-as-Code tools such as Crossplane, Terraform, and Helm**.
Strong understanding of networking, security, and observability in containerized environments.
Proven success in **client-facing** roles with excellent communication and stakeholder management skills.
Ability to explain complex technical concepts to both technical and non-technical audiences.
Strong problem-solving skills and a customer-first mindset.
Self-motivated, organized, and effective at managing multiple priorities.
Bonus: Experience in customer success, solutions engineering, or regulated/high-security environments.
Benefits
Hybrid working policy (8 days per month in the Skopje office)