Principal DevOps Engineer at Sleek collaborating with AI, Product, and Engineering teams. Designing secure infrastructures and leading DevOps practices for advanced cloud architectures.
Responsibilities
As one of Sleek’s most senior technical contributors, you will partner closely with Product, Engineering, and AI teams to define our infrastructure strategy, design resilient cloud architectures, and ensure our platforms remain secure, scalable, and high-performing. You will play a central role in integrating AI systems into production environments, enabling efficient delivery, observability, and reliability across Sleek’s products and internal operations.
High-quality, secure, and scalable infrastructure capable of supporting modern applications and advanced AI workloads
Robust automation across CI/CD, infrastructure provisioning, and operations to increase reliability and reduce manual overhead
Thoughtful and pragmatic integration of AI into operational workflows to improve efficiency, detect anomalies, and accelerate delivery
Reliable systems engineering practices, including monitoring, incident response, performance tuning, and capacity planning
Strong DevOps standards, including reproducibility, testing, documentation, and operational excellence
Clear technical communication and cross-team alignment to enable predictable delivery and collaborative problem-solving
Mentorship and technical leadership that elevates platform engineering, DevOps maturity, and overall engineering quality across the organisation
Conduct a full review of Sleek’s cloud infrastructure and propose a roadmap for reliability and scalability improvements
Lead upgrades or redesigns of core platform components such as networking, containers, orchestration, or databases
Improve incident response processes, SLIs, SLOs, and on-call readiness
Ensure platform and infrastructure are capable of supporting AI-powered features
Build or refine pipelines for model hosting, embeddings, vector search, or related AI services if required
Implement monitoring and guardrails for AI service performance, cost, and stability
Enhance CI/CD pipelines for speed, safety, and reliability
Introduce infrastructure automation, testing automation, and deployment tooling to reduce manual steps
Champion modern DevOps and AI-assisted tooling to improve engineering productivity
Strengthen logging, monitoring, tracing, and alerting across services
Reduce noisy alerts and improve the signal-to-noise ratio for incidents
Implement readiness checks, runbooks, and automated recovery paths for critical services
Ensure secure configuration, secrets management, access control, and identity management
Implement automated security scanning, dependency monitoring, and hardened pipeline practices
Prepare platform-level requirements needed for reliable and secure AI usage
Coach engineers on cloud best practices, reliability, and operational readiness
Lead architecture reviews, workshops, and platform knowledge-sharing sessions
Build tooling, templates, and patterns that elevate team productivity
Requirements
Experience & Discipline: 8+ years of progressive experience in DevOps, Site Reliability Engineering (SRE), Platform Engineering, or Infrastructure Engineering.
Cloud Expertise: Strong, hands-on experience across multi-cloud environments (AWS, GCP, Azure), including expertise in networking, compute, storage, security, and cost optimization.
Core Platform Stack: Deep expertise in containerization and orchestration (e.g., Kubernetes, EKS, ECS), and extensive experience with Infrastructure as Code (IaC) (e.g., Terraform, Pulumi, CloudFormation).
AI/ML Infrastructure: Experience supporting or deploying AI/ML workloads (e.g., model inference, vector databases, GPU workloads), or strong familiarity with the infrastructure requirements for these systems.
System Reliability: Proven ability to design, build, and operate highly reliable, scalable production systems utilizing advanced Zero-Downtime Deployment Patterns (e.g., Blue/Green, Canary, progressive delivery).
Modern Delivery & Tooling: Expertise in modernizing deployments via GitOps practices (e.g., ArgoCD, Flux) and building Self-Service Developer Platforms that enable engineering efficiency (e.g., environment automation, internal tooling).
Networking & Edge Routing: Experience implementing and managing Multi-Cloud API Gateways and Edge Routing solutions (e.g., Kong, Traefik, Cloudflare, multi-cluster ingress).
Security & Hardening: Strong background in platform security, including secrets management, Identity and Access Management (IAM), and runtime security hardening with tools like Falco/eBPF and WAFs.
Observability: Solid understanding and practical experience with modern observability stacks (e.g., Prometheus, OpenTelemetry, OpenSearch, ELK, CloudWatch).
Mentorship & Communication: Excellent communication and collaboration skills with a proven ability to describe complex infrastructure decisions clearly and a background in mentoring engineers and driving improvements in engineering practices.
Development Expertise: Familiarity with modern languages and frameworks such as Node.js, NestJS, and Python is highly desirable for extending DevOps capabilities or integrating tooling.
Benefits
Humility and kindness: Humility is a core attribute we hire for, which means we have a culture of not taking ourselves too seriously and being able to laugh. Kindness is also incredibly important. We are committed to creating and nurturing a diverse and inclusive environment.
Flexibility: You’ll be able to work from home 5 days per week. If you need to start early or late to cater to your family or other needs, we don’t mind, so long as you get your work done and communicate proactively. You can also work fully remotely from anywhere in the world for 1 month each year.
Financial benefits: We pay competitive market salaries and provide staff with generous paid time off and holiday schedules. Certain staff at Sleek are also eligible for our employee share ownership plan and can share in the upside of our stellar growth trajectory as we work toward listing on a prominent stock exchange in the Asia Pacific region.
Personal growth: You’ll get a lot of responsibility and autonomy at Sleek. We move at a fast pace, so you’ll be making decisions, making mistakes, and learning. We also run a range of internal and external training programmes. We’re at the forefront of utilising AI in our space and are developing a regional centre of AI excellence. It is our intention that if you leave Sleek, you leave as a more well-rounded person and professional.
Sleek is also a proudly certified B Corp. Since we started our journey in 2017, we’ve been committed to building Sleek as a force for good. In just over 5 years, we’ve joined a community of industry leaders like Patagonia, Ben & Jerry's, and P&G who are building an inclusive, equitable, and regenerative economy. We have planted over 29,271 trees to reforest our ecosystem and saved 7 tons of paper from landfills by processing over 1.4M pages through SleekSign. We aim to be Carbon Neutral by 2030.