Design, implement, and maintain CI/CD pipelines using Jenkins, GitLab CI, Argo CD, etc.
Automate infrastructure provisioning and configuration using Terraform, Ansible, and Helm.
Deploy, configure, and manage Kubernetes clusters in cloud and on-premises environments (EKS, AKS, GKE, Rancher, RKE2, k3s, OpenShift).
Enforce Kubernetes security best practices (RBAC, PodSecurity, secrets, network policies).
Monitor and tune Kubernetes workloads for performance and reliability.
Administer, operate, and troubleshoot distributed database systems (e.g., Cassandra, MongoDB, CockroachDB, etcd) within Kubernetes, ensuring high availability, data consistency, and performance.
Design and maintain high-availability, scalability, backup/recovery, and disaster recovery strategies for databases.
Implement observability stacks (Prometheus, Grafana, ELK, Zabbix, etc.) for infrastructure and applications.
Partner with dev teams to design scalable deployment patterns and troubleshoot pipeline/build/deploy issues.
Maintain detailed technical documentation for environments, playbooks, and architectural decisions.
Mentor peers and team members in DevOps tools, Kubernetes, and cloud-native practices.
Requirements
3+ years managing Kubernetes in production.
Expertise with container tools (Docker, Podman) and orchestration (Kubernetes, Helm).
Strong CI/CD experience with GitLab, Jenkins, Argo CD, and GitOps workflows.
Proficient in Infrastructure as Code (Terraform, CloudFormation, Ansible).
Deep knowledge of managing distributed databases in Kubernetes, including StatefulSets, PVCs, and dynamic volume provisioning, as well as backup, recovery, scaling, and clustering techniques.
Experience with on-prem, AWS, GCP, Azure, or OpenStack environments; hybrid/multi-cloud experience preferred.
Familiarity with service meshes and Kubernetes networking (Istio, Calico, Cilium).
Proficient in Bash, Python, or similar scripting languages.
Strong analytical and troubleshooting abilities across app, infra, and DB layers.
Clear communication and ability to collaborate across development, QA, security, and operations.
Self-motivated, detail-oriented, and comfortable in fast-paced, on-call environments.
Excellent documentation habits and focus on operational excellence.
Familiarity with compliance standards (FIPS, FedRAMP, FISMA).
Certifications in Kubernetes (CKA/CKAD), AWS/GCP, or Terraform.
Benefits
Competitive Salary & Incentives: We offer a competitive compensation package, including pre-IPO equity, to reward your hard work and dedication.
Health & Wellness: Comprehensive medical, dental, and vision insurance plans to ensure you and your family stay healthy and covered.
Paid Time Off (PTO): Enjoy a generous PTO policy that includes vacation days, sick leave, and paid holidays to recharge and take care of personal matters.
Flexible Work Environment: We understand the importance of work-life balance. Enjoy the flexibility of remote and hybrid options to create the work schedule that works best for you.
Professional Development: We believe in continuous learning. Access to training, certifications, and educational resources to help you grow in your career and stay ahead of industry trends.
Employee Recognition: We celebrate achievements both big and small, with regular recognition programs and awards that highlight your contributions to our collective success.
Collaborative Culture: Be part of a dynamic, inclusive, and supportive team where innovation and collaboration are at the heart of everything we do.
Parental Leave: Generous parental leave policies to support you during life's important moments.