Develop and execute the strategic roadmap for our private cloud and developer platform, aligning technology investments with business objectives.
Lead architecture forums, platform RFC processes, and cross-organizational guardrail discussions to ensure cohesive governance.
Engage stakeholders across engineering, infrastructure, and executive teams, communicating complex technical concepts in clear business terms.
Deliver Developer Experience (DevEx) as a measurable platform outcome, building paved roads for teams to onboard quickly and safely.
Provide golden paths for service creation, CI/CD, observability, and security through reusable templates and self-service tooling.
Establish and track time-to-first-deploy and feedback-loop performance metrics to continuously improve developer productivity.
Lead modernization of our private cloud orchestration capabilities with a focus on scalability, security, and interoperability.
Manage hybrid estates, bridging VM and containerized workloads across OpenStack (Nova, Neutron, Cinder) and OpenShift Virtualization, ensuring compliance and SLA adherence.
Guide architectural decisions on GKE Autopilot vs. Standard mode, Workload Identity, Shielded Nodes, Binary Authorization, autoscaling strategies, and release channels.
Drive GitOps practices using Argo CD or Flux, and integrate Anthos Config Management/Fleet where applicable.
Embed policy-as-code frameworks (e.g., OPA/Gatekeeper, Kyverno, CUE/Rego) into all deployment pipelines for compliance and standardization.
Standardize and scale CI/CD pipelines (Jenkins, GitLab CI, GitHub Actions, or equivalent), supporting progressive delivery via Argo Rollouts, Flagger, and canary/blue-green deployment strategies.
Own infrastructure as code (IaC) at scale using Terraform, Ansible, and Helm (including chart management) to ensure consistency and reproducibility.
Champion service mesh adoption (Istio or an equivalent such as Linkerd or Consul) with a focus on mTLS by default, traffic splitting, Gateway API, and advanced traffic policies.
Oversee network design, NetworkPolicies (Calico/Cilium), and VPC configurations to meet performance and security objectives.
Strengthen observability practices, integrating Dynatrace with OpenTelemetry, Prometheus, and Grafana to correlate metrics, logs, and traces.
Build SLO dashboards for both engineers and executives, enabling data-driven decision-making.
Lead the SRE charter, defining and owning SLIs/SLOs, error budgets, incident response, postmortems, chaos testing, and disaster recovery (RTO/RPO).
Tie reliability goals directly to business KPIs and customer experience.
Implement preventive security controls across the platform: IAM/RBAC strategies, Workload Identity Federation, and secret management (Vault/KMS).
Integrate SBOM generation, SLSA provenance, image signing/verification (Binary Authorization), and policy enforcement at deploy time.
Enforce compliance via NetworkPolicies, runtime security checks, and continuous vulnerability management.
Lead, mentor, and grow a high-performing engineering team (15–30+ engineers) focused on platform scalability, reliability, and innovation.
Foster a collaborative, learning-oriented culture that values technical excellence, psychological safety, and outcome-driven engineering.
Manage budgets, vendor relationships, and multi-year platform roadmaps across business units.
Requirements
Bachelor’s degree in Computer Science, Computer Engineering, or equivalent combination of relevant education and experience.
10+ years of experience in cloud infrastructure and platform engineering, with a proven track record of leading private cloud infrastructure.
7+ years of technical experience including:
GKE, Anthos, OpenShift, OpenStack, and VMware.
Experience with infrastructure as code (IaC) tools such as Terraform or Ansible, and proficiency in using Kubernetes-native tools like Helm.
Programming languages such as Go, CUE, Rust, Python, Java, or Node.js. Hands-on experience with Terraform, Argo CD/Flux, OPA/Gatekeeper/Kyverno, and observability stacks (Dynatrace, OpenTelemetry, Prometheus, Grafana).
3+ years of experience leading large (~15–30+ engineers) platform or SRE organizations, including team management experience building and leading diverse engineering teams.
4+ years of experience demonstrating strategic, communication, and problem-solving skills, including:
Strategic Thinking: Ability to develop and implement strategic plans that align with business objectives and drive technological innovation.
Communication Skills: Excellent verbal and written communication skills, with the ability to convey complex technical information to technical, non-technical, and executive audiences. Ability to navigate complex, multi-layered cross-organizational structures to drive critical outcomes.
Problem-solving Skills: Strong analytical and problem-solving skills, with the ability to make data-driven decisions and manage risk effectively.
Benefits
Immediate medical, dental, and prescription drug coverage
Flexible family care, parental leave, new parent ramp-up programs, subsidized back-up childcare and more
Vehicle discount program for employees and family members, and management leases
Tuition assistance
Established and active employee resource groups
Paid time off for individual and team community service
A generous schedule of paid holidays, including the week between Christmas and New Year’s Day
Paid time off and the option to purchase additional vacation time