Senior Manager, DevOps responsible for scaling and owning platform operations at FloQast. Collaborating with cross-functional teams and managing DevOps Engineers in a hybrid setting.
Responsibilities
Lead, mentor, and scale a DevOps organization; build career paths and leadership bench
Define and execute the DevOps, reliability, and observability strategy aligned with business goals
Own platform reliability, availability, and performance for a production SaaS platform
Establish and mature observability practices (metrics, logs, traces, alerts, dashboards)
Drive infrastructure initiatives across AWS focused on scalability, resilience, and modernization
Own and mature incident management including on-call, response, executive communication, and postmortems
Oversee day-to-day operational excellence including CI/CD, deployments, and environment health
Set and manage cloud cost strategy, forecasting, and optimization in partnership with Finance
Partner with Security and Compliance on SOC2, SOX, and audit readiness
Support AI/ML and data platform workloads as part of the broader infrastructure strategy
Requirements
10+ years of DevOps / SRE / Infrastructure experience
4+ years managing DevOps or Platform teams
Deep expertise with AWS at scale (multi-account, networking, IAM)
Strong hands-on background with Terraform, Kubernetes, and CI/CD
Proven ownership of incident management and operational maturity
Experience building and operating observability platforms for SaaS systems
Experience with AI/ML or data-intensive platforms
Experience with observability tools such as Datadog, Grafana, Prometheus, and OpenTelemetry
Reliability Engineering Specialist utilizing reliability tools and models to improve asset performance at Enbridge. Collaborating across teams to guide investment decisions for safe operations.
DevOps Engineer responsible for structuring and supporting cloud DevOps architecture in Brazil. Working strategically on automation and CI/CD practices with development teams in Pernambuco.
DevSecOps Software Engineer developing secure CI/CD pipelines for Boeing's military software systems. Collaborate with cross-functional teams and implement automation and security best practices.
DevOps Manager responsible for managing a team for multi-cloud solutions supporting the USAF Cloud One project. Focus on scalable cloud-native solutions and CI/CD practices.
Lead Site Reliability Engineer overseeing SRE practices across Azure and GCP platforms. Driving reliability improvements and leading a team at Lloyds Banking Group.
DevOps Engineer responsible for managing Microsoft Intune operations at Bundesdruckerei GmbH. Focused on ensuring secure digital solutions for identity and data protection in Berlin.
Senior Site Reliability Engineer driving observability and reliability for business-critical systems at Incedo. Collaborating with engineering teams to enhance system resilience and performance.
DevSecOps Specialist securing the software development lifecycle at Vanguard. Collaborating with teams to improve application security tooling and processes, and provide development guidance.
Site Reliability Engineer automating infrastructure deployment for Scaleway's sovereign cloud products. Collaborating with product teams to enhance observability and reliability of the platform.