Onsite SRE Technical Manager – Transport

Posted 3 hours ago

About the role

  • As SRE Technical Manager, lead the reliability engineering teams that ensure the performance of Navy IT services. Manage teams, collaborate on automation, and drive continuous improvement in a critical-systems environment.

Responsibilities

  • Manage and mentor 5-6 SRE teams (pods) and 60+ FTEs, providing guidance, setting performance expectations, and fostering professional development.
  • Work collaboratively with SRE Resource Managers to staff and maintain the engineering resources your SRE vertical teams need to meet their reliability and scalability goals.
  • Responsible for the P&L across the Transport Services vertical. Manage the SRE team’s resources, including budget planning, tool selection, and infrastructure investments to meet reliability and scalability needs.
  • Meet regularly with your team members and participate in performance reviews, interviews, and development planning.
  • Oversee the reliability, availability, and performance of critical systems by leading the SRE teams within the data center vertical in implementing monitoring, incident response, and performance optimization strategies.
  • Ensure the team adheres to best practices for system reliability, automation, and operational efficiency.
  • Drive continuous improvement initiatives by analyzing performance metrics (e.g., SLOs, MTTR, MTBF) and identifying areas for enhancement.
  • Collaborate with operations, quality, cybersecurity and other SRE engineering teams to define and enforce Service Level Objectives (SLOs) and manage error budgets.
  • Act as a liaison between the SRE team and other departments to prioritize reliability and operational needs in the product development process.
  • Collaborate with senior leadership to define the SRE strategy, set long-term reliability goals, and ensure alignment with business objectives.
  • Lead efforts to reduce operational toil through automation. Work with the team to build or enhance automation tools that manage infrastructure, monitor systems, and respond to incidents.
  • Oversee the development and adoption of Infrastructure as Code (IaC) tools, CI/CD pipelines, and other automation processes.
  • Ensure that SRE practices align with organizational security policies and compliance requirements.
  • Collaborate with security teams to integrate reliability-focused security practices into the design and operation of systems.
  • Ensure systems meet or exceed agreed-upon service levels by proactively addressing potential issues and working with stakeholders to align on reliability expectations.
  • Work within an SRE team, collaborating with Developers, Security, and Operations, to continuously deliver products and increase the value stream for the organization and customers.
  • Embrace and champion Agile development processes and the adoption of modern Site Reliability Engineering workflows and practices while providing technical guidance to team members and coworkers on best practices.
  • Stay up to date on the latest Site Reliability Engineering practices and technologies.
  • Strive to provide internal and external customers with world-class service.
  • Resolve most conflicts between timeline, budget, and scope independently, but escalate complex or consequential issues to senior management.

Requirements

  • Requires a B.S. degree (or equivalent) in Cybersecurity, Information Security, IT, Network Engineering, Computer Science, or a related field with 8-10 years of SRE or DevOps experience, or a Master's degree with 6+ years of prior relevant experience. At least 4 years in a leadership or management capacity is also required.
  • US Citizen with DoD Secret Clearance.
  • Minimum of DoD 8570.01 IAT Level II Certification required prior to onboarding and must maintain certification while supporting the SMIT Contract.
  • Must be able to support program execution in classified environments and access SIPRNet from an NMCI location on short notice (local travel).
  • Exceptional written and oral communication skills, including producing technical analyses and reports, presentations, and executive-level briefings for internal and external stakeholders.
  • Ability to review and comprehend requirements and to design solutions that satisfy customer requirements.
  • Ability to work in a highly collaborative, forward-thinking, and innovation-driven environment.
  • Proven experience managing teams responsible for large-scale, distributed systems with high reliability and performance demands.
  • Strong track record of managing incidents, conducting postmortems, and implementing reliability improvements.
  • Experience implementing and managing Agile or DevOps processes, with a focus on continuous improvement, efficiency, and team productivity.
  • Ability to lead teams through strategic initiatives such as reliability maturity assessments, process automation, and tooling selection.
  • Solid understanding of SRE principles, including Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgeting.
  • Experience with commercial cloud infrastructure deployment environments such as AWS and Azure.
  • Strong knowledge of automation tools, CI/CD pipelines, and Infrastructure as Code (IaC).
  • Experience with Agile and DevSecOps/SRE concepts and best practices.
  • Hands-on experience with Atlassian products (Jira, Confluence, Bitbucket, etc.).
  • Experience creating Jira and/or Azure DevOps workflows, projects, and custom configurations.
  • Solid experience integrating and maintaining third-party CI/CD tools such as Jenkins and GitLab.
  • Experience with automated provisioning and configuration tools such as Terraform, CloudFormation, Ansible, or similar technologies.
  • Working knowledge of the Risk Management Framework (RMF) and DISA STIGs.

Benefits

  • Health and Wellness programs
  • Income Protection
  • Paid Leave
  • Retirement

Job title

SRE Technical Manager – Transport

Experience level

Senior / Lead

Salary

$116,350 - $210,325 per year

Degree requirement

Bachelor's Degree
