Engineering Manager leading the Site Reliability Engineering team at a fintech company. Ensuring the reliability, scalability, and performance of our digital banking platform.
Responsibilities
Lead, coach, and grow a high-performing Site Reliability Engineering team; support career development, technical excellence, and ownership
Own the reliability, scalability, and performance of Relay’s platform, ensuring our systems are resilient as the business grows
Lead and evolve SRE best practices, including incident management, on-call operations, SLIs/SLOs, and error budgets
Partner closely with Engineering, Product, and Data teams to ensure reliability and scalability are built into every feature we ship
Drive continuous improvement through post-incident reviews, root cause analysis, and preventative action
Guide infrastructure and platform investments to support long-term scalability, security, and operational efficiency
Define and track key reliability KPIs (e.g., uptime, latency, incident frequency, MTTR) and use data to inform priorities and decisions
Champion a culture of learning, operational excellence, and “running towards problems” across the engineering organization
Requirements
You have 3+ years of experience managing or leading engineers and 6+ years of experience in Site Reliability Engineering, Platform Engineering, or Infrastructure roles
You have a strong track record of owning and improving system reliability, scalability, and performance in production environments
You have experience improving observability, performance, or operational maturity at growing companies
You’ve led teams through incident response, postmortems, and reliability improvements, using data and clear accountability to drive better outcomes
You have a strong foundation in operating and scaling production systems in cloud environments (e.g., AWS) and in modern infrastructure practices (IaC, CI/CD, monitoring, alerting)
You have a proven record of partnering with Product and Engineering leaders to balance delivery velocity with long-term reliability and operational excellence
You’re a leader who knows how to coach, motivate, and grow engineers while setting a high bar for ownership, quality, and technical excellence
You’re highly collaborative and experienced in leading cross-functional initiatives that span engineering, product, and operations
You thrive in fast-paced, ambiguous environments and are comfortable leading through change as the platform and organization scale
Benefits
Competitive salary and meaningful equity: Relay employees are Relay owners, complete with equity and a competitive salary.
Comprehensive health benefits: enjoy full health benefits from day one, with no probation period required. We offer flexible Health or Wellness Spending Accounts and medical, dental, and vision coverage for you and your dependents.
Flexible vacation and time off: every team member starts with 15 vacation days and 5 flex days to use as needed, plus an extra week of office closure during the end-of-year holidays so you can take time off to recharge and come back better for our customers.
Parental leave with top-up: we offer 12 weeks off with a 100% salary top-up for all full-time employees, regardless of location, and accessible for all parents: birthing, non-birthing, and adoptive.
Hybrid work environment: we value meaningful collaboration and connection at our Toronto office twice a week, with lunch, snacks, and beverages on us.
Dog-friendly space: can dogs really make you happy and healthy? We don’t know for sure, but since we don’t want to chance it, our office is 100% floof-friendly.
Personal and professional growth: through ongoing feedback, mentorship, and coaching, work with peers and leaders who are invested in your growth and success.
Top-tier equipment: as a Mac-first company, our Toronto offices have everything you need to produce your best work comfortably, from multiple screens to ergonomic seating.
Social connection: we believe in celebrating our wins with two annual company-wide get-togethers, quarterly team events, happy hours, and special events and networking opportunities.
DevOps Engineer building and maintaining authentication platforms in multi-cloud environments. Using technologies like Terraform, Ansible, and Python for automation and optimization.
Cloud Engineer developing Infrastructure-as-Code with Terraform and Azure DevOps. Managing Azure infrastructure and leading incident response within cross-functional teams.
DevSecOps Engineer at Skillfield working on secure CI/CD pipelines for mobile-first delivery. Collaborating with teams to embed security and automation in the delivery lifecycle.
Lead DevOps Engineer focused on AWS and Azure data platform solutions. Collaborating with teams to deliver scalable, secure, and highly available solutions.
DevOps Engineer working at GRÜN Software Group to automate and maintain stable infrastructures. Collaborating with teams to improve deployments and processes for better performance.
Linux System Administrator managing IT infrastructures for educational institutions and research. Collaborating on DevOps and HPC projects while ensuring system security and performance.
Azure SRE Engineer responsible for designing and maintaining secure, scalable Azure cloud infrastructure. Driving automation and operational excellence for leading organizations in technology transformation.
Senior Manager of Site Reliability Engineering overseeing Workday's Kubernetes-based platform. Leading teams while ensuring high availability and collaborating with federal agencies.
Site Reliability Engineer focusing on AWS cloud environments, SRE practices, and system reliability within GFT's team. Collaborating on cloud migrations and observability initiatives.