Hybrid Senior Cloud Site Reliability Engineer

Posted last month

About the role

  • Responsible for daily operations of Solace Cloud, the market-leading SaaS offering, across AWS, Azure, GCP, Kubernetes, etc.
  • Ensure Solace Cloud Services are healthy and reliable and SLAs are being met
  • Design and implement infrastructure tooling, observability, and automation
  • Improve production operations to be more efficient and less error-prone
  • Handle production incidents according to industry-standard incident management processes
  • Process customer service requests and provisioning requests
  • Manage customer escalations and drive resolution in mission-critical, high-impact production environments
  • Work directly with customers to identify, troubleshoot, and resolve operational issues
  • Debug Linux and Kubernetes at a system level to detect and resolve operational issues
  • Participate in on-call rotation and provide 24x7 off-hours support

Requirements

  • Proven expertise with the services and features of public cloud providers (AWS, Azure, GCP)
  • Proven expertise with cloud Kubernetes infrastructure platforms (EKS, AKS, GKE)
  • Hands-on experience with monitoring tools such as Datadog, Kibana, and Prometheus
  • Hands-on experience with infrastructure automation using Terraform or CloudFormation
  • Hands-on expertise in debugging production alerts
  • Expert-level understanding of Linux Operating Systems
  • Programming experience in languages such as Groovy, Python, and Go
  • Certified Kubernetes Administrator
  • Certified Cloud Administrator (AWS, Azure, or GCP)
  • Expert-level knowledge in Cloud Networking Solutions
  • Expert-level knowledge in handling production incidents in multi-cloud environments
  • Proven ability to manage customer escalations and drive resolution in mission-critical production environments
  • Experience in SaaS operations and customer-facing technical support
  • Willingness to join the on-call rotation and provide 24x7 off-hours support
  • Strong communicator able to articulate complex technical issues and communicate with customers
  • Ideally 7+ years of work experience in a technical role
  • Must be able to work in or commute to the Ottawa area; the application asks about eligibility to work in Canada

Benefits

  • Hybrid work model (2 days in-office in the Ottawa area)
  • Work-life balance
  • Top-notch training programs
  • Inclusive environment and accommodations during hiring
  • Opportunity to work with a stellar customer lineup
  • Collaborative, social and fun culture
  • Top-ranked employer on Glassdoor
  • Emphasis on craftsmanship, trust, courage, freedom, momentum, humility, and human experience

Job title

Senior Cloud Site Reliability Engineer

Job type

Experience level

Senior

Salary

Not specified

Degree requirement

No Education Requirement

Location requirements
