Manage the installation and maintenance of Apigee runtime environments across multiple data centers, ensuring seamless operations and deployments.
Ensure high availability, scalability, and optimal performance of the Apigee platform to meet business and user demands.
Implement and maintain comprehensive monitoring, logging, and alerting solutions to proactively identify and address issues.
Automate infrastructure provisioning and management to enhance efficiency and reduce manual intervention using leading automation tools.
Troubleshoot and resolve runtime issues promptly to minimize downtime and maintain service reliability.
Participate in a 24/7 on-call support rotation to ensure continuous system operability and address critical issues as they arise.
Requirements
Strong experience in managing runtime environments, with specific expertise in Apigee hybrid.
Expertise in ensuring system availability, scalability, and performance, with a focus on delivering uninterrupted services.
Proficiency in using monitoring, logging, and alerting tools to maintain high visibility into system operations and preemptively identify potential issues.
Skills in infrastructure automation, including scripting and use of CI/CD pipelines to streamline operations.
Proven ability to troubleshoot and resolve technical issues efficiently, reducing mean time to recovery (MTTR).
Knowledge of Kubernetes (k8s) for container orchestration and GitHub Actions for automated deployment.
For candidates located in Quebec, bilingualism is required, given the need to interact regularly with English-speaking colleagues across the country.
No Canadian work experience is required; however, candidates must be eligible to work in Canada.
Benefits
A financial rewards program that recognizes your success
An industry-leading Employee Share Purchase Plan; we match 50% of net shares purchased
An extensive flex pension and benefits package, with access to virtual healthcare
Flexible work arrangements
Option to purchase up to 5 extra days off per year
An annual wellness account that promotes an active and healthy lifestyle
Access to tools and resources to support physical and mental health, embracing change and connecting with colleagues
A dynamic workplace learning ecosystem complete with learning journeys, interactive online content, and inspiring programs
Inclusive employee-led networks to educate, inspire, amplify voices, build relationships and provide development opportunities
Inspiring leaders and colleagues who will lift you up and help you grow
A Community Impact program, because what you care about is a part of what makes you different.