Manager, Customer Reliability Engineering at MTN Uganda | Hybrid Hired

About the role

Achieve measurable improvements in system uptime and performance by implementing robust reliability engineering practices and leading incident prevention initiatives.
Reduce Mean Time to Detect (MTTD) and Mean Time to Resolve (MTTR) through streamlined incident response protocols and team readiness.
Build, lead, and develop a skilled team of Customer Reliability Engineers with a strong focus on ownership, collaboration, and continuous learning.
Ensure that reliability is embedded into service design, development, deployment, and operations by partnering with engineering, product, and operations teams.
Deliver clear and actionable reporting on reliability metrics to support leadership decision-making and continuous improvement.
Align reliability goals with customer expectations by addressing root causes of service degradation and championing seamless user experiences.
Identify and address potential reliability risks before they impact customers by implementing observability tools, runbooks, and automated responses.
Drive reliability improvements that reduce operational costs by eliminating manual processes, optimizing resource usage, and reducing reactive work.
Oversee timely incident response, root cause analysis, and implementation of long-term fixes to prevent recurring issues and improve service resilience.
Work closely with software engineering, DevOps, product, and support teams to embed reliability into the end-to-end service lifecycle.
Ensure effective monitoring systems, dashboards, and alerts are in place to detect, respond to, and analyze system performance and failures.
Define and drive the implementation of a reliability roadmap aligned with business objectives, system scalability, and customer needs.
Translate system performance into customer impact metrics (e.g., NPS, downtime minutes) and continuously enhance the end-user experience.
Track and report on key reliability metrics such as uptime, latency, error rates, and incident frequency to support transparency and data-driven decisions.
Proactively identify technical and operational risks, ensuring mitigation strategies are in place and aligned with compliance standards.
Foster a culture of experimentation and improvement by exploring automation, new tools, and process enhancements to strengthen reliability practices.

Requirements

Bachelor's Degree in Computer Science, Software Engineering, Information Technology, or a related technical discipline.
Certifications in relevant areas such as Site Reliability Engineering (SRE), DevOps, ITIL, or Cloud Infrastructure (e.g., AWS, Azure, GCP) are highly desirable.
A Master's Degree in Technology Management, Engineering, or Business Administration is an added advantage.
7–10 years of experience in IT operations, systems engineering, or reliability engineering within a technology-driven environment.
At least 3–5 years in a leadership or managerial role, with proven experience leading reliability or DevOps teams.
Hands-on experience implementing and managing observability platforms, monitoring tools (e.g., Prometheus, Grafana, Splunk), and automation frameworks.
Demonstrated ability to lead incident response efforts, conduct root cause analysis, and implement sustainable, long-term service reliability improvements.
Experience working in agile environments and with cross-functional teams, including software development, infrastructure, product, and support.
Strong understanding of cloud-native technologies, container orchestration (e.g., Kubernetes), CI/CD pipelines, and infrastructure as code (e.g., Terraform, Ansible).
