Acts as a project or system leader, coordinating the activities of other engineers on the project or within the system
Determines the technical tasks that other engineers will follow
Proactively improves existing structures and processes and exercises judgement in reconciling diverse priorities
Define mobile-specific SLIs and SLOs (e.g., crash-free sessions, ANRs, app startup time, network success rates, battery/memory usage)
Establish best practices for observability, alerting, and incident response in Datadog
Lead development of automation and tools for mobile reliability (automated regression detection, performance benchmarking, crash/ANR triage, release health dashboards, instrumentation libraries)
Ensure tooling aligns with existing systems (Harness for CI/CD, Gradle/Bazel for builds)
Act as primary liaison with backend/web SRE leadership for incident response and shared visibility
Partner with Release Engineering, QA, and Product to ensure operational readiness of new features
Influence architecture and design decisions to prioritize mobile reliability
Lead cultural change: define and roll out an on-call model for mobile teams and champion a blameless postmortem culture
Mentor and guide a distributed team of senior Mobile SREs and provide technical leadership in complex incidents
Help recruit and onboard new SREs and set technical and cultural standards
Partner with infrastructure and developer productivity teams to integrate Bazel and Gradle builds into reliable CI/CD pipelines and establish long-term roadmaps for mobile reliability
Requirements
Minimum of 8 years of relevant work experience
Bachelor's degree or equivalent experience
8+ years of experience in software engineering, SRE, or mobile systems roles
Strong understanding of iOS and/or Android performance and reliability challenges
Hands-on experience with Datadog (or equivalent observability platforms) for monitoring, alerting, and dashboards
Proven ability to define and implement SLIs/SLOs across complex, distributed systems
Experience leading on-call rotations, incident response, and postmortems
Demonstrated experience building automation and internal tools for reliability
Strong programming skills in Python, Go, or similar
Working knowledge of Swift/Kotlin for client instrumentation
Exceptional ability to influence and partner across engineering, product, and SRE orgs
Track record of mentoring engineers and leading distributed teams
Preferred: Experience with CI/CD for mobile (Harness, Fastlane, Jenkins)
Preferred: Familiarity with Bazel and Gradle build systems
Preferred: Prior experience introducing cultural changes (e.g., adopting on-call or reliability practices)
Strong knowledge of backend service reliability concepts, to bridge between client and server
Benefits
Annual performance bonus (or other incentive compensation, as applicable)
Equity
Medical, dental, vision, and other benefits
Health and life insurance
Employee share options
Flexible work environment
Balanced hybrid work model: 3 days in the office and 2 days at your choice of either the PayPal office or your home workspace
Resources for financial, physical, and mental health