Staff SRE Tech Lead overseeing platform reliability and scalability at Unify. Leading an SRE pod while enhancing infrastructure performance and implementing reliability best practices.
Responsibilities
Lead the SRE pod: Set technical direction, drive prioritization, and mentor engineers—ensuring the team is tackling the highest-leverage reliability and scalability challenges.
Scale our data infrastructure: Architect and extend our ClickHouse and PostgreSQL deployments to handle terabytes of new data monthly by designing partitioning strategies, tuning queries, and building resilient replication and failover systems.
Improve system performance: Profile and optimize critical paths across our backend services, identify bottlenecks in data pipelines and API layers, and ship changes that meaningfully improve latency and throughput.
Build for reliability: Design and implement rate limiting, circuit breakers, graceful degradation, and other patterns that keep the platform stable under load and during partial failures.
Automate everything: Drive tooling that eliminates toil—automating deployments, scaling operations, backup verification, and incident remediation.
Instrument and observe: Build out distributed tracing, metrics, and alerting that give engineers clear visibility into system behavior and make debugging production issues fast.
Define and enforce SLOs: Establish reliability targets aligned with customer needs, manage error budgets, and drive architectural decisions that balance shipping speed with system stability.
Requirements
8+ years of software engineering experience with a strong backend foundation, including 3+ years focused on reliability, infrastructure, or platform work.
Experience leading teams or pods—setting technical direction, mentoring engineers, and driving execution on complex projects.
Deep expertise operating databases at scale, including schema design, query optimization, replication, and failover strategies.
Strong programming skills (TypeScript, Python, Go, or similar) with a track record of building automation and tooling that meaningfully reduces operational burden.
Collaborative, low-ego attitude with a history of leveling up the people around you.
Cloud Engineer specializing in hybrid-cloud platform design and operation at Dun & Bradstreet. Collaborating closely with team members to enhance developer self-service and automation capabilities.
DevOps Engineer II evolving cloud infrastructure and CI/CD pipelines at HackerRank. Collaborating with teams to design, build, and optimize systems for developer productivity.
DevOps Engineer managing CI/CD pipelines and cloud infrastructure for mobile apps at Air Apps. Collaborating with teams to ensure app performance and reliability.
DevOps Engineer at Vodafone Romania delivering resilient infrastructure for the software development lifecycle. Collaborating with Digital Squads and optimizing CI/CD pipelines for efficient deployments.
Mechanical/Reliability Engineer responsible for mechanical installations in Bergen op Zoom. Analyzing maintenance strategies and leading projects to enhance reliability.
Senior DevOps Engineer responsible for cloud infrastructure and deployments. Optimizing AWS services and ensuring system security and reliability for Verizon.
Senior DevOps Engineer responsible for automating infrastructure and building CI/CD pipelines for a collaborative robotics company. Collaborating with global engineering teams from the Bangalore office.
Site Reliability Engineer Intern at Tencent working on gaming services and cloud native solutions. Collaborating with global teams to eliminate toil and enhance reliability.
Cloud/DevOps Specialist at N5X managing and optimizing critical cloud infrastructures for Brazilian energy trading. Collaborating with a multidisciplinary team to ensure high availability and performance.
Cloud/DevOps Specialist responsible for designing a hybrid architecture combining cloud and on-premises infrastructure for energy trading systems. Collaborating with a multidisciplinary team in a dynamic environment.