Principal Engineer driving systemic reliability improvements for Xero's software products. Leading technical initiatives and mentoring teams in engineering excellence.
Responsibilities
Drive strategic technical direction across the organisation, focusing on systemic reliability improvements.
Help define the standard for engineering excellence at Xero and lead initiatives to grow technical capabilities.
Evolve the technical architecture of our software.
Implement strategic capacity forecasting and performance optimisation to ensure critical systems accommodate growth without service degradation.
Simplify complex challenges and champion innovation and automation while designing expert-level tools and frameworks that significantly reduce manual toil.
Ensure the reliability, availability, and performance of products through proactive engineering and deep partnership.
Actively mentor and elevate the SRE capabilities of the wider organisation, setting standards for code quality and best practices.
Requirements
Extensive career background in site reliability engineering, combined with advanced software engineering skills.
Expert-level knowledge of distributed systems, cloud platforms, and microservices architecture.
Skilled in designing and implementing expert-level automation frameworks and tools (not just one-off scripts).
Exceptional ability to mentor and guide others, fostering a culture of SRE excellence across the organisation.
Excellent communication and presentation skills, capable of influencing technical strategy across the organisation.
Proactive in identifying and driving opportunities for improvement by regularly reviewing delivery and production metrics.
Proven ability to translate ambiguous problems into actionable work and exercise strategic decision-making under pressure during critical incidents.
Senior Software Engineer at PayPal managing cloud infrastructure and DevOps solutions. Delivering complete SDLC solutions and guiding engineering teams for scalable and reliable services.
Senior Site Reliability Engineer at Diligent leading reliability, automation, and observability across cloud infrastructure. Building tools for incident response and enhancing performance in fast-paced environments.
Perception Deployment Engineer deploying deep learning models on embedded systems at Caterpillar. Collaborating with cross-functional teams for integration and optimization of perception modules in vehicles.
Principal Site Reliability Engineer at AT&T designing scalable solutions for critical operations with minimal downtime. Collaborating with teams to monitor and improve system performance in cloud environments.
DevOps Engineer managing AI SaaS infrastructure at a high-growth European company. Supporting AI model deployment and ensuring platform security and compliance across multiple integrated systems.
Engineering Manager leading teams for observability platforms at LexisNexis. Owning operational excellence across the software delivery lifecycle in Raleigh, NC.
Reliability Engineer optimizing site facility infrastructure and utility systems at Roche. Conducting root cause analyses and developing maintenance plans to enhance reliability and efficiency.
DevOps SME designing, implementing, and operating multi-cloud platforms for The Missing Link. Collaborating with engineering, security, and operations teams while embedding DevOps best practices.
Site Reliability Engineer improving reliability of cloud infrastructure for an AI-specialized company. Taking ownership of monitoring and incident response processes in a hybrid working arrangement.
DevOps Engineer leading automation for sophisticated release/deployment pipelines at Securonix. Focused on Python, Ansible, and cloud services to enhance security operations.