SRE leading reliability and operational excellence at a mortgage tech platform. Designing systems, tooling, and processes for managing Pylon's production systems in Palo Alto.
Responsibilities
Own reliability and operational excellence for Pylon's production systems.
Design and implement monitoring, alerting, and incident response processes that scale as we grow.
Build tooling that makes the entire engineering team more effective.
Establish on-call rotations and runbooks.
Ensure our platform can handle the demands of a regulated, high-stakes financial product.
Spend 50%+ of your time writing code: building infrastructure and productivity tooling, automating operational burden, and making reliability improvements.
Requirements
4+ years of experience in SRE, infrastructure, or platform engineering roles
Experience working on a team of SREs at a company with mature SRE practices (not solo SRE roles)
Real on-call experience at scale in a large production environment (you've carried the pager and lived through incidents)
Deep AWS expertise (ECS, RDS, networking, security)
Strong experience with declarative infrastructure (Terraform, CDK, or similar)
Nix experience (we use it and want to expand its adoption)
Track record of building reliability tooling and automation
Ability to design and implement monitoring, alerting, and observability systems from first principles
Comfortable working in a regulated environment where "breaking things" is not an option
DevOps Engineer designing and implementing CI/CD pipelines and supporting cloud-based solutions at eInfochips. Collaborating with QA and Engineering teams for release readiness.
DevOps Engineer III providing L3 support for Operations across Edge/on-prem and cloud environments. Building automations and handling incidents for customer deployments.
Senior Build & Release Engineer at GXO Logistics responsible for CI/CD solutions and build automation across various environments. Collaborating with teams for smooth software deployments and mentoring staff.
Senior Site Reliability Engineer improving the reliability of Acuity’s cloud services. Collaborating across teams to define observability standards and incident response in Cork Digital Centre of Excellence.
Azure Senior DevOps Engineer supporting critical cloud systems in the Azure Government Cloud environment. Leading CI/CD pipeline design and implementation with operational best practices.
Automation Engineer enhancing infrastructure and automating operations for client systems. Working in a complex environment oriented towards automation, security, and performance.
Graduate Reliability Engineer at GKN Aerospace enhancing operational excellence through data analysis and project participation within large structural assemblies.
Site Reliability Engineer at WRITER, ensuring 24/7 availability and performance of AI-powered workflows. Collaborating on scalable infrastructure solutions while impacting enterprise customer trust.
Engineer at Trading Technologies improving platform stability through coding and automation. Focus on building advanced monitoring tools for global trading operations.