Site Reliability Engineer building reliable and scalable infrastructure for fintech startup Pave Bank. Collaborating with internal teams to enhance the banking platform's performance and reliability.
Responsibilities
Monitor, maintain, and improve the reliability, availability, and performance of production systems and services
Build and maintain infrastructure as code (IaC), deployment pipelines, and automation to support continuous delivery, scalability, and disaster recovery
Respond to incidents, perform root-cause analysis, and drive postmortems to ensure lessons learned are applied
Implement and enforce operational best practices: observability, logging, metrics, alerting, capacity planning, failover strategies, and backups
Collaborate with Engineering, Product, Compliance, and Operations teams to ensure infrastructure meets reliability, compliance, and security standards
Support service scaling, database operations, cloud infrastructure (GCP preferred), networking, and microservices orchestration
Document operational runbooks, on-call procedures, and system architecture to support maintenance, knowledge sharing, and compliance
Requirements
Strong programming or scripting skills (Go, Python, Bash, or similar) for automation, tooling, and operational tasks
Hands-on experience with cloud infrastructure, ideally Google Cloud Platform (GCP)
Familiarity with containerization and orchestration (Docker, Kubernetes, or equivalent)
Experience with infrastructure-as-code tools (Terraform, Cloud Deployment Manager, or similar)
Experience with either FluxCD or ArgoCD for GitOps-based delivery
Solid understanding of distributed systems, microservices architecture, and reliability patterns
Experience setting up monitoring, logging, alerting, and observability (e.g., Prometheus, Grafana, ELK, distributed tracing)
Strong troubleshooting skills and ability to respond to incidents under pressure
Knowledge of backup and disaster recovery strategies, database management, and secure operations
Ownership mindset: proactive, responsible, and committed to system reliability
Strong communication skills, with the ability to coordinate across technical and non-technical stakeholders
Comfortable working in a fast-paced, early-stage startup environment
High integrity, attention to detail, and passion for fintech and programmable banking systems
Prior experience in fintech, banking, or other highly regulated industries is a plus
Benefits
Competitive salary and meaningful equity with room for growth
Work alongside a founding team from Monzo and BigPay, bringing top-tier fintech expertise
Tackle real-world reliability challenges in a regulated, fast-growing fintech environment
Learn from and collaborate with experienced engineers while developing your SRE career