Platform Engineer at Attio building and enhancing the internal technology platform. Focused on DevOps principles and operational excellence in a hybrid work environment.
Responsibilities
Implement, maintain, and continuously improve the foundational platform infrastructure that powers all engineering services.
Build and maintain platform infrastructure using declarative IaC tools (e.g., Terraform, Pulumi), ensuring all environments are reproducible, version-controlled, and auditable.
Act as a first-line responder for critical system incidents. Triage, diagnose, and resolve complex production issues rapidly.
Drive a culture of blameless post-mortems, ensuring root causes are identified and long-term preventative measures are implemented as code (e.g., via runbooks, automation, or system design changes).
Own the stack of supporting tools necessary for operational excellence and developer enablement, including implementing, maintaining, and evolving the fully automated CI and CD pipelines.
Implement and manage robust systems for monitoring (metrics), logging (centralised log aggregation), and distributed tracing to provide deep insights into application and infrastructure health.
Requirements
Must have: Demonstrable, hands-on experience applying core DevOps and Site Reliability Engineering (SRE) principles to manage, monitor, and scale production systems.
Must have: A deep understanding of the SRE mindset, including SLO/SLA creation and monitoring, error budget management, toil reduction, and post-incident review (blameless post-mortems).
Desirable: Proven ability to drive cultural and process change that fosters a collaborative approach between development and operations teams.
Must have: Expertise in one or more major public cloud providers (AWS, GCP, or Azure), encompassing network configuration, security best practices (IAM, security groups, etc.), compute services (EC2, GKE, ECS, etc.), and managed services (databases, queues, serverless functions).
Must have: In-depth knowledge of container technologies, specifically Docker, and extensive experience orchestrating them at scale using Kubernetes (K8s). This includes designing, deploying, and managing Kubernetes clusters, understanding networking (CNI), storage (CSI), and security configurations within the Kubernetes ecosystem.
Must have: Proficiency in one or more modern programming languages (e.g., TypeScript, Go, Python, Rust) and associated frameworks used for building high-performance, resilient production systems.
Must have: Proven experience developing robust, maintainable, and well-tested automation scripts, services, and pipelines to manage infrastructure, deployments, and operational tasks.
Must have: Experience owning, managing, and maintaining mission-critical operational tooling.
Desirable: Proven background in implementing and managing centralised logging solutions or similar platforms (e.g., Splunk, Datadog).
Desirable: Familiarity with distributed tracing tools (e.g., Jaeger, Zipkin) and Application Performance Monitoring (APM) solutions.
Benefits
Competitive salary of £80,000 to £95,000
Equity in an early-stage tech company on an incredible trajectory
25 days holiday plus local public holidays
Apple hardware
Private medical insurance through AXA
Pension contribution through Hargreaves Lansdown
Enhanced family leave
Team off-sites in fun places (we've been to Barcelona, Lisbon, Malta, and Split so far)