Site Reliability Engineer at Plenful maintaining system performance and reliability. Collaborating with teams to improve operations and ensure system stability in a fast-paced environment.
Responsibilities
Maintain and evolve alerting so engineers receive clear, actionable signals for anomalies, latency regressions and reliability risks
Define observability standards across metrics, logs and tracing with a focus on reliability, performance and customer impact instead of vanity data
Investigate performance bottlenecks across our distributed systems including serverless task execution, containerized services, workflow orchestration and Postgres
Lead incident response, coordinate root cause analysis and ensure reliability improvements are fully implemented and measured
Improve the reliability of our distributed task processing, including autoscaling behavior, execution patterns, retry logic, rate limiting and failure isolation
Support the stability of our serverless pipelines that process high-volume workloads across multiple execution layers
Partner with backend and ML teams on designing resilient mechanisms for scheduling, queueing and workflow execution
Maintain efficient and predictable resource usage across compute, networking and storage
Support security and compliance work including patching, audit readiness and vulnerability management
Participate in the on-call rotation and respond to production incidents quickly and calmly with a focus on restoring stable service and clear communication
Contribute to blameless postmortems, drive follow-through on fixes and ensure learnings are documented for future engineers
Requirements
5+ years of professional engineering experience at a B2B SaaS company
Strong experience operating production systems in cloud environments, ideally AWS
Hands-on experience with serverless compute patterns, containerized services, distributed workflows and Postgres
Solid understanding of observability tooling, performance debugging and system behavior under load
A high-ownership mindset, empathy for teammates, straightforward communication and a one-team attitude
Comfortable working in a fast-paced startup environment with a bias for action and thoughtful engineering judgment
Benefits
Enjoy unlimited PTO
Fully covered health insurance (medical, dental, and vision)
Mainframe DevOps role focusing on data management and service delivery for Commerzbank. Join a customer-centric team dedicated to a data-driven enterprise.
Senior DevOps Engineer working on CI/CD setup, deployment security, and database maintenance for Bundesdruckerei GmbH. Collaborating on innovative secure digital solutions in Berlin.
Site Reliability Engineer operating on Confluent Cloud for government clients. Ensuring system reliability and compliance with FedRAMP standards in a hybrid working model.
Senior Site Reliability Engineer at LexisNexis working on cloud data applications and microservices. Collaborating within teams to enhance system reliability and automate recovery processes.
Reliability & Maintenance Engineer for Reckitt focusing on maintenance strategies and equipment optimization. Involves collaboration across production, quality, and maintenance teams to minimize downtime and extend asset life.
Associate SRE ensuring high availability and minimal disruption across business-critical systems through monitoring and automation. Collaborating with teams to boost workflow efficiency in a sustainable energy company.
DevOps Engineer transforming infrastructure to support GovTech solutions. Collaborating with development and test teams on projects, focusing on Infrastructure as Code and CI/CD pipelines.
Principal DevOps Engineer at KingMakers focusing on coding and infrastructure within product squads. Leading technical improvements in observability, reliability, and performance across platforms.
DevOps Consultant at Opencast focused on building scalable systems for high-impact projects. Requires SC Clearance and involves collaboration with clients.