Site Reliability Engineer at Plenful, maintaining system performance and reliability and collaborating with teams to improve operations and ensure system stability in a fast-paced environment.
Responsibilities
Maintain and evolve alerting so engineers receive clear, actionable signals for anomalies, latency regressions and reliability risks
Define observability standards across metrics, logs and tracing with a focus on reliability, performance and customer impact instead of vanity data
Investigate performance bottlenecks across our distributed systems including serverless task execution, containerized services, workflow orchestration and Postgres
Lead incident response, coordinate root cause analysis and ensure reliability improvements are fully implemented and measured
Improve the reliability of our distributed task processing, including autoscaling behavior, execution patterns, retry logic, rate limiting and failure isolation
Support the stability of our serverless pipelines that process high-volume workloads across multiple execution layers
Partner with backend and ML teams on designing resilient mechanisms for scheduling, queueing and workflow execution
Maintain efficient and predictable resource usage across compute, networking and storage
Support security and compliance work including patching, audit readiness and vulnerability management
Participate in the on-call rotation and respond to production incidents quickly and calmly with a focus on restoring stable service and clear communication
Contribute to blameless postmortems, drive follow-through on fixes and ensure learnings are documented for future engineers
Requirements
5+ years of professional engineering experience at a B2B SaaS company
Strong experience operating production systems in cloud environments, ideally AWS
Hands-on experience with serverless compute patterns, containerized services, distributed workflows and Postgres
Solid understanding of observability tooling, performance debugging and system behavior under load
A high-ownership mindset, empathy for teammates, straightforward communication and a one-team attitude
Comfortable working in a fast-paced startup environment with a bias for action and thoughtful engineering judgment
Benefits
Unlimited PTO
Fully covered health insurance (medical, dental, and vision)