Senior Site Reliability Engineer managing complex distributed systems for DeepL's AI communications platform. Responsibilities include Kubernetes administration, system monitoring, and incident response.
Responsibilities
Design, maintain, and optimize complex distributed systems, ensuring high availability and performance.
Act as the Kubernetes administrator for our team, managing and troubleshooting our environments.
Develop and implement monitoring solutions to ensure system reliability and adherence to defined SLOs.
Engage in on-call rotations, responding to incidents and contributing to post-mortem analyses.
Foster a culture of continuous improvement by identifying and solving problems proactively.
Requirements
Proven experience in maintaining and designing complex systems, with a strong foundation in distributed systems and low-level interactions.
Hands-on expertise in Kubernetes.
Professional experience in Python or Go.
Demonstrated ability to build and implement monitoring solutions.
A self-motivated and adaptable mindset, with a knack for problem-solving across various domains.
Excellent communication skills.
Benefits
Diverse and internationally distributed team: joining our team means becoming part of a large, global community with people of more than 90 nationalities.
Open communication, regular feedback: we value smooth collaboration, direct and actionable feedback, and believe that leading with empathy and a growth mindset makes us better together.
Hybrid work, flexible hours: we offer a hybrid work schedule, with team members coming into the office twice a week.
Regular in-person team events: we bond over vibrant events that are as unique as our team.
Monthly full-day hacking sessions: every month, we have Hack Fridays, where you can spend your time diving into a project you're passionate about.
30 days of annual leave: we value your peace of mind.
Virtual shares: an ownership mindset in every role.
Competitive benefits: we've crafted them to reflect the diversity of our team.
Lead DevOps Engineer modernizing infrastructure and automation for Wells Fargo’s Consumer Technology platforms. Collaborating across teams to build scalable solutions and elevate engineering excellence.
Senior DevOps Engineer re-envisioning enterprise-level applications at Ryan. Designing and maintaining cloud infrastructure for optimal service delivery.
Reliability Engineer focusing on risk minimization and maintenance strategies in an innovative PEM electrolyzer company. Collaborating cross-functionally to enhance equipment and systems performance.
Principal Site Reliability Engineer at Red Hat managing the RHIVOS product SRE initiative. Focusing on infrastructure reliability and continuous improvement with deep technical expertise in engineering.
DevOps Azure Developer specializing in end-to-end application development at global healthcare leader Abbott. Engaging in CI/CD processes and building secure cloud applications using Azure and Python.
DevSecOps Engineer at Livingston ensuring security in CI/CD pipelines and building resilient systems. Collaborating with teams to integrate best practices in software development.
Reliability Engineer at LANXESS improving the reliability of fixed and rotating equipment. Partnering with Engineering and Operations to ensure asset safety and performance.
Cloud Engineer at Agility Technologies leading the design of scalable eLearning infrastructure. Collaborating on technical design and implementation involving cloud-based platforms and secure integrations.
Senior Hardware Reliability Engineer overseeing reliability testing and analysis of outdoor electronic assemblies at Gridware. Collaborating with mechanical engineers and contributing to product lifetime modeling.