Site Reliability Engineer operating on Confluent Cloud for government clients. Ensuring system reliability and compliance with FedRAMP standards in a hybrid working model.
Responsibilities
Understand and participate in the changing FedRAMP space by quickly ramping up on the 20x controls and building on them to maintain federal compliance
Own and champion high operational standards of Confluent Cloud systems leveraged by federal agencies
Deploy production changes to Confluent Cloud systems and infrastructure through established change management processes
Assist with process improvements and adoption of change management
Own monitoring and incident handling of complex distributed systems, engaging engineering teams when needed through an escort model
Act as a core member of Confluent's Business Continuity Plan and Disaster Recovery team, with efforts across 3 large verticals
Innovate and design solutions to reduce toil, bolster operational maturity, and make day-to-day work life easier
Participate in a 24/7 on-call rotation to maintain the integrity of Confluent Cloud for Government systems
Requirements
0-2 years of relevant experience
Experience with cloud-native technologies, including operating production services in the cloud
Understanding of the fundamentals of distributed systems and their design
Knowledge of Kubernetes and containerization
Proficiency in infrastructure as code (Terraform preferred)
Experience with telemetry tooling to monitor production systems (Datadog, Grafana, Prometheus)
Exposure to and understanding of BCP/DR and high availability exercises
Ability to quickly problem-solve and troubleshoot critical services
Proficiency with scripting and automation (e.g., Go, Java, Python, Bash)
Exceptional teamwork and collaboration skills, and the ability to think critically and act with minimal supervision at times in a remote-first environment
Experience with a rotating on-call schedule to provide 24/7 support
BS degree in Computer Science, Engineering, or equivalent experience
Mainframe DevOps role focusing on data management and service delivery for Commerzbank. Join a customer-centric team dedicated to a data-driven enterprise.
Senior DevOps Engineer working on CI/CD setup, deployment security, and database maintenance for Bundesdruckerei GmbH. Collaborating on innovative secure digital solutions in Berlin.
Site Reliability Engineer at Plenful maintaining system performance and reliability. Collaborating with teams to improve operations and ensure system stability in a fast-paced environment.
Senior Site Reliability Engineer at LexisNexis working on cloud data applications and microservices. Collaborating within teams to enhance system reliability and automate recovery processes.
Reliability & Maintenance Engineer for Reckitt focusing on maintenance strategies and equipment optimization. Involves collaboration across production, quality, and maintenance teams to minimize downtime and extend asset life.
Associate SRE ensuring high availability and minimal disruption across business-critical systems through monitoring and automation. Collaborating with teams to boost workflow efficiency in a sustainable energy company.
DevOps Engineer transforming infrastructure to support GovTech solutions. Collaborating with development and test teams on projects, focusing on Infrastructure as Code and CI/CD pipelines.
Principal DevOps Engineer at KingMakers focusing on coding and infrastructure within product squads. Leading technical improvements in observability, reliability, and performance across platforms.
DevOps Consultant at Opencast focused on building scalable systems for high-impact projects. Requires SC Clearance and involves collaboration with clients.