Site Reliability Engineer operating on Confluent Cloud for government clients. Ensuring system reliability and compliance with FedRAMP standards in a hybrid working model.
Responsibilities
Understand and participate in the changing FedRAMP space by quickly ramping up on the FedRAMP 20x controls and building on them to maintain federal compliance
Own and champion high operational standards of Confluent Cloud systems leveraged by federal agencies
Deploy production changes to Confluent Cloud systems and infrastructure through established change management processes
Assist with process improvements and adoption of change management
Own monitoring and incident handling of complex distributed systems, engaging engineering teams when needed through an escort model system.
Act as a core member of Confluent's Business Continuity Plan and Disaster Recovery team, with efforts across 3 large verticals
Innovate and design solutions to reduce toil, bolster operational maturity, and make day-to-day work life easier
Participate in a 24/7 on-call rotation to maintain the integrity of Confluent Cloud for Government systems
Requirements
0-2 years of relevant experience
Experience with cloud-native technologies, including operating production services in the cloud
Fundamentals of Distributed Systems and their design
Knowledge of Kubernetes and containerization
Proficiency in infrastructure as code (Terraform preferred)
Experience with telemetry tooling to monitor production systems (DataDog, Grafana, Prometheus)
Exposure to and understanding of BCP/DR and high-availability exercises
Ability to quickly problem-solve and troubleshoot critical services
Proficiency with scripting and automation (e.g., Go, Java, Python, Bash)
Exceptional teamwork and collaboration skills, and the ability to think critically and act with minimal supervision at times in a remote-first environment
Experience with a rotating on-call schedule to provide 24/7 support
BS degree in Computer Science, Engineering, or equivalent experience
Site Reliability Engineer focusing on AWS cloud environments, SRE practices, and system reliability within GFT's team. Collaborating on cloud migrations and observability initiatives.
Senior DevOps Analyst enhancing infrastructure automation in a transformative technology firm. Collaborating on innovative projects in sectors like healthcare, finance, and utilities in Brazil.
Consultant at Minsait supporting technical decisions in infrastructure automation and developing solutions. Collaborating with teams for maintaining and evolving automation platforms.
Practical Trainee focusing on hardware reliability engineering at Sonova. Supporting reliability improvement initiatives and working closely with experienced engineers on real-life product challenges.
Configuration Management Engineering Technician supporting naval shipbuilding projects with engineering documentation and configuration integrity. Establishing and maintaining relationships with stakeholders in the shipbuilding community.
Principal Configuration Management Engineering Technician contributing to major shipbuilding programs for national security. Leading Configuration Management teams and ensuring data integrity for advanced naval vessels.
Senior Configuration Management Engineering Technician at Babcock supporting naval engineering programmes across multiple ship configurations. Influencing critical decisions and contributing to engineering outcomes for national defence.
DevOps Engineer designing and managing scalable Azure cloud infrastructure for a financial technology company. Collaborating with teams to enhance system reliability and automate application delivery pipelines.
DevOps Engineer responsible for designing and managing Azure cloud infrastructure for a financial services provider. Collaborating with development teams to optimize system reliability and security.
Senior DevOps Engineer responsible for scaling and securing the infrastructure behind a healthcare AI platform. Collaborating with teams to deliver integrations and drive automation best practices.