Site Reliability Engineer improving system reliability and performance in production environments with a focus on automation and operational efficiency. Collaborating with engineering and infrastructure teams on deliverable-focused projects.
Responsibilities
Design, develop, test, and deploy automation tools, scripts, and engineering solutions to improve the stability, performance, and efficiency of production systems.
Identify opportunities to automate manual operational processes and reduce operational overhead.
Support and improve the release and deployment lifecycle of applications, ensuring reliable and controlled production rollouts.
Collaborate with software engineers and infrastructure teams to troubleshoot and resolve system issues.
Contribute to system design discussions, platform management, and capacity planning.
Create and maintain clear technical documentation for automation tools, operational procedures, and reliability improvements.
Provide regular updates on progress and deliverables to engineering stakeholders.
Requirements
At least 1 year of professional software development or reliability engineering experience.
Proficiency in one or more programming languages such as Python, C++, Java, or shell scripting.
Strong understanding of Linux operating system internals.
Solid knowledge of networking concepts and troubleshooting.
Experience with modern version control systems such as Git.
Familiarity with monitoring, logging, and CI/CD tools (e.g., Prometheus, Grafana, Splunk, Jenkins, GitLab CI) is highly beneficial.
Ability to work independently, manage priorities effectively, and deliver results with minimal supervision.
Excellent written and verbal communication skills, with the ability to clearly communicate technical topics to engineering stakeholders.
Ability to quickly learn new technologies and tools and work across multiple programming languages and frameworks.
Mechanical/Reliability Engineer responsible for mechanical installations in Bergen op Zoom. Analyzing maintenance strategies and leading projects to enhance reliability.
Senior DevOps Engineer responsible for cloud infrastructure and deployments. Optimizing AWS services and ensuring system security and reliability for Verizon.
Senior DevOps Engineer responsible for automating infrastructure and building CI/CD pipelines for a collaborative robotics company. Collaborating with global engineering teams from the Bangalore office.
Site Reliability Engineer Intern at Tencent working on gaming services and cloud-native solutions. Collaborating with global teams to eliminate toil and enhance reliability.
Cloud/DevOps Specialist at N5X managing and optimizing critical cloud infrastructures for Brazilian energy trading. Collaborating with a multidisciplinary team to ensure high availability and performance.
Cloud/DevOps Specialist responsible for designing a hybrid architecture combining cloud and on-premises infrastructure for energy trading systems. Collaborating with a multidisciplinary team in a dynamic environment.
Reliability Engineering Specialist utilizing reliability tools and models to improve asset performance at Enbridge. Collaborating across teams to guide investment decisions for safe operations.
DevOps Engineer responsible for structuring and supporting cloud DevOps architecture in Brazil. Working strategically on automation and CI/CD practices with development teams in Pernambuco.
DevSecOps Software Engineer developing secure CI/CD pipelines for Boeing's military software systems. Collaborating with cross-functional teams and implementing automation and security best practices.
DevOps Manager responsible for managing a team for multi-cloud solutions supporting the USAF Cloud One project. Focusing on scalable cloud-native solutions and CI/CD practices.