Staff Reliability Engineer at an insurance company, enhancing the stability and performance of systems. Collaborating across teams to implement best practices and mentor others in reliability engineering.
Responsibilities
Lead the design, implementation, and optimization of reliable systems and infrastructure.
Collaborate with software engineering, operations, and product teams to ensure uptime and availability targets are met.
Develop and maintain monitoring, alerting, and incident response strategies to detect and resolve issues quickly.
Conduct root cause analysis of system failures and drive corrective actions to prevent recurrence.
Advocate for reliability best practices and foster a culture of proactive risk mitigation across the organization.
Mentor and provide technical guidance to other reliability engineers and cross-functional team members.
Develop automation tools to enhance efficiency in deployment, monitoring, and recovery processes.
Participate in capacity planning, performance testing, and disaster recovery exercises.
Stay current with industry trends, emerging technologies, and best practices in reliability engineering.
Requirements
5+ years of experience in reliability engineering, site reliability engineering (SRE), or related roles.
Expertise in cloud platforms (e.g., AWS, Azure, Google Cloud) and container orchestration (e.g., Kubernetes).
Strong programming skills in one or more languages (e.g., Python, Java).
Proven experience with logging and monitoring tools (e.g., Splunk, Dynatrace, Datadog) and incident management frameworks (e.g., ServiceNow).
Excellent analytical, troubleshooting, and communication skills.
Ability to lead complex projects and influence stakeholders at all levels.
Reliability Engineer providing support to maintenance and operations teams for critical gold processing assets. Ensuring equipment reliability and leading improvement initiatives at Gruyere Gold Mine.
Site Reliability Engineer responsible for monitoring and improving production systems at ING. Leading teams to ensure high reliability and performance of business-critical applications.
Reliability Engineer at Mosaic Company providing in-depth analysis on mechanical systems to reduce risk. Supporting operations in reliability improvement initiatives across refinery and minefield.
DevOps Engineer at MYOB enhancing core business management systems for small to medium enterprises in Australia and New Zealand. Focused on operational excellence and stability.
(Senior) DevOps Engineer automating IT processes and managing CI/CD for digital solutions in a technology company. Collaborating with product owners and engineering teams to ensure secure digital solutions.
Azure Cloud Operations Engineer optimizing the cloud infrastructure in Vienna for innovative work management software. Collaborate on cloud solutions with a dynamic international team.
Join CI&T as a DevOps Master in technology transformation involving a corporate developer platform. Collaborate closely with teams to enhance scalability and operational efficiency.
DevOps Engineer responsible for developing and operating CI/CD pipelines in hybrid environments. Join K-tronik to work on innovative software and hardware projects within a dedicated team.
Senior SRE managing reliability of 300+ servers powering client Odoo ERP systems. Lead incident response and guide a team in building reliable systems.