Site Reliability Engineer at Coinbase optimizing cloud deployments and enhancing system reliability. Working with engineering teams to improve software reliability and performance across the organization.
Responsibilities
Improve observability, reliability and availability by defining and measuring key metrics.
Build automation and improve systems to eliminate toil and operations work.
Collaborate with our core infrastructure team to performance-tune and optimize our cloud deployments. (Think Docker, Terraform, Kubernetes, EC2, etc.)
Collaborate with Coinbase product teams to reduce service disruptions and automate incident response.
Proactively find and analyze reliability problems across our business units and stack, then design and implement software to create step-function improvements.
Educate and mentor the engineering team, and hold it accountable, to improve the reliability of our systems and make reliability a core value of the Coinbase engineering culture.
Write high-quality, well-tested code to meet the needs of your customers.
Debug extremely difficult technical problems, and make systems and products both work better and easier to deploy, own, operate and diagnose.
Review all feature designs within your product area and across the company for cross-cutting projects.
Be an owner of the security, safety, scale, operational integrity, and architectural clarity of these designs.
Build pipelines to integrate with third-party vendors.
Participate in an on-call support rotation to provide timely troubleshooting and resolution of urgent issues.
Requirements
You have at least 6 years of experience in software engineering.
You’ve designed, built, scaled and maintained production services, and know how to compose a service-oriented architecture.
You write high-quality, well-tested code to meet the needs of your customers.
You’re passionate about building an open financial system that brings the world together.
You possess strong technical skills for system design and coding.
Excellent written and verbal communication skills, and a bias toward open, transparent cultural practices.
Strong skills around observability, debugging and performance tuning.
Strong communication skills and ability to explain technical concepts clearly and simply.
Strong interpersonal skills for working with engineers from junior to principal levels.
Demonstrated critical thinking under pressure.
A willingness to dive into understanding, debugging, and improving any layer of the stack.
This role requires on-call availability to ensure swift resolution of issues outside regular business hours.
Senior Site Reliability Engineer designing and implementing high-reliability platforms for Broadridge. Collaborating with teams across hybrid environments and driving automation and efficiency in service delivery.
Senior Engineering Manager for Hybrid Services & Reliability within AV Core Infrastructure at GM. Leading a team for the measurable availability of hybrid cloud systems for autonomous vehicle development.
Staff Engineer for GM's Hybrid Services & Reliability team. Driving reliability architecture and maintenance for hybrid cloud services with a focus on SRE principles.
Reliability Engineer for PGD Wind Reliability team at NextEra Energy. Collaborating on optimizing wind turbine performance, increasing reliability, and reducing costs while managing complex technical issues.
Maintenance Reliability Engineer focusing on operational excellence at JLL. Driving reliability through advanced maintenance strategies and technologies in building systems.
Senior Data Platform DevOps Engineer for Expleo focusing on AWS infrastructure solutions. Responsibilities include designing, implementing, and maintaining data platform solutions with a collaborative team.
DevOps Process Engineer at E.ON Digital Technology enhancing corporate service management processes for energy digital transformation. Engaging in compliance, reporting, and automation in a dynamic tech environment.
Site Reliability Engineer focused on application infrastructure, reliability, and scalability. Working at Early Warning, a leader in financial technology solutions for secure transactions.
Lead DevSecOps Engineer at Truist responsible for designing scalable infrastructure and implementing DevSecOps solutions. Oversee automation practices and mentor junior team members.
Cloud Deployment Engineer for PwC focused on designing and implementing cloud solutions for clients. Collaborate with teams to enhance technology infrastructure and business performance.