Site Reliability Engineer at Coinbase optimizing cloud deployments and enhancing system reliability. Working with engineering teams to improve software reliability and performance across the organization.
Responsibilities
Improve observability, reliability and availability by defining and measuring key metrics.
Build automation and improve systems to eliminate toil and operations work.
Collaborate with our core infrastructure team to performance tune and optimize our cloud deployments. (Think Docker, Terraform, Kubernetes, EC2, etc.)
Collaborate with Coinbase product teams to reduce service disruptions and automate incident response.
Proactively find and analyze reliability problems across our business units and stack, then design and implement software to create step-function improvements.
Educate, mentor and hold the engineering team accountable for improving the reliability of our systems, and make reliability a core value of the Coinbase engineering culture.
Write high quality, well tested code to meet the needs of your customers.
Debug extremely difficult technical problems, and make systems and products both work better and easier to deploy, own, operate and diagnose.
Review all feature designs within your product area and across the company for cross-cutting projects.
Be an owner of the security, safety, scale, operational integrity, and architectural clarity of these designs.
Build pipelines to integrate with third-party vendors.
Participate in an on-call support rotation to provide timely troubleshooting and resolution of urgent issues.
Requirements
You have at least 6 years of experience in software engineering.
You’ve designed, built, scaled and maintained production services, and know how to compose a service-oriented architecture.
You write high quality, well tested code to meet the needs of your customers.
You’re passionate about building an open financial system that brings the world together.
You possess strong technical skills for system design and coding.
Excellent written and verbal communication skills, and a bias toward open, transparent cultural practices.
Strong skills around observability, debugging and performance tuning.
Strong communication skills and ability to explain technical concepts clearly and simply.
Strong interpersonal skills working with engineers from junior to principal levels.
Demonstrated critical thinking under pressure.
A willingness to dive into understanding, debugging, and improving any layer of the stack.
This role requires on-call availability to ensure swift resolution of issues outside regular business hours.
DevOps Engineer building and maintaining authentication platforms in multi-cloud environments. Using technologies like Terraform, Ansible, and Python for automation and optimization.
Cloud Engineer developing Infrastructure-as-Code with Terraform and Azure DevOps. Managing Azure infrastructure and leading incident response within cross-functional teams.
DevSecOps Engineer at Skillfield working on secure CI/CD pipelines for mobile-first delivery. Collaborating with teams to embed security and automation in the delivery lifecycle.
Lead DevOps Engineer focused on AWS and Azure data platform solutions. Collaborating with teams to deliver scalable, secure, and highly available solutions.
DevOps Engineer working at GRÜN Software Group to automate and maintain stable infrastructures. Collaborating with teams to improve deployments and processes for better performance.
Linux System Administrator managing IT infrastructures for educational institutions and research. Collaborating on DevOps and HPC projects while ensuring system security and performance.
Azure SRE Engineer responsible for designing and maintaining secure, scalable Azure cloud infrastructure. Driving automation and operational excellence for leading organizations in technology transformation.
Senior Manager of Site Reliability Engineering overseeing Workday's Kubernetes-based platform. Leading teams while ensuring high availability and collaborating with federal agencies.
Site Reliability Engineer focusing on AWS cloud environments, SRE practices, and system reliability within GFT's team. Collaborating on cloud migrations and observability initiatives.