Software Engineer, SRE responsible for maintaining systems and increasing infrastructure reliability at Mercari. Collaborating across teams to deliver high-impact features with a focus on efficiency and support.
Responsibilities
Operate and maintain shared components used by multiple teams in Mercari US, impacting overall production reliability
Define and measure system reliability goals using SLOs/SLIs
Continuously monitor capacity, performance, and cost of systems in both production and development
Build, run, and integrate software to improve the availability, scalability, latency, and efficiency of our system as a whole
Define, manage, and run incident management processes (on-call, incident response, postmortem)
Mentor junior engineers, lead code reviews, and actively contribute to architectural decisions and technical documentation.
Collaborate with cross-functional teams including product, engineering, and QA to deliver high-impact features and improvements.
Requirements
5+ years of experience working with and administering production DBMS/MySQL clusters, cloud-native environments (GCP/AWS/Azure), and Docker and Kubernetes
5+ years of professional experience maintaining and operating infrastructure
Bachelor’s degree in Computer Science, Software Engineering, or a related field (or equivalent practical experience).
Experience with and passion for optimizing the performance of databases, networking, and microservices
Strong programming expertise in any programming language
Excellent English communication skills, with the ability to collaborate effectively across functions and regions.
Demonstrated ability to mentor and guide junior engineers.
Reliability Engineer focusing on mechanical systems in a long-standing Australian FMCG company. Ensure ongoing reliability improvements and support plant operations for iconic cereal production.
Software Engineer 2 developing full-stack solutions for U.S. Bank. Collaborating with teams to design and maintain best-in-class software experiences.
Principal Software Engineer at FIS driving reliability and performance in fintech environments. Collaborating across teams for high-scale, high-reliability solutions in the finance sector.
Senior Software Development Engineer involved in automation testing at CVS Health. Designing, developing, and implementing automated testing solutions in a collaborative environment.
Senior Site Reliability Engineer for observability platforms at Dimensional, ensuring reliability and scaling the infrastructure. Collaborating with teams on operations and engineering projects.
Senior Site Reliability Engineer focusing on reliability and operational excellence of workflow orchestration platforms like Apache Airflow. Engaging in operations and engineering projects in a hybrid setup.
Senior Staff Reliability Engineer for the humanoid robotics team ensuring performance and safety standards. Leading reliability engineering initiatives and mentoring within the engineering team.
Reliability Engineer at Air Liquide optimizing maintenance strategies, ensuring equipment uptime across multiple sites in the United States. Collaborating with teams for continuous improvement and operational excellence.
Senior Azure Engineer at Capgemini responsible for building, operating, and optimizing cloud-native platforms. Collaborating with teams to ensure reliability, performance, and security for critical workloads.
DevOps Engineer specializing in cloud environments at Avanquest, planning and migrating services to the cloud and implementing microservice architectures.