Cloud Site Reliability Engineer ensuring the scalability, performance, and reliability of cloud infrastructure deployed in Woven City. Working with product owners and teams to deliver innovative solutions.
Responsibilities
You will set reliability targets and error budgets, define and measure SLIs, and drive continuous improvement of SLOs
You will participate in on-call rotations, triaging incidents and providing emergency response
You will define org-wide incident management processes, provide education on incident response, and track and improve metrics such as MTTD and MTTR
You will analyze critical user journeys and dependencies, provide architectural consultations, and drive operational best practices including documentation and runbooks
You will perform cost engineering, identify optimization opportunities, and own capacity planning
You will develop solutions for change management, monitoring, and disaster recovery
Requirements
Bachelor's degree or equivalent industry experience in Computer Science, Electrical Engineering, or related fields
5+ years of SRE and/or DevOps experience on cloud-based systems
Experience in large-scale production systems with major cloud technologies, such as AWS, Istio, K8s, and Grafana
Experience with Terraform and Infrastructure as Code
Experience with multiple modern programming languages, such as Go, Rust, and Python
A track record of both independent and collaborative impact
Business level communication skills in verbal and written English
Preferred Qualifications
Experience with leading organization-wide SRE initiatives
Professional experience with modern Continuous Integration/Continuous Delivery (CI/CD) tooling
Professional experience with or familiarity with Bazel
Master's degree in Computer Science or related fields
10+ years of SRE and/or DevOps experience on cloud-based systems
Japanese language skills
Benefits
Competitive Salary - Based on experience
Work Hours - Flexible working time
Paid Holiday - 20 days per year (prorated)
Sick Leave - 6 days per year (prorated)
Holiday - Sat & Sun, Japanese National Holidays, and other days defined by our company
Japanese Social Insurance - Health Insurance, Pension, Workers’ Comp, Unemployment Insurance, and Long-term Care Insurance
Housing Allowance
Retirement Benefits
Rental Car Support
In-house Training Program (software study/language study)