Site Reliability Engineer focused on building resilient systems and ensuring uptime at MealSuite. Involved in troubleshooting, platform reliability, and enhancing deployment automation.
Responsibilities
Provide dependable on-call support and incident response.
Troubleshoot and resolve infrastructure, performance, networking, and production issues.
Improve operational readiness, runbooks, and system resilience.
Implement and maintain deployment automation and IaC tooling.
Strengthen observability across servers, networks, and applications.
Improve monitoring, alerting, and performance tracking.
Partner with other Platform Engineering team members and related teams to improve systems.
Contribute to design reviews, process improvements, and runbooks.
Document solutions and share knowledge to uplift the team.
Requirements
Deep experience in managing cloud-based service deployments for resilience, performance, and configuration
Strong knowledge and experience writing deployment automation using modern IaC technologies and techniques
Observability expertise that spans the stack: hardware, OS, networking, and APM
Strong competence in engineering database and messaging technology platforms
Linux configuration and optimization experience
Strong networking knowledge and experience, including physical configuration, security, and performance
VMware infrastructure management and performance tuning
Certification in AWS
Bachelor's or Associate's degree
Comfortable supporting production environments and participating in on-call rotation
Strong technical capability and a proactive mindset; naturally curious and driven to continuously enhance systems and processes
Benefits
Stock Options - Share in the success you help build.
Employment under Vietnam Labor Law - Includes a Vietnam labor law contract with full benefits.
Hybrid Work Flexibility - Work remotely one day per week.
Generous Time Off - Up to 20 days of annual leave to recharge and refresh.
100% Salary During Probation - No drop in pay while you ramp up.
Continuous Learning - Unlimited access to online courses through Udemy to support your growth.
Office Perks - Free snacks and beverages to keep you fueled.
Flexible Working Time - We care about outcomes over rigid schedules.
Team Events - Regular activities to connect, celebrate, and have fun.
Meaningful Work - Build awesome products that make a real impact.