Staff Site Reliability Engineer designing and implementing scalable infrastructure solutions while optimizing system performance. Collaborating across teams for Walmart International to ensure reliability and security of critical systems.
Responsibilities
Design and develop scalable, modular infrastructure solutions
Implement automation scripts to enhance system operability
Optimize system performance through tuning and reliability testing
Conduct root cause analysis for performance and availability issues
Collaborate on disaster recovery planning and ensure compliance with security standards
Monitor system health using key performance indicators
Requirements
Bachelor's degree in computer science or a related field with 4 years of experience in site reliability engineering, or 6 years of related experience
Strong knowledge of infrastructure automation, coding standards, and scripting for CI/CD pipelines
Hands-on experience with AI/ML technologies
Skilled in Kubernetes and open-source chaos engineering tools
Experience with performance tuning and optimization on Unix/Linux platforms and JavaScript/Node.js environments
Senior Staff Reliability Engineer for the humanoid robotics team ensuring performance and safety standards. Leading reliability engineering initiatives and mentoring within the engineering team.
Reliability Engineer at Air Liquide optimizing maintenance strategies and ensuring equipment uptime across multiple sites in the United States. Collaborating with teams for continuous improvement and operational excellence.
Senior Azure Engineer at Capgemini responsible for building, operating, and optimizing cloud-native platforms. Collaborating with teams to ensure reliability, performance, and security for critical workloads.
DevOps Engineer specialized in Cloud environments at Avanquest, planning and migrating services to the Cloud and implementing microservice architectures.
Lead DevOps Engineer designing cloud infrastructure for ML/AI solutions in medical imaging. Collaborating across teams for scalable, secure platforms that optimize data operations.
DevOps/SRE Engineer for cloud environments developing ERP software at Scopevisio. Focus on AWS, infrastructure scaling, and modern technologies in a collaborative team.
Senior Coordinator for Infrastructure and DevOps leading technological infrastructure strategy and team development at RD Saúde. Ensuring stability, security, and cost efficiency in cloud operations.
Azure DevOps IT Engineer at iKnowHealth managing cloud and hybrid solutions with Microsoft Azure. Responsible for optimizing infrastructure and ensuring system performance in healthcare software.
SRE Manager leading a team in reliability engineering at WEX. Overseeing system stability and balancing feature delivery within the Microsoft Azure ecosystem.
Lead DevOps Architect guiding AWS and LaunchDarkly solutions. Overseeing enterprise-grade feature management and technical leadership with hands-on implementation.