Site Reliability Engineer responsible for architecting cloud infrastructure and containerized platforms at Vista Global. Implementing CI/CD pipelines and mentoring teams on best practices for production environments.
Responsibilities
Architect, build, and maintain scalable, secure cloud infrastructure in AWS using Terraform and Ansible.
Design and manage containerized platforms using Docker, Kubernetes (EKS), ECS/Fargate, and Helm.
Implement and optimize CI/CD pipelines in GitLab, enabling reliable and automated deployments.
Apply best practices to ensure high availability, performance, monitoring, and incident response in 24/7 production environments.
Support and troubleshoot production systems, focusing on scalability, resilience, and technical debt reduction.
Mentor engineers and guide teams on cloud-native architecture and DevOps best practices.
Maintain documentation (Confluence/Jira) and communicate effectively with technical and non-technical stakeholders.
Requirements
Extensive, in-depth experience with cloud-based provisioning, monitoring, troubleshooting, and related SRE and DevOps technologies, plus networking knowledge.
3+ years in a technical role, with the ability to teach and influence engineers.
Strong experience architecting cloud infrastructure with AWS.
Strong experience with Linux and infrastructure as code (IaC).
Strong experience with containerization/orchestration technologies.
Strong understanding of multiple source control systems such as GitLab or GitHub.
Strong experience with CI/CD automation and configuration management.
Experience working in a 24/7 on-call, highly transactional or streaming production environment.
Must be able to ensure Agile/Scrum concepts and principles are adhered to, and must be able to act as a voice of reason.
Understanding of the SDLC.
Scripting and a good foundational understanding of programming with Python and Bash (can include TypeScript).
Demonstrates extensive knowledge of the principles, concepts, and theories in own discipline, and broad knowledge of principles and concepts of other functions.
Has developed extensive business knowledge and keeps current on industry trends.
Has a customer focus, a drive for results, and strong ethics and values.
Must be a team player but also be able to work independently with minimal direction and supervision.
Possesses a good understanding of multiple business applications, as well as experience in minicomputer or client/server environments including, but not limited to, the implementation and support of resource planning, sales automation, marketing, finance, and distribution systems.
Site Reliability Engineer responsible for system reliability and performance at a leading financial services technology company. Collaborating with infrastructure, engineering, and security teams to build robust systems.
Principal Release Engineer leading and orchestrating end-to-end release management at F5. Driving cross-platform coordination and ensuring seamless releases across enterprise transformation programs.
Site Reliability Engineer focused on developing and improving Kubernetes configurations for F5's infrastructure. Collaborating with product teams and ensuring operational delivery processes are efficient and reliable.
Sr DevOps Manager leading the way in cloud infrastructure, DevOps, and SRE practices at F5. Empowering engineers and fostering a culture of collaboration and improvement.
Senior Site Reliability Engineer developing IT infrastructure and automation solutions for Coinbase. Collaborating with infrastructure, security, and compliance teams to enhance operational efficiency.
DevOps Engineer joining AI and Innovation team to ensure scalable, secure, and resilient systems at global media agency. Collaborating with UX and AI engineers for next-generation media experiences.
Site Reliability Engineer at HPE ensuring high availability and performance of cloud infrastructure across AWS and GCP environments. Managing incidents, monitoring systems, and supporting multi-cloud production.
Senior SRE/DevOps managing cloud architecture, driving automation, and ensuring operational reliability at Extensiv. Collaborating with teams to design scalable systems on AWS.
Site Reliability Engineer supporting Vista Global’s production environments and cloud infrastructure. Delivering solutions using AWS, Terraform, Ansible, Docker, and Kubernetes in a hybrid model.
Senior DevOps Engineer focused on network automation and cloud infrastructure at Tiger Analytics. Building scalable solutions for multiple Fortune 500 companies and ensuring high availability and performance.