Site Reliability Engineer at Hewlett Packard Enterprise responsible for enhancing cloud infrastructure and deployment systems, playing a key role in scalability and operational efficiency.
Responsibilities
Enhance Infrastructure as Code (IaC) and enforce best practices.
Optimize cloud infrastructure for scalability, security, and cost-effectiveness.
Develop internal tools to support and streamline cloud platform operations.
Improve CI/CD pipelines and deployment workflows using FluxCD and Jenkins.
Address container image vulnerabilities and standardize remediation processes.
Build Amazon Machine Images (AMIs) aligned with CIS and STIG benchmarks.
Strengthen monitoring, alerting, and observability using Prometheus, Grafana, and logging tools.
Troubleshoot complex production issues to ensure system reliability and customer satisfaction.
Fine-tune distributed systems such as Apache Kafka and Cassandra.
Collaborate with development, security, and operations teams to align infrastructure with application needs.
Requirements
Minimum of 10 years of hands-on experience in InfraOps, DevOps, or Site Reliability Engineering (SRE)
Proficiency with Linux systems, especially Debian-based distributions
Strong experience with cloud platforms such as AWS and GCP
Expertise in Infrastructure as Code tools like Terraform, Packer, and Ansible
Solid programming skills in Python and/or Golang
Deep understanding of containerization (Docker, containerd) and orchestration tools (AWS EKS, GCP GKE)
Experience with GitOps workflows
Proven track record in implementing and maintaining CI/CD pipelines
Strong background in security and familiarity with security programs
Experience with monitoring and logging tools (Prometheus, Grafana, ELK)
Knowledge of both relational (SQL) and non-relational databases
Excellent problem-solving and debugging skills with a strong sense of ownership
Experience managing distributed systems like Apache Kafka and Cassandra
Effective communicator and collaborative team player
Site Reliability Engineer improving the reliability of cloud infrastructure for an AI-specialized company. Taking ownership of monitoring and incident response processes in a hybrid working arrangement.
DevOps Engineer leading automation for sophisticated release/deployment pipelines at Securonix. Focused on Python, Ansible, and cloud services to enhance security operations.
Senior Analyst on the Data Platform DevOps team at AIMCo, responsible for building data operations and collaborating with teams on innovative solutions. Focused on ensuring data quality and integrity across technologies.
Principal Engineer driving systemic reliability improvements for Xero's software products. Leading technical initiatives and mentoring teams in engineering excellence.
DevOps Engineer at Constantinople enhancing release processes for the AI-native banking platform. Collaborating across teams to ensure CI/CD pipeline reliability and operational efficiency in the APAC time zone.
DevOps Engineer in the US helping with digital transformation projects for international clients. Utilizing AWS, Terraform, and CI/CD tools in a global operations team.
DevOps Master/Specialist working on banking solutions, automating CI/CD pipelines and managing cloud infrastructure. Requires experience in DevOps and low-code technologies.
Junior MLOps Engineer helping to design and maintain AI/ML systems at Bupa. Collaborating with teams to operationalize machine learning models and automate workflows.
DevOps Engineer responsible for building and maintaining scalable AI systems on Azure cloud. Collaborating with teams to ensure operational excellence for enterprise-grade AI solutions.