Site Reliability Engineer responsible for system reliability and performance at a leading financial services technology company. Collaborating with infrastructure, engineering, and security teams to build robust systems.
Responsibilities
Maintain and improve the uptime, performance, and availability of production systems.
Define and track SLIs, SLOs, and SLAs to ensure service reliability and user satisfaction.
Implement and manage monitoring, alerting, and observability tools (e.g., Prometheus, Grafana, Datadog, ELK).
Participate in on-call rotations and respond to incidents, performing root cause analysis and postmortems.
Automate repetitive tasks and processes using scripts, configuration management, and Infrastructure as Code (IaC).
Develop CI/CD pipelines to streamline deployment and operational processes.
Analyze system performance and capacity trends to plan for future growth.
Collaborate with engineering teams to design systems that scale reliably.
Support cloud and/or hybrid infrastructure (AWS, Azure, GCP, VMware, etc.).
Manage system provisioning, configuration, and patching via tools such as Ansible, Terraform, or Puppet.
Act as a bridge between development and operations teams, championing DevOps and SRE principles.
Contribute to a culture of continuous improvement, reliability, and accountability.
Requirements
Bachelor’s degree in Computer Science, Engineering, or related field (or equivalent experience).
3+ years of experience in a Site Reliability, DevOps, or Systems Engineering role.
Experience with Linux/Unix systems, Windows, shell scripting, and administration.
Proficiency in at least one programming/scripting language (Python, Go, Bash, etc.).
Hands-on experience with cloud platforms (AWS, Azure, or GCP).
Strong knowledge of networking, security, load balancing, and DNS.
Experience with monitoring/logging tools (e.g., Prometheus, Grafana, ELK, Splunk, Datadog).
Benefits
Flexibility: Hybrid Work Model & a Business Casual Dress Code, including jeans
Your Future: 401k Matching Program, Professional Development Reimbursement
Work/Life Balance: Flexible Personal/Vacation Time Off, Sick Leave, Paid Holidays
Your Wellbeing: Medical, Dental, Vision, Employee Assistance Program, Parental Leave
Diversity & Inclusion: Committed to Welcoming, Celebrating and Thriving on Diversity
Training: Hands-On, Team-Customized, including SS&C University
Extra Perks: Discounts on fitness clubs, travel and more!
Principal Release Engineer leading and orchestrating end-to-end release management at F5. Driving cross-platform coordination and ensuring seamless releases across enterprise transformation programs.
Site Reliability Engineer focused on developing and improving Kubernetes configurations for F5's infrastructure. Collaborating with product teams and ensuring operational delivery processes are efficient and reliable.
Sr DevOps Manager leading the way in Cloud infrastructure, DevOps, and SRE practices at F5. Empowering engineers and fostering a culture of collaboration and improvement.
Senior Site Reliability Engineer developing IT infrastructure and automation solutions for Coinbase. Collaborating with infrastructure, security, and compliance teams to enhance operational efficiency.
DevOps Engineer joining AI and Innovation team to ensure scalable, secure, and resilient systems at global media agency. Collaborating with UX and AI engineers for next-generation media experiences.
Site Reliability Engineer at HPE ensuring high availability and performance of cloud infrastructure across AWS and GCP environments. Managing incidents, monitoring systems, and supporting multi-cloud production.
Senior SRE/DevOps managing cloud architecture, driving automation, and ensuring operational reliability at Extensiv. Collaborating with teams to design scalable systems on AWS.
Site Reliability Engineer supporting Vista Global’s production environments and cloud infrastructure. Delivering solutions using AWS, Terraform, Ansible, Docker, and Kubernetes in a hybrid model.
Site Reliability Engineer responsible for architecting cloud infrastructure and containerized platforms at Vista Global. Implementing CI/CD pipelines and mentoring teams on best practices for production environments.
Senior DevOps Engineer focused on network automation and cloud infrastructure at Tiger Analytics. Building scalable solutions for multiple Fortune 500 companies and ensuring high availability and performance.