Site Reliability Engineer at FIS supporting innovation in financial services and technology. Driving automation, application reliability, and customer-centric solutions across multiple teams.
Responsibilities
Design and maintain monitoring solutions for infrastructure, application performance, and user experience
Implement automation tools to streamline tasks, scale infrastructure, and ensure seamless deployments
Ensure application reliability, availability, and performance, minimizing downtime and optimizing response times
Lead incident response, including identification, triage, resolution, and post-incident analysis
Conduct capacity planning, performance tuning, and resource optimization
Collaborate with security teams to implement best practices and ensure compliance
Manage deployment pipelines and configuration management for consistent and reliable app deployments
Develop and test disaster recovery plans and backup strategies
Collaborate with development, QA, DevOps, and product teams to align on reliability goals and incident response processes
Participate in on-call rotations and provide 24/7 support for critical incidents
Requirements
Proficiency in development technologies, architectures, and platforms (web, API)
Experience with cloud platforms (AWS, Azure, Google Cloud) and IaC tools (Terraform)
Knowledge of monitoring tools (Prometheus, Grafana, DataDog) and logging frameworks (Splunk, ELK Stack)
Experience in incident management and post-mortem reviews
Strong troubleshooting skills for complex technical issues
Proficiency in scripting languages (Python, Bash) and automation tools (Terraform, Ansible)
Experience with CI/CD pipelines (Harness, Jenkins, GitLab CI/CD, Azure DevOps)
Ownership approach to engineering and product outcomes
Excellent interpersonal communication, negotiation, and influencing skills
DevOps Engineer responsible for designing and supporting CI/CD pipelines for Xumo. Collaborating with teams to enhance cloud infrastructure for video streaming services.
Software Developer responsible for developing and optimizing functionalities for a PHP/Symfony platform. Collaborating on projects in a data-driven environment focused on product data solutions.
DevOps Engineer at Perelyn supporting cloud infrastructures and providing technical consulting to clients. Engaging in various DevOps projects within a dynamic remote work environment in Germany.
Site Reliability Engineer ensuring the reliability and performance of cloud-native infrastructure at Sanlam Fintech. Collaborating with teams to deliver innovative solutions across the African continent.
DevOps Engineer building and owning a scalable event streaming platform for data analytics. Working at Statista, a leading business data platform, with hybrid and international team environments.
DevOps Engineer creating a new cloud-native SSO solution based on NGINX and Kubernetes at Atos. The role involves collaborating on the transition from Apache and VMs to a modern infrastructure.
DevOps intern contributing to SSO log integration for the ELK stack at Atos. Enhancing authentication observability and supporting log collection and visualization at a leading digital transformation company.
DevOps Engineer managing infrastructure and CI/CD at Boost-IT. Optimizing Kubernetes, GitLab CI/CD, and security practices in a hybrid remote work setting.
Site Reliability Engineer leading reliability engineering efforts at Honeywell Aerospace in Krakow, Poland. Driving improvements and collaborating with teams to enhance system reliability and performance.
Lead Software Engineer at Honeywell Aerospace Technologies ensuring reliability, availability, and performance of systems by collaborating with development and operations teams.