Site Reliability Engineer responsible for leading technology teams at SS&C. Delivering scalable and resilient infrastructure platforms in the financial services and healthcare technology sectors.
Responsibilities
Collaborate with Technology Infrastructure teams to build and operate reusable, cloud-native platforms
Work with business units and technical teams to improve application availability, observability, and reliability
Enhance platform reliability through automatic problem detection and self-healing systems
Use SLOs, SLIs, and KPIs to guide prioritization and measure impact
Eliminate toil using intelligent automation and agentic workflows
Conduct blameless retrospectives and share learnings across the organization
Foster a culture of ownership and continuous learning
Integrate DevSecOps, zero-trust principles, and policy-as-code into every pipeline
Produce and promote Architecture Decision Records (ADRs) and Cloud Well-Architected Frameworks
Requirements
5+ years of professional experience in an SRE role
3+ years in financial services or other regulated industries preferred
Minimum Bachelor’s degree in Computer Science, Engineering, or a related field
Proven expertise in architecting, designing and operating private cloud environments (e.g., VMware, OpenStack, OpenShift Virtualization) and Kubernetes clusters
Hands-on experience building, deploying, and operating infrastructure-as-code platforms
Experience with CI/CD pipelines and observability platforms (e.g., Prometheus, Splunk)
Strong understanding of modern systems reliability standards and practices
Familiarity with financial services regulatory frameworks and their impact on infrastructure design and operations
Familiarity with structured naming conventions and asset management for global infrastructure
Experience with financial-grade network segmentation, micro-segmentation, and zero-trust architecture
Certifications such as TOGAF, AWS Certified Solutions Architect, VMware VCP, or Red Hat Certified Architect are a plus
Familiarity with ISO 27001, NIST 800-53, and other security frameworks is a plus
DevSecOps Engineer responsible for enhancing Thales' secure hosting platforms in public and private clouds. Collaborating with teams to apply modern practices and build resilient infrastructures.
Develops high-automation services in Golang or Java within AWS, Kubernetes, and Azure. Supports teams in building secure applications while working in a hybrid environment.
DevOps Engineer specializing in AWS Cloud Infrastructure in a hybrid position. Collaborating within a supportive team to build modern infrastructure for VM-based applications.
Leading DevOps platform strategy for KIPMI Software's next-generation digital trust products. Collaborating with teams to implement scalable infrastructure and DevSecOps practices.
Join our DevOps team to build and manage GitHub pipelines and cloud-native Azure solutions. Collaborate with teams to drive DevOps best practices and optimize deployments.
Site Reliability Engineer enhancing system reliability and deployment practices at OpenLoop. Collaborating with cross-functional teams for incident management and performance tuning.
Senior DevOps Engineer enhancing Azure application reliability for a healthcare fintech platform. Collaborating closely with engineering teams to ensure deploy safety and observability.
DevOps Engineer contributing to tooling changes and leading a community of practice at Totara. Focused on collaboration, development, and support for internal teams.
Site Reliability Engineer responsible for infrastructure supporting an AI platform. Safeguarding US customer data and ensuring compliance in the Aerospace and Defense sector.