Lead Site Reliability Engineer managing critical IT systems for S&P Dow Jones Indices. Focused on service availability, incident management, and developer collaboration to enhance operational reliability.
Responsibilities
Support and maintain highly available, scalable IT systems and infrastructure hosting S&P DJI’s critical index platforms and applications.
Act as a working lead, providing technical leadership while remaining hands-on and contributing as an individual contributor based on operational demands, project requirements, and incident response needs.
Lead incident response efforts, conducting root cause analysis and implementing preventive measures to minimize system downtime and improve reliability.
Develop and maintain automation frameworks for deployment, monitoring, and infrastructure management to reduce manual intervention and increase operational efficiency.
Collaborate with development teams to implement SRE best practices, including service level objectives (SLOs), error budgets, and reliability engineering principles.
Monitor system performance and carry out capacity planning and resource optimization to ensure optimal performance of production environments.
Drive continuous improvement initiatives by analysing system metrics, identifying bottlenecks, and implementing solutions that enhance overall system reliability.
Requirements
Bachelor's degree in Computer Science, Information Systems, or Engineering is required, or, in lieu of a degree, demonstrated equivalent work experience.
8-10 years of experience in technical operations or application/data support roles with a focus on high-availability systems.
Experience with cloud platforms such as AWS (including ECS, EKS, S3, CloudFront) or equivalent cloud technologies.
Experience with monitoring and observability platforms such as Datadog and its key modules (APM, DBM, logging, and Infrastructure monitoring), or similar tools like Dynatrace, Prometheus, or Grafana.
Proficiency in database technologies including PostgreSQL/Oracle PL/SQL, stored procedures, and NoSQL databases.
Advanced PostgreSQL experience including performance tuning and optimization.
Strong programming skills for automation using scripting languages such as Shell, Python, or similar.
Experience with DevOps practices and CI/CD pipeline management using tools like Jenkins, GitLab CI, or Azure DevOps.
Knowledge of networking protocols including TCP/IP, Unicast, Multicast, Sockets and IP addressing.
Experience working with large datasets in Equity, Commodities, Forex, Futures and Options asset classes.
Familiarity with ITSM processes & tools such as ServiceNow, PagerDuty, or similar incident management platforms.
Excellent communication skills with strong verbal and written proficiency.
Benefits
Health & Wellness: Health care coverage designed for the mind and body.
Flexible Downtime: Generous time off helps keep you energized for your time on.
Continuous Learning: Access a wealth of resources to grow your career and learn valuable new skills.
Invest in Your Future: Secure your financial future through competitive pay, retirement planning, a continuing education program with a company-matched student loan contribution, and financial wellness programs.
Family Friendly Perks: It’s not just about you. S&P Global has perks for your partners and little ones, too, with some best-in-class benefits for families.
Beyond the Basics: From retail discounts to referral incentive awards—small perks can make a big difference.
Senior DevOps Engineer responsible for cloud ecosystem architecture at a health-tech startup. Building HIPAA/GDPR-compliant foundations and mentoring developers.
Senior Backend Engineer building product features and maintaining infrastructure for an insurance platform. Employing tools like Terraform, Kafka, Datadog, and Qovery with a strong DevOps focus.
DevOps Systems Engineer supporting customer operations in Annapolis Junction, MD. Responsible for creating, sustaining, and troubleshooting complex operational data flows.
OpenShift Fresher assisting the Cloud team in managing containerized applications using Red Hat OpenShift. Supporting CI/CD, deployment automation, and cloud-native application environments.
Site Reliability Engineer for Leidos ensuring reliability, performance, and scalability of complex distributed systems for the Navy-Marine Corps Intranet. Collaborating with teams to maintain and optimize network operations and services.
DevOps Engineer evolving banking infrastructure for a fintech company. Focusing on observability, incident response, and platform automation in a hybrid work setup.
Lead DevOps Engineer developing AI-powered supply chain intelligence solutions at S&P Global Mobility. Collaborating with data scientists and engineers to optimize operational infrastructure and continuous delivery processes.
Senior DevOps Engineer managing development and deployment pipelines for AI products at Plaud. Optimizing infrastructure, enhancing productivity, and collaborating with cross-functional teams.
Senior SRE Engineer ensuring reliability and performance of AI products at Plaud. Designing scalable systems and leading incident response to improve operational maturity.
DevOps Engineer supporting big data solutions and AWS infrastructure deployment at Enlighten. Collaborating with teams to ensure reliability, scalability, and performance of cloud services.