Onsite Lead, Site Reliability Engineer

Posted 14 hours ago

About the role

  • Lead Site Reliability Engineer managing critical IT systems for S&P Dow Jones Indices. Focused on service availability, incident management, and developer collaboration to enhance operational reliability.

Responsibilities

  • Support and maintain highly available, scalable IT systems and infrastructure hosting S&P DJI’s critical index platforms and applications.
  • Act as a working lead, providing technical leadership while staying hands-on as an individual contributor as operational demands, project requirements, and incident response needs dictate.
  • Lead incident response efforts, conducting root cause analysis and implementing preventive measures to minimize system downtime and improve reliability.
  • Develop and maintain automation frameworks for deployment, monitoring, and infrastructure management to reduce manual intervention and increase operational efficiency.
  • Collaborate with development teams to implement SRE best practices, including service level objectives (SLOs), error budgets, and reliability engineering principles.
  • Monitor system performance, plan capacity, and optimize resource usage to keep production environments performing well.
  • Drive continuous improvement initiatives by analysing system metrics, identifying bottlenecks, and implementing solutions that enhance overall system reliability.

Requirements

  • Bachelor's degree in Computer Science, Information Systems, or Engineering, or demonstrated equivalent work experience.
  • 8-10 years of experience in technical operations or application/data support roles with a focus on high‑availability systems.
  • Experience with cloud platforms such as AWS (including ECS, EKS, S3, CloudFront) or equivalent cloud technologies.
  • Experience with monitoring and observability platforms such as Datadog and its key modules (APM, DBM, logging, and Infrastructure monitoring), or similar tools like Dynatrace, Prometheus, or Grafana.
  • Proficiency in database technologies including PostgreSQL/Oracle PL/SQL, stored procedures, and NoSQL databases.
  • Advanced PostgreSQL experience including performance tuning and optimization.
  • Strong programming skills for automation using scripting languages such as Shell, Python, or similar.
  • Experience with DevOps practices and CI/CD pipeline management using tools like Jenkins, GitLab CI, or Azure DevOps.
  • Knowledge of networking concepts and protocols, including TCP/IP, unicast, multicast, sockets, and IP addressing.
  • Experience working with large datasets in Equity, Commodities, Forex, Futures and Options asset classes.
  • Familiarity with ITSM processes & tools such as ServiceNow, PagerDuty, or similar incident management platforms.
  • Excellent verbal and written communication skills.

Benefits

  • Health & Wellness: Health care coverage designed for the mind and body.
  • Flexible Downtime: Generous time off helps keep you energized for your time on.
  • Continuous Learning: Access a wealth of resources to grow your career and learn valuable new skills.
  • Invest in Your Future: Secure your financial future through competitive pay, retirement planning, a continuing education program with a company-matched student loan contribution, and financial wellness programs.
  • Family Friendly Perks: It’s not just about you. S&P Global has perks for your partners and little ones, too, with some best-in-class benefits for families.
  • Beyond the Basics: From retail discounts to referral incentive awards—small perks can make a big difference.

Job title

Lead, Site Reliability Engineer

Experience level

Senior

Salary

Not specified

Degree requirement

Bachelor's Degree
