Site Reliability Engineer at Fidelity responsible for the reliability and observability strategy. Ensuring system availability through technical standardization and process refinement.
Responsibilities
Help define and execute a comprehensive reliability and observability strategy, ensuring that Fidelity’s systems are always available when our customers need them.
Bring together technical, procedural, and financial data to reduce toil and increase efficiency.
Execute plans for technical standardization and process refinement within the engineering organization, especially for Site Reliability Engineers.
Coach peer SREs and development teams on how to build highly available systems.
Requirements
Bachelor’s degree or higher in a technology-related field (e.g., Engineering or Computer Science) required; master’s degree a plus.
5+ years of hands-on experience deploying and/or supporting highly distributed multi-tiered systems at scale.
Strong experience in cloud development (preferably AWS) and cloud migration.
2-4 years of experience in software development with Python, NodeJS, or Java, with a focus on SDLC and automation.
Hands-on experience with container orchestration, preferably with Kubernetes.
Solid understanding of cloud computing and DevOps concepts, including CI/CD pipelines.
Experience with instrumentation, including building and operating monitoring, logging, and alerting services for distributed systems at scale.
Proven experience in maintaining the scalability and resiliency of complex environments.
Ability to triage, execute root cause analysis, and be decisive under pressure.
Benefits
Most roles at Fidelity are Hybrid, requiring associates to work onsite every other week (all business days, M-F) in a Fidelity office.
DevOps intern contributing to SSO logs integration for ELK stack at Atos. Enhancing authentication observability and supporting log collection and visualization at a leading digital transformation company.
DevOps Engineer creating a new cloud-native SSO solution based on NGINX and Kubernetes at Atos. Involves collaborating on the transition from Apache and VM to a modern infrastructure.
DevOps Engineer managing infrastructure and CI/CD at Boost - IT. Optimizing Kubernetes, GitLab CI/CD, and security practices in a hybrid remote work setting.
Lead Software Engineer at Honeywell Aerospace Technologies ensuring reliability, availability, and performance of systems by collaborating with development and operations teams.
Site Reliability Engineer leading reliability engineering efforts at Honeywell Aerospace in Krakow, Poland. Driving improvements, collaborating with teams to enhance system reliability and performance.
Team Lead overseeing Infrastructure Administration and DevOps at MoMo Payment Service Bank, ensuring high availability and compliance across cloud and on-premises environments.
Senior Cloud DevOps Engineer ensuring scalability and reliability of AI pipelines. Designing AWS environments and contributing to the DevOps culture in a collaborative team.
Site Reliability Engineer focusing on system reliability and automation for high-performance production systems in Warsaw. Collaborating with engineering teams for effective deployment and operational efficiency.
Lead DevOps Engineer modernizing infrastructure and automation for Wells Fargo’s Consumer Technology platforms. Collaborating across teams to build scalable solutions and elevate engineering excellence.