Site Reliability Engineer Lead Analyst at Citi overseeing application systems analysis and reliability. Leading monitoring, automation, and collaborative initiatives to enhance system performance.
Responsibilities
Monitor, measure, and analyze system performance and availability
Resolve a variety of high-impact problems and projects through in-depth evaluation of complex business processes, system processes, and industry standards
Provide subject-area expertise and advanced knowledge of applications programming, and ensure application design adheres to the overall architecture blueprint
Utilize advanced knowledge of system flow and develop standards for coding, testing, debugging, and implementation
Develop comprehensive knowledge of how areas of business, such as architecture and infrastructure, integrate to accomplish business goals
Provide in-depth analysis with interpretive thinking to define issues and develop innovative solutions
Serve as advisor or coach to junior site reliability engineers, allocating work as necessary
Develop and maintain automated tools and systems to manage and monitor the infrastructure
Reduce manual intervention, human errors and the time it takes to perform routine tasks
Periodically assess the capacity needs of services and scale them to handle increased usage
Plan for resource allocation, manage load balancing and ensure the system can handle demand fluctuations
Work to detect, diagnose and resolve issues quickly to minimize the impact on users and business
Conduct post-incident reviews to learn from incidents and improve system reliability
Work with different development teams, product owners, and other stakeholders to ensure seamless deliveries and alignment on a common goal
Requirements
6+ years of relevant experience in an Apps Development or systems analysis role
Extensive experience in systems analysis and in programming software applications
Extensive experience in automated pipelines, automated testing and automated security controls
Extensive experience in the use of logging tools/systems (Splunk, AppDynamics, etc.)
Experience in managing and implementing successful projects
Subject Matter Expert (SME) in at least one area of Applications Development
Ability to adjust priorities quickly as circumstances dictate
Demonstrated leadership and project management skills
Consistently demonstrates clear and concise written and verbal communication
Benefits
Medical, dental, and vision coverage
401(k)
Life, accident, and disability insurance
Wellness programs
Paid time off packages, including planned time off (vacation), unplanned time off (sick leave), and paid holidays
Job title
Site Reliability Engineer, Lead Analyst, Vice President
Senior Site Reliability Engineer focusing on reliability and operational excellence of workflow orchestration platforms like Apache Airflow. Engaging in operations and engineering projects in a hybrid setup.
Senior Site Reliability Engineer for observability platforms at Dimensional, ensuring reliability and scaling the infrastructure. Collaborating with teams on operations and engineering projects.
Senior Staff Reliability Engineer for the humanoid robotics team ensuring performance and safety standards. Leading reliability engineering initiatives and mentoring within the engineering team.
Reliability Engineer at Air Liquide optimizing maintenance strategies, ensuring equipment uptime across multiple sites in the United States. Collaborating with teams for continuous improvement and operational excellence.
Senior Azure Engineer at Capgemini responsible for building, operating, and optimizing cloud-native platforms. Collaborating with teams to ensure reliability, performance, and security for critical workloads.
DevOps Engineer specialized in Cloud environments at Avanquest, planning and migrating services to the Cloud and implementing microservice architectures.
Lead DevOps Engineer designing cloud infrastructure for ML/AI solutions in medical imaging. Collaborating across teams for scalable, secure platforms that optimize data operations.
DevOps/SRE Engineer for cloud environments developing ERP software at Scopevisio. Focus on AWS, infrastructure scaling, and modern technologies in a collaborative team.
Senior Coordinator for Infrastructure and DevOps leading technological infrastructure strategy and team development at RD Saúde. Ensuring stability, security, and cost efficiency in cloud operations.
Azure DevOps IT Engineer at iKnowHealth managing cloud and hybrid solutions with Microsoft Azure. Responsible for optimizing infrastructure and ensuring system performance in healthcare software.