Site Reliability Engineer on FNBO’s Technical Operations team, managing ITSM practices and collaborating with developers. Leads incident response, monitoring, automation, and performance optimization efforts.
Responsibilities
Lead ITSM practices, including change, incident, problem, and knowledge management.
Lead major incident response calls and help resolve issues.
Lead the Change Management process, ensuring change preparedness.
Lead the Problem Management process and perform problem review analysis.
Assist in monitoring FNBO systems/applications and follow best practices for proactive monitoring and resolution.
Participate in the on-call rotation to respond to incidents in a timely manner.
Support performance optimization efforts to ensure peak system performance and customer satisfaction.
Assist in conducting Post-Mortems after major incidents to help identify root causes and prevent future issues.
Collaborate with Application Development teams on deployment pipeline maintenance.
Contribute to the growth of the Site Reliability Engineering practice.
Perform Operational Readiness reviews and validate readiness for implementation.
Provide Knowledge Management leadership.
Create monthly executive reports on identified team KPIs.
Requirements
Bachelor’s degree in a related field (or 3-5 years of related experience).
Knowledge of ITIL practices and relevant ITIL certifications (e.g., ITIL Foundation) preferred.
Familiarity with a service ticketing tool, such as ServiceNow.
Familiarity with monitoring tools such as Dynatrace (preferred) or other similar platforms.
Understanding of basic development practices, scripting, automation, and monitoring.
Ability to automate tasks using scripting tools.
Knowledge of agile practices is a plus.
Ability to work effectively in a team environment and engage in team activities.
Candidates must possess unrestricted work authorization and not require future sponsorship.