Support critical AI and DevOps platforms at Citi, contributing to global finance solutions. Collaborate with engineering teams and enhance platform stability and support processes.
Responsibilities
Contribute to the stability, reliability, and performance of AI and DevOps platforms.
Assist with vendor relationship management, including coordination with offshore managed services.
Support efforts to improve service levels for end users by enhancing operational efficiencies.
Partner with development teams to guide improvements in application stability and supportability.
Contribute to frameworks for managing capacity, throughput, and latency.
Assist in defining and implementing application onboarding guidelines and standards.
Support team members by fostering a collaborative environment and encouraging skill development.
Participate in cost-reduction efforts through Root Cause Analysis reviews, knowledge management, performance tuning, and user training.
Participate in business review meetings to help align technology tools and strategies with business requirements.
Ensure adherence to support processes and tool standards, and assist in enhancing processes to promote consistency and quality across the support program.
Perform other duties and functions as assigned.
Support platform leadership in defining the platform roadmap and partnering with engineering teams.
Assist in executing resilience activities such as wargaming scenarios, chaos engineering tests, and disaster recovery drills.
Contribute to automation initiatives aimed at reducing manual toil and improving platform efficiency.
Support the enterprise-wide observability strategy, including monitoring, logging, tracing, and alerting.
Maintain hands-on familiarity with platform architecture and services as needed for operational support.
Assist in overseeing the operational health of production platforms (including OpenShift, ECS, CI/CD), ensuring SLAs are supported and incident processes are followed.
Help implement and operate effective monitoring and observability strategies to support proactive issue detection and system health assessments.
Requirements
5–7 years of relevant experience in a hands-on technical or support leadership role.
Experience contributing to architecture discussions and ensuring solutions align with enterprise standards and long-term maintainability.
Experience working with senior stakeholders or technology partners.
Demonstrated experience supporting IT service improvements or platform stability initiatives.
Strong communication and presentation skills, with the ability to convey technical concepts clearly.
Experience supporting or contributing to technical roadmaps or operational workstreams.
Experience participating in resilience-related activities such as incident simulations, disaster recovery exercises, or stability testing.
Ability to collaborate with cross-functional support teams and technology groups.
Strong organizational and workload-planning skills.
Clear and concise written and verbal communication skills, demonstrated consistently.
Ability to communicate appropriately with relevant stakeholders.
Working knowledge of Generative AI concepts preferred.
Experience with CI/CD and configuration management tools preferred.
Experience with Red Hat OpenShift or similar Kubernetes technologies preferred.
Experience working with databases such as Postgres, Oracle, MongoDB, or Redis preferred.
Experience writing or maintaining code in Java, Python, Go, or similar languages preferred.
Hands-on experience with modern observability and monitoring tools (e.g., Prometheus, Grafana, Splunk, ELK) preferred.
Benefits
Mentorship
Continuous learning
Flexibility, with potential hybrid work opportunities
Job title
DevOps Application Support Manager – Vice President