ML/AI Ops Engineer responsible for operationalizing, deploying, and sustaining machine learning solutions at Xcel Energy. The role collaborates across data science, software engineering, and cloud infrastructure.
Responsibilities
Lead and support solution lifecycle technical activities
Ensure solutions are designed for a strong user experience and operational performance
Lead design, ensuring Enterprise Architecture, Security, Operations and Compliance aspects are continuously integrated into solutions
Provide input to cost and schedule estimation
Responsible for overall integrity of system design and operation
Oversee vendor activities
Conduct peer reviews and approve system changes and technical solution design
Coach and mentor less experienced team members
Partner across the organization to deliver optimal solutions at minimal cost
Provide in-depth technical information to stakeholders as needed
Innovate by using emerging industry capabilities and responding to evolving customer needs
Provide input to strategic roadmap and technical dependencies
Continuously stay current on, and apply, technical industry knowledge pertaining to the respective domain
Review solution performance and continually assess health of systems
Track and raise awareness of operational and technical debt risks
Provide escalation support for incident and problem management
Utilize analytics to improve availability, reliability, efficiency and capacity
Productionize machine learning and AI models, including classical ML and GenAI, using standardized MLOps pipelines
Manage end-to-end model lifecycle activities: versioning, promotion, rollback, retraining, and retirement
Implement CI/CD practices for models, features, and inference services
Design, build, and maintain reusable MLOps pipelines for training, validation, deployment, and monitoring
Develop common components (feature pipelines, quality checks, evaluation harnesses) to reduce friction across AI projects
Implement monitoring for model performance, data drift, bias, and system health
Own AI/ML operational SLAs, SLOs, and incident response, including root-cause analysis and post-mortems
Ensure high availability, resilience, and recoverability of AI services
Support regulated or high-risk AI use cases by embedding governance, validation, and documentation into MLOps workflows
Produce and maintain required artifacts such as model cards, system cards, validation evidence, and audit support materials
Partner closely with AI Governance and Risk teams to ensure alignment with enterprise standards
Requirements
Ten years of related functional experience
Bachelor's degree in Technology, Science, Business or a related field, or 4 years of equivalent experience
Excellent communication skills
Excellent relationship management and collaboration skills
Expertise managing the lifecycle of technical solutions
Deep subject matter expertise in the respective system domain's products, platforms, processes and architecture
Broad and deep knowledge of technology architecture, infrastructure, network, security and software principles and models
Experience working in partnership with internal and external vendors
Excellent analytical, problem-solving and troubleshooting skills
Extensive knowledge of future technology trends within area of expertise
Demonstrated leadership on technical aspects of large-scale projects
Experience coaching other developers in system deployment or operational troubleshooting
Experience with delivery methodologies (Waterfall, Agile, Scrum) and operational models (ITIL)
Experience and understanding of core IT Service Management functions, such as Change Management and Incident Management
Benefits
Annual Incentive Program
Medical/Pharmacy Plan
Dental
Vision
Life Insurance
Dependent Care Reimbursement Account
Health Care Reimbursement Account
Health Savings Account (HSA) (if enrolled in eligible health plan)
Limited-Purpose FSA (if enrolled in eligible health plan and HSA)
Transportation Reimbursement Account
Short-term disability (STD)
Long-term disability (LTD)
Employee Assistance Program (EAP)
Fitness Center Reimbursement (if enrolled in eligible health plan)