Lead Systems Engineer managing AI platform operations at an emerging AI infrastructure start-up. Oversee vendor collaboration, technical troubleshooting, and customer engagement to ensure optimal service delivery.
Responsibilities
Escalate complex (L3) issues to vendor product/engineering teams, coordinate their resolution, and manage vendor responses
Monitor system health, alerts, and customer usage patterns
Document solutions and workarounds, create and maintain knowledge base content, and document support procedures
Automate common tasks and fixes
Configure and integrate tooling to support optimal operation of the platform, and support tool selection
Assist customers with platform configuration, onboarding, and usage best practices
Collaborate with platform and infrastructure support/engineering teams to resolve platform integration issues
Ensure SLAs and customer satisfaction targets are met
Provide L1 support for customer-reported issues and requests
Provide L2 support by diagnosing, replicating, and troubleshooting issues across platform and infrastructure
Work with customers and multiple stakeholders to understand requirements and challenges, and provide reporting on usage, workflow, and billing
Requirements
Extensive experience in technical support, system engineering, or platform operations
Solid understanding of L1 and L2 support processes (ticketing, escalation, troubleshooting)
Familiarity with cloud-based platforms, APIs, and distributed systems
Understanding of AI/ML concepts and tooling (model training, inference, data pipelines basics)
Experience with monitoring/logging tools (e.g., Grafana, Kibana, Splunk)
Excellent communication skills to interface with customers as well as internal and vendor teams
Good understanding of the tooling requirements of ML engineers and data scientists, and of how to optimize their experience
System administration experience with operating systems such as RHEL/CentOS and Ubuntu, including Linux kernel tuning
Proficiency with Ansible, NVIDIA and CUDA toolkits, Kubernetes, and container orchestration
Understanding of automation, monitoring, and security for GPU as a service