Lead Systems Engineer managing AI platform operations at an emerging AI infrastructure start-up. Oversee vendor collaboration, technical troubleshooting, and customer engagement for optimal service delivery.
Responsibilities
Escalate complex (L3) issues to vendor product/engineering teams, coordinate their resolution, and manage vendor responses
Monitor system health, alerts, and customer usage patterns
Document solutions, workarounds, and support procedures; create and maintain knowledge base content
Automate common tasks and fixes
Configure and integrate tooling to support optimal operation of the platform, and support tool selection
Assist customers with platform configuration, onboarding, and usage best practices
Collaborate with platform and infrastructure support/engineering teams to resolve platform integration issues
Ensure SLAs and customer satisfaction targets are met
Provide L1 support for customer-reported issues and requests
Provide L2 support by diagnosing, replicating, and troubleshooting issues across platform and infrastructure
Work with customers and multiple stakeholders to understand requirements and challenges, and provide reporting on usage, workflow, and billing
Requirements
Extensive experience in technical support, system engineering, or platform operations
Solid understanding of L1 and L2 support processes (ticketing, escalation, troubleshooting)
Familiarity with cloud-based platforms, APIs, and distributed systems
Understanding of AI/ML concepts and tooling (basics of model training, inference, and data pipelines)
Experience with monitoring/logging tools (e.g., Grafana, Kibana, Splunk)
Excellent communication skills to interface with customers, internal teams, and vendor teams
Good understanding of the tooling requirements of ML engineers and data scientists, and how to optimize their experience
System administration experience with operating systems such as RHEL/CentOS and Ubuntu, including Linux kernel tuning
Proficiency with Ansible, NVIDIA and CUDA toolkits, Kubernetes, and container orchestration
Understanding of automation, monitoring, and security for GPU-as-a-service offerings
Structural Systems Engineer specializing in structural analysis of aerospace vehicle pressurized systems. Involves design, development, and execution of test programs for launch and space structures.
Systems Engineer at Quevera collaborating with experts to deliver innovative solutions. Join our dynamic team recognized as a top employer in the Baltimore/DC area.
Staff Systems Engineer working on delivering complex software applications into operations with a talented team at CACI. Supporting development and verification of mission capabilities while ensuring operational efficiency.
Senior Systems Engineer supporting mission-critical software and AI/ML product development. Collaborating within an Agile team to transition complex systems to operational use.
IT Support Specialist ensuring installation, support, and maintenance of IT systems in healthcare settings. Focusing on efficiency, stability, and customer service with a team-oriented approach.
RF Systems Engineer III developing spacecraft communication systems for civil, commercial, and National Security Space programs. Collaborating with cross-functional teams to enhance RF communications technology.
Systems Engineer supporting deployment and operational reliability in cloud-based healthcare platform. Collaborate with engineering and QA teams to manage cloud environments and troubleshoot issues.
Business Systems Analyst participating in daily support and enhancement of systems for health care. Involved in development and configuration to support Cambia's mission in health care.
Systems Analyst for Connecticut Children’s health improving computer systems and supporting colleagues. Utilizing data gathering techniques for effective solutions in a healthcare environment.
Epic Systems Analyst supporting pharmacy IT systems for Connecticut Children’s. Utilizing expertise in complex application and systems enhancements or replacements.