AI Inference Engineer developing AI model optimizations for Quadric's GPNPU platforms. Porting and benchmarking AI models to improve performance on edge devices.
Responsibilities
Quantize, prune and convert models for deployment
Port models to Quadric platform using Quadric toolchain
Optimize inference deployment for latency and throughput
Benchmark and profile model performance and accuracy
Develop tools to scale and speed up the deployment
Improve the SDK and runtime
Provide technical support and documentation to customers and the developer community
Requirements
Bachelor’s or Master’s in Computer Science and/or Electrical Engineering
5+ years of experience in AI/LLM model inference and deployment frameworks/tools
Experience with model quantization (PTQ, QAT) and related tools
Experience with model accuracy metrics
Experience with model inference performance profiling
Experience with at least one of the following frameworks: ONNX Runtime, PyTorch, vLLM, Hugging Face Transformers, neural-compressor, llama.cpp
Proficiency in C/C++ and Python
Demonstrated strength in problem solving, debugging, and communication
AI Trainer responsible for delivering training on AI and Cloud technologies with flexible scheduling. Engaging participants through webinars and workshops for an international media organization.
Principal Director leading applied AI initiatives at Aerospace Corporation focused on national security and advanced infrastructure. Driving innovation through AI and cloud integration for complex space systems.
Busperson role at ai Pazzi restaurant in Las Vegas, maintaining dining room standards and assisting servers. Providing excellent guest service and adhering to department policies.
Specialist in Gen AI Development at Sun Life working on innovative technologies and solutions. Collaborating with teams to implement GenAI technologies and improve existing processes.
Supports planning and organization of research projects in Human-AI Interaction at the Fraunhofer Institute. Involves interdisciplinary teamwork and various research methodologies.
Working student in Condition Monitoring with Embedded AI at Fraunhofer in Nürnberg. Developing solutions for embedded AI systems in a flexible part-time role.
Specialist in Gen AI Development at Sun Life, a leading financial services company. Focusing on app development and design with cloud technologies and Gen AI applications.
AI Project Delivery Manager leading the delivery of meaningful AI, Analytics, and Reporting projects at Orica. Transforming business needs into real solutions and guiding data delivery teams.
Senior AI VFX Engineer driving executive engagement with media leaders at Adobe. Overseeing VFX workflow innovations and establishing best practices in media and entertainment sectors.