AI Inference Engineer developing AI model optimizations for Quadric's GPNPU platforms. Porting and benchmarking AI models to improve performance on edge devices.
Responsibilities
Quantize, prune and convert models for deployment
Port models to Quadric platform using Quadric toolchain
Optimize inference deployment for latency and speed
Benchmark and profile model performance and accuracy
Develop tools to scale and speed up deployment
Make improvements to the SDK and runtime
Provide technical support and documentation to customers and the developer community
Requirements
Bachelor’s or Master’s in Computer Science and/or Electrical Engineering
5+ years of experience in AI/LLM model inference and deployment frameworks/tools
Experience with model quantization (PTQ, QAT) and related tools
Experience with model accuracy measures
Experience with model inference performance profiling
Experience with at least one of the following frameworks: ONNX Runtime, PyTorch, vLLM, Hugging Face Transformers, Neural Compressor, llama.cpp
Proficiency in C/C++ and Python
Demonstrated strong problem-solving, debugging, and communication skills
Working student assisting Aurixus GmbH in process automation and integration of AI tools. Focused on building real automation solutions rather than discussing concepts.
AI Data Operations Manager overseeing end-to-end operations for geospatial AI data capture and labeling. Leading collaboration for high-quality datasets in a hybrid work environment.
AI Deployment Manager leading teams to design and implement enterprise AI solutions. Collaborating with clients to drive material ROI through AI transformation programs.
AI Intern supporting Acosta Group Canada's AI strategy development. Collaborating with stakeholders to prototype AI solutions and drive innovation in the organization.
Investor Relations Specialist at Sidetrade responsible for managing financial communications and fostering investor engagement. Position involves collaboration with C-suite and analyzing market trends.
AI-First Technical Associate at Progrits supporting the CTO and building tech solutions. Engaging in technical problem-solving and collaborating in a dynamic environment.
Model Optimization & Deployment Engineer optimizing large-scale ML models for Zoox's autonomous vehicle technology. Focused on deployment for efficient real-time execution in vehicles.
Applied AI Engineer at Upvest driving AI implementation to enhance business efficiency. Collaborating with teams to integrate AI solutions and optimize workflows across company operations.
AI Analyst contributing to cross-functional product teams at Personio, leveraging data insights and analytics for decision-making. Responsible for event tracking, impact measurement, and user behavior analysis.