Senior Performance and Development Engineer at NVIDIA focusing on optimizing AI workloads and developing scalable AI infrastructure tools. Collaborating with a diverse team to enhance Deep Learning applications.
Responsibilities
Build AI models, tools, and frameworks that provide real-time application performance metrics that can be correlated with system metrics.
Develop automation frameworks that enable applications to predict and recover from system and infrastructure failures, ensuring fault tolerance.
Collaborate with software teams to pinpoint performance bottlenecks.
Design, prototype, and integrate solutions that deliver demonstrable performance gains in production environments.
Adapt and enhance communication libraries to seamlessly support innovative network topologies and system architectures.
Design or adapt optimized storage solutions to boost Deep Learning efficiency, resilience, and developer productivity.
Requirements
BS/MS/PhD (or equivalent experience) in Computer Science, Electrical Engineering or a related field.
12+ years of proven experience analyzing and improving the performance of training applications built with PyTorch or a similar framework.
Experience building distributed software applications using collective communication libraries such as MPI, NCCL, or UCC.
Experience constructing storage solutions for Deep Learning applications.
Experience building automated, fault-tolerant distributed applications.
Experience building tools for bottleneck analysis and for automating fault tolerance in distributed environments.
Strong background in parallel programming and distributed systems.
Experience analyzing and optimizing large scale distributed applications.
Excellent verbal and written communication skills.
Senior Metering Project Engineer overseeing metering systems for the design and construction of renewable energy projects. Working with teams on project requirements from concept through construction and commissioning.
Lighting Design Engineer at Acuity Inc. responsible for designing control systems tailored to project needs. Collaborating with sales agents and using visualization software to produce deliverables.
Laboratory Calibration Engineer responsible for calibration requirements and maintaining reference standards at Teradyne. Ensuring accuracy and operational readiness in measurement systems across the organization.
Digital Functional Verification Engineer intern position at Teradyne in Costa Rica. Involves working on FPGA designs and system verification methodologies under technical supervision.
Engineer Software - Embedded at Northrop Grumman designing and developing software for end-user customers. Collaborating with multidisciplinary teams in an agile environment to enhance embedded systems.
Process Engineer at CMC monitoring shop performance and recommending improvements. Requires expertise in raw materials and electric arc furnace (EAF) processes at the company's steel manufacturing facility.
Project Engineer at Leonardo DRS leading innovative technical projects for naval power systems. Overseeing system design, component development, and multidisciplinary engineering teams.
SharePoint Engineer at Sentinel Technologies providing third-level technical support on Microsoft platforms. Responsible for system administration and project implementation, with a focus on customer service.
ILS Engineer responsible for the design and development of the Support Solution for the Dreadnought Crew Training programme. Requires expertise in Integrated Logistic Support disciplines and stakeholder management.
Early-Career Water Resources Engineer solving water quality challenges at Geosyntec, an innovative consulting firm addressing environmental and civil infrastructure issues.