Senior HPC and AI Cluster Administrator at NVIDIA specializing in high-performance computing infrastructure. Responsible for deploying and managing AI clusters while supporting R&D initiatives.
Responsibilities
Deploy, manage, and maintain large-scale HPC/AI clusters
Manage Linux job/workload schedulers and orchestration tools
Support and maintain continuous integration and delivery pipelines
Troubleshoot and fix issues from the bottom up, across the bare metal, operating system, software stack, and application levels
Support Research & Development activities and engage in POCs/POVs for future improvements
Requirements
Bachelor's Degree in Computer Science, Engineering, or a related field; or equivalent experience
5+ years of experience
Knowledge of HPC and AI solution technologies, from CPUs and GPUs to high-speed interconnects and supporting software
Experience with job scheduling and workload orchestration tools such as Slurm and Kubernetes (K8s)
Excellent knowledge of Windows and Linux (Red Hat/CentOS and Ubuntu) internals and networking (sockets, firewalls, iptables, Wireshark, etc.), ACLs and OS-level security protection, and common protocols such as TCP, DHCP, and DNS
Experience with multiple storage solutions such as Lustre, GPFS, ZFS, and XFS
Familiarity with newer and emerging storage technologies
Python programming and Bash scripting experience, plus experience with automation and configuration management tools and practices such as Jenkins, Ansible, and GitOps
Knowledge of networking technologies such as InfiniBand and Ethernet
Experience with virtual systems (for example VMware, Hyper-V, KVM)
Familiarity with cloud computing platforms (e.g. AWS, Azure, Google Cloud)
Benefits
Highly competitive salaries
An extensive benefits package
A work environment that promotes diversity, inclusion, and flexibility