Data Center Network Deployment Engineer for NVIDIA's HPC/AI Infrastructure team. Deploying and managing large-scale AI Data Centers with a focus on networking and automation.
Responsibilities
Deploy, manage, and maintain large-scale AI Data Centers, including the control, network, and storage stacks
Work with multiple software and hardware teams to optimize the clusters' networking health and performance
Develop and implement automation scripts for network, compute, and storage operations and deployments
Support Research & Development activities and engage in POCs/POVs for future improvements
Requirements
B.Sc. in Engineering or CCNP certification
3+ years of experience with networking fundamentals, configuring Ethernet switches, the TCP/IP stack, and data center architecture
Excellent knowledge of Windows and Linux (Red Hat/CentOS and Ubuntu) networking (sockets, firewalls, iptables, Wireshark, etc.) and internals, ACLs and OS-level security protection, and common protocols such as TCP, DHCP, and DNS
Proactive individual with the ability to work independently and prioritize tasks to optimize technology and enhance customer experience
Ability to provide ad-hoc knowledge transfers, develop handover materials, and offer deployment support for engagements
Benefits
NVIDIA is widely considered to be one of the technology world’s most desirable employers!
Health insurance
401(k) matching
Paid time off
Flexible work arrangements
Professional development opportunities