DevOps Engineer designing and maintaining HPC and cloud infrastructure at NVIDIA. Focused on automation and collaboration with R&D engineering teams for optimal performance.
Responsibilities
Design, build, and maintain robust HPC and cloud infrastructure utilizing state-of-the-art technologies and cutting-edge hardware.
Automate the deployment, configuration, and management of network and compute resources, including bare metal servers, network switches, and VMs.
Develop and streamline CI/CD pipelines to enable continuous integration and delivery for our R&D engineering teams.
Optimize infrastructure for availability, performance, observability, security, scalability, and cost-efficiency.
Collaborate closely with software development teams to integrate automated testing and quality assurance into deployment workflows.
Troubleshoot and resolve issues spanning infrastructure, networking, hardware, and software environments.
Stay current with industry trends and incorporate best practices in DevOps and infrastructure management.
Requirements
Bachelor’s or Master’s degree in Computer Science, Data Science, Engineering, or a related field.
2+ years of experience building and maintaining DevOps processes, with a strong focus on automation and reliability.
Solid experience with major Linux distributions (Ubuntu, Red Hat).
Proficiency in scripting with Bash, Python, and Groovy.
Understanding of Linux virtualization technologies (QEMU/KVM, Docker) and orchestrators like Kubernetes.
Hands-on experience with configuration management tools (e.g., Ansible).
Knowledge of on-premises networking and compute infrastructure.
Familiarity with CI/CD tools such as Jenkins, GitHub/GitLab, Kubernetes, and GitHub Actions.
Excellent written and verbal communication skills in English.
Site Reliability Engineer at Meniga, enhancing cloud infrastructure for digital banking solutions. Focus on reliability, scalability, and collaborative teamwork in a hybrid role.
Senior SRE Engineer enhancing software development for Woven by Toyota with observability and cloud engineering. Working with product teams to ensure reliability and observability.
DevOps Engineer covering the full spectrum of CI/CD, hosting, and security in an agile team. Contributing to a digitalization platform that integrates B2B data.
Manager at Deloitte focused on developing diverse, high-performing teams in Information Technology. Collaborating effectively to achieve key objectives and delivering high-quality results.
Senior DevOps Engineer building and maintaining cloud infrastructure for Aevi's payment platform. Focusing on automation and collaboration with developers to ensure reliable systems.
SRE Software Engineer II managing TomTom’s API systems and enhancing customer experience through system engineering. Ensuring high availability and reliability of mission-critical systems.
Senior DevOps Engineer at Worth AI focusing on systems improvement and operational excellence. Leading infrastructure design while collaborating effectively across engineering teams.
Intermediate DevOps Engineer at WorkNomads designing and operating AWS cloud environments. Automating deployments, managing infrastructure, and collaborating with teams for optimal performance.