DevOps Engineer architecting and refining automated systems for NVIDIA's e-commerce platform. Designing infrastructure, optimizing AWS environments, and managing deployment processes for resilience and efficiency.
Responsibilities
Architect and refine automated Jenkins deployment pipelines to ensure seamless, zero-downtime releases
Design, build, and maintain enterprise-scale infrastructure using Terraform
Establish modular, reusable patterns for AWS resources
Optimize and manage sophisticated AWS environments with a focus on cost-efficiency and security
Transition our monitoring from reactive to proactive using AI-powered observability tools (e.g., Datadog Watchdog) for automated root cause analysis (RCA) and anomaly detection
Define and monitor SLOs and SLAs
Lead incident response and conduct thorough post-mortems to improve system resilience
Requirements
8+ years of industry experience or equivalent
Bachelor's or Master's degree in Computer Science, Software Engineering, or equivalent experience
Exceptionally strong background in developing CI/CD processes and deployment pipelines using Jenkins
Extensive experience architecting on AWS Cloud and running services such as API Gateway, Lambda, EKS/ECS, RDS, S3, and SQS
Expert-level knowledge of Terraform (including state management, workspaces, and complex module development)
Advanced experience with Kubernetes (EKS) and Docker, including orchestration, service meshes, and Helm
Strong proficiency in a scripting language, such as Python, for automation and custom tooling
Reliability Engineer focused on the dependability and mission success of complex space systems. Involvement includes analyses, collaboration, and adherence to aerospace reliability standards.
DevOps Engineer automating IT processes at Maurer Electronics GmbH in Hannover. Engaging in continuous integration and development through team collaboration and innovative solutions.
DevOps Engineer working with the IT Security Team in Berlin, developing and supporting complex IT security services. Collaborating on automated IT security services using cutting-edge technologies and methodologies.
DevOps Engineer focusing on deploying high-security on-prem infrastructure and MLOps platforms for mission-critical systems. Collaborating on Kubernetes-based orchestration and machine learning workloads.
Cloud Site Reliability Engineer managing Solace Cloud services across leading cloud providers. Ensuring reliability, handling incidents, and collaborating with customers for operational excellence.
Senior Cloud Site Reliability Engineer ensuring the reliability and health of Solace Cloud Services with hands-on cloud operations expertise. Leading incident management and customer support for high-impact environments.
DevOps Engineer designing and operating AWS infrastructure within industrial IoT environments. Working on systems that ensure security, resilience, and end-to-end observability.
Sr. Site Reliability Engineer (SRE) III providing technical solutions for the federal government. Collaborating in a high-performing team focused on reliability and application scalability.