AI Platform & Systems Engineer providing operational support for GPU-based compute infrastructure at BNY. Deploying and troubleshooting containerized AI workloads and automating processes with modern tooling.
Responsibilities
Provide hands-on operational support and incident management for GPU-based compute infrastructure across hybrid and on-prem environments.
Deploy, monitor, and troubleshoot containerized AI workloads using Kubernetes, Docker, and GPU orchestration tools such as Run:AI, Volcano, or Kubeflow.
Automate infrastructure processes and workload provisioning using Python, Bash, and configuration management tools.
Maintain and scale training/inference workloads using GitOps tools such as Helm and ArgoCD, and integrate them with CI/CD pipelines (GitLab, Jenkins).
Requirements
Bachelor's degree in computer science or a related discipline, or equivalent work experience, required; advanced degree preferred.
8-10 years of related experience required; experience in the securities or financial services industry is a plus.
Experience with Linux administration (RHEL/Ubuntu), shell scripting, and system-level debugging.
Proven experience running distributed systems in Kubernetes and containerized environments using Docker.
Familiarity with GPU resource management, including NVIDIA GPU Operator and device plugin lifecycle.
Experience with CI/CD workflows and infrastructure automation tools such as GitLab CI, Jenkins, Terraform, Helm, or Ansible.
Knowledge of networking fundamentals and persistent storage systems.
Exposure to cloud platforms (AWS, GCP, Azure) and hybrid GPU environments.
Ability to read and support Python code focused on ML/AI pipeline integration.
Strong analytical and troubleshooting skills with a collaborative mindset.
Effective communication skills and proactive ownership of platform reliability and performance.
Benefits
BNY offers highly competitive compensation, benefits, and wellbeing programs rooted in a strong culture of excellence and our pay-for-performance philosophy.
We provide access to flexible global resources and tools for your life’s journey.
Focus on your health, foster your personal resilience, and reach your financial goals as a valued member of our team. Generous paid leaves, including paid volunteer time, can support you and your family through moments that matter.