AI Platform & Systems Engineer providing operational support for GPU-based compute infrastructure at BNY. The role covers deploying and troubleshooting containerized AI workloads and automating infrastructure processes with Python, Bash, and configuration management tools.
Responsibilities
Provide hands-on operational support and incident management for GPU-based compute infrastructure across hybrid and on-prem environments.
Deploy, monitor, and troubleshoot containerized AI workloads using Kubernetes, Docker, and GPU orchestration tools such as Run:AI, Volcano, or Kubeflow.
Automate infrastructure processes and workload provisioning using Python, Bash, and configuration management tools.
Maintain and scale training/inference workloads using GitOps tools such as Helm and ArgoCD, and integrate them with CI/CD pipelines (GitLab, Jenkins).
Requirements
Bachelor's degree in computer science or a related discipline, or equivalent work experience, required; advanced degree preferred.
8-10 years of related experience required; experience in the securities or financial services industry is a plus.
Experience with Linux administration (RHEL/Ubuntu), shell scripting, and system-level debugging.
Proven experience running distributed systems in Kubernetes and containerized environments using Docker.
Familiarity with GPU resource management, including NVIDIA GPU Operator and device plugin lifecycle.
Experience with CI/CD workflows and infrastructure automation tools such as GitLab CI, Jenkins, Terraform, Helm, or Ansible.
Knowledge of networking fundamentals and persistent storage systems.
Exposure to cloud platforms (AWS, GCP, Azure) and hybrid GPU environments.
Ability to read and support Python code focused on ML/AI pipeline integration.
Strong analytical and troubleshooting skills with a collaborative mindset.
Effective communication skills and proactive ownership of platform reliability and performance.
Benefits
BNY offers highly competitive compensation, benefits, and wellbeing programs rooted in a strong culture of excellence and our pay-for-performance philosophy.
We provide access to flexible global resources and tools for your life’s journey.
As a valued member of our team, you can focus on your health, foster your personal resilience, and reach your financial goals. We also offer generous paid leave, including paid volunteer time, to support you and your family through moments that matter.