Solution Architect developing comprehensive AI infrastructure solutions for deployment at d-Matrix. Collaborating with clients to enable successful integration of d-Matrix-based solutions.
Responsibilities
Develop end-to-end AI infrastructure reference solutions optimized for d-Matrix servers, covering the compute, networking, storage, and orchestration layers, in collaboration with internal teams.
Create reference blueprints that integrate smoothly into cloud-native and on-prem environments.
Develop infrastructure-as-code templates and examples using Ansible, Terraform, and Helm for provisioning d-Matrix-based nodes and clusters.
Integrate with Kubernetes-based systems to enable model deployment, auto-scaling, and fault-tolerant execution.
Design and deploy telemetry and monitoring frameworks to support real-time visibility into d-Matrix cluster health, job status, and system performance.
Integrate with industry-standard observability stacks (e.g., Prometheus, Grafana, OpenTelemetry) for data collection, visualization, and alerting.
Develop dashboards, health check systems, and metric pipelines that track performance, availability, and operational KPIs.
Collaborate with performance and software teams to validate infrastructure using real-world workloads and benchmarks.
Incorporate telemetry hooks for benchmark reporting and feedback-driven tuning.
Create and publish detailed infrastructure deployment guides, monitoring configuration templates, and operational best practices.
Collaborate with customers and the OEM/ISV ecosystem, enabling them to adopt and customize reference solutions for their specific datacenter environments and software stacks.
Requirements
Bachelor's or Master's degree in Computer Science or a related technical field.
10+ years of experience in infrastructure solution architecture, systems management, DevOps, or platform engineering roles.
Experience working with GPUs, custom AI accelerators, or heterogeneous compute environments.
Proven expertise in building, managing, and monitoring full-stack AI infrastructure at scale.