Lead Data Scientist developing digital twins and generative AI predictions for Caterpillar's digital applications. Collaborating across teams to create robust, automated diagnostic workflows and real-time monitoring solutions.
Responsibilities
Algorithm Development & Modeling
Anomaly Detection: Design and implement GPU-accelerated machine learning models (e.g., XGBoost, autoencoders, and GANs) to identify irregular patterns in high-frequency sensor data.
Digital Twin Engineering: Partner with engineering teams to develop onboard digital twins using NVIDIA architecture to simulate, predict, and optimize the performance of heavy machinery.
Optimization: Profile and tune deep learning algorithms for maximum efficiency on NVIDIA GPU architectures, ensuring high throughput and low latency for real-time monitoring.
Onboard Architecture Testing & Integration
Edge Deployment: Adapt and test algorithms for onboard architecture, leveraging tools like NVIDIA Jetson and real-time edge processing on Cat equipment.
Hardware-Software Co-Design: Collaborate with hardware and simulation engineers to ensure algorithm compatibility with next-generation processors and specialized onboard compute modules.
Simulation-Based Training: Use high-fidelity digital twins to simulate rare failure scenarios, ensuring the GenAI assistant provides accurate troubleshooting steps for edge-case mechanical issues.
GenAI Algorithms
Automated Diagnostic Workflows: Develop Generative AI agents that synthesize telematics data to generate prioritized repair recommendations for identified machine faults.
Unified Data Orchestration: Integrate multi-modal outputs from condition monitoring analytics and asset life history to create a machine-specific context for the AI assistant.
Requirements
Typically, a Bachelor's, Master's, or PhD degree in Applied Statistics, Data Science, Business Analytics, Predictive Analytics, Business Intelligence & Analytics, Mathematics, Computer Science, Engineering (Aerospace, Electrical, Mechanical, Computer, Industrial, Agricultural, etc.), or an equivalent technical degree.
Extensive experience applying Python (NumPy, SciPy, pandas, etc.) programming to solve business challenges.
Extensive experience (typically 8+ years) with advanced data analysis, machine learning (e.g., clustering, logistic regression, neural networks), and statistical methods (e.g., statistical process control).
Practical experience with onboard architectures and software (e.g., small projects using Raspberry Pi or similar hardware are a bonus).
Working experience with heavy equipment engineering or data analysis.
Working knowledge of cloud technologies (AWS, Azure, Google Cloud, etc.)
Advanced experience with version control and repository platforms such as GitHub
Experience operating in an Agile environment
Must demonstrate strong initiative, interpersonal skills, and the ability to communicate effectively.
Benefits
Medical, dental, and vision benefits*
Paid time off plan (Vacation, Holidays, Volunteer, etc.)*