About the role

As a Data Scientist, you will develop AI systems for Parspec’s platform, which is transforming the construction materials supply chain. You will analyze unstructured datasets and build scalable data pipelines that improve product recommendations.

Responsibilities

Work with large, messy, and unstructured datasets, transforming them into structured formats that improve model performance
Gather, clean, and structure datasets from multiple sources to support AI training pipelines
Build systems that enable the team to maintain high-quality datasets and drive model accuracy over time
Analyze complex datasets to identify patterns, trends, and insights that improve product discovery and recommendations
Design and build end-to-end data and machine learning pipelines
Develop scalable systems capable of handling large volumes of construction product data
Work with engineering teams to deploy models into production systems
Work closely with product managers, designers, and engineers to integrate AI capabilities into Parspec’s applications
Own product features from concept through implementation and deployment
Communicate technical findings through visualizations, reports, and dashboards that support product and business decisions
Stay current with the latest developments in machine learning, deep learning, and AI frameworks
Contribute to Parspec’s broader AI research and development initiatives
Uphold Parspec’s culture of engineering excellence through high-quality code and thoughtful system design

Requirements

Bachelor’s or Master’s degree in Computer Science, Data Science, Engineering, Statistics, or a related technical field
Strong conceptual understanding of machine learning, statistics, and data modeling
Proven experience working with large, messy, or unstructured datasets, including cleaning, structuring, and transforming data for analysis and model training
Strong programming skills in Python and SQL (R is a plus)
Experience implementing machine learning projects using libraries such as NumPy, Pandas, scikit-learn, and Matplotlib
Experience training deep learning models using TensorFlow/Keras or PyTorch
Strong knowledge of statistics, probability, and machine learning techniques such as regression, classification, and clustering
Ability to write clean, efficient, and maintainable code
Ability to take ownership of projects and drive initiatives from concept through production deployment
Experience building data pipelines and ML pipelines for production systems (preferred)
Familiarity with AWS infrastructure and Django-based applications (preferred)
Experience working with OCR pipelines, document processing, or PDF data extraction (preferred)
Familiarity with data visualization tools such as Tableau or Power BI (preferred)
Strong knowledge of algorithms and data structures (preferred)
Participation in Kaggle competitions, competitive programming, or open-source contributions (preferred)
Experience collaborating with distributed teams across multiple time zones (preferred)

Benefits

Competitive salary and benefits, including family insurance coverage, free health teleconsultations, and learning/upskilling budgets
Equity in the company
Flexible hours and a hybrid work setup
Unlimited PTO
Opportunity to grow with a fast-scaling company transforming a large market