Data Scientist at Stefanini shaping LLM customization via data pipelines and sources. Engaging in data structuring, quality assurance, and efficient storage practices.
Responsibilities
Design and implement data pipelines to support the LLM customization process
Collect, process, and structure diverse data sources
Develop scripts and processes for extracting structured and unstructured data
Implement transformations to convert raw data into formats suitable for training
Ensure the quality, consistency, and relevance of the data used for training
Create mechanisms for validation and testing of datasets
Develop processes for data enrichment
Implement efficient storage for data and training results
Configure data integration between the trained model and the Elastic platform
Document data architecture, flows, and transformations
Implement data versioning and traceability practices
Optimize data flow for model training iterations
Ensure security and compliance in the handling of the data used
Requirements
Additional courses in natural language processing or data preparation for ML (desirable)
Practical knowledge of the Elastic Stack platform (Elasticsearch, Logstash, Kibana) | Level: Advanced (Required)
Experience preparing datasets for training language models | Level: Advanced (Required)
Experience with extraction, transformation, and loading (ETL) of unstructured data | Level: Advanced (Required)
Benefits
Meal allowance or food voucher
Discounts on courses, universities, and language schools
Stefanini Academy — a platform with free, up-to-date online courses and certificates
Mentoring
Benefits club for consultations and medical exams
Health insurance
Dental insurance
Employee discounts and benefits at top establishments
Data Science Intern leveraging skills in statistics and programming for data-driven projects at Revolve. Collaborating with marketing and operations for insightful analytics and strategy recommendations.
Senior AI & Data Scientist leading advanced analytical model development for a tech company. Ensuring robust data architecture and deploying AI solutions from analysis to production.
Senior Data Scientist responsible for enhancing mapping and routing solutions at Vay. Utilizing data science and machine learning to improve navigation in urban mobility systems.
Data Scientist transforming complex data into valuable insights for adp MERKUR GmbH. Develops machine-learning models and collaborates with stakeholders for data-driven decisions.
Senior Real World Data Scientist providing statistical and epidemiological expertise in nutrition and health research. Analyzing large-scale observational data and guiding project teams in statistical methodologies.
Senior Data Scientist at LexisNexis Risk Solutions applying machine learning and AI for government services. Collaborating with a diverse team to solve meaningful problems.
Clinical Data Manager supporting biospecimen and clinical data management for the WVU Cancer Institute. Engaging in data collection, management, and reporting for research and quality assurance.
Data Manager coordinating AI development processes to govern and manage data for medical devices at GE HealthCare. Collaborating with AI/ML engineers and tracking compliance and readiness throughout the lifecycle.
SMAI Manager leading a talented team to develop advanced factory scheduling and automation solutions at Micron. Collaborating globally to enhance manufacturing capabilities and deliver best practices.