Data Scientist creating scalable insights from unstructured data at an AI safety company. Collaborating with engineering and research teams in a hybrid role based in Paris.
Responsibilities
Turn petabytes of unstructured text into a structured, explorable view (topics, clusters, segments, trends, anomalies): iterate from “unknown unknowns” to stable definitions we can track.
Build scalable representation pipelines: sampling strategies, preprocessing/normalization, embeddings at scale, indexing, and retrieval to make the corpus searchable and analyzable.
Use LLMs pragmatically: labeling/classification, weak supervision, data enrichment, summarization, and automated diagnostics of inbound volumes (with cost/quality controls).
Deliver insights that change decisions: translate findings into product and operational actions (what data we have, what’s missing, where quality breaks, what to prioritize next).
Ship self-serve analytics: datasets, data models, and lightweight tools/dashboards so the team can explore and answer questions without ad-hoc requests.
Partner closely with engineering/research: align pipelines with production constraints (latency/cost/privacy) and integrate outputs into workflows.
Requirements
Strong Python + SQL with an engineering mindset: you can build reliable pipelines, not just notebooks.
Solid applied NLP/ML experience on real-world text: embeddings, clustering, topic modeling, semantic search, classification; you understand failure modes and how to debug them.
Comfortable at scale: distributed processing, large-scale storage and querying, and performance/cost tradeoffs.
You know how to evaluate fuzzy problems: offline/online metrics, human-in-the-loop labeling, inter-annotator agreement, drift monitoring, and reproducibility.
Prior work with safety/moderation datasets, policy/rule systems, or high-volume logging/observability.
Benefits
20 days of paid vacation
Work from Paris (hybrid) + relocation package
Best medical insurance in France
All the hardware, tools, and services you need
Covered subscriptions for AI agents and IDEs
Team off-sites twice a year: we’ve recently been to the Alps and to Saint-Tropez
Analytics Engineer transforming raw data into actionable insights to support decision-making in data marketing. Working with various data technologies in a hybrid internship role in Paris.
Senior Analytics Engineer developing data pipelines and analytics capabilities for a data-driven analytics company. Collaborating with stakeholders to build trustworthy datasets for impactful decision-making.
Analytics Engineer driving the architecture and scalability of data models and pipelines at Preply. Collaborating across teams to empower data-driven decision-making in education tech.
Analytics Engineer for the Customer Product Analytics team at Just Eat Takeaway.com. Enhancing user experience by delivering data-driven insights and optimizing the product experience in a hybrid role.
Analytics Engineer transforming raw data into organized datasets. Collaborating with business teams to ensure data quality and governance in an Azure environment.
Data Engineer creating scalable data pipelines for Capgemini's analytics solutions using modern ETL tools. The role involves data storage and management on platforms such as Snowflake and Redshift.
Analytics Engineer blending data analysis, business intelligence, and data engineering for a healthcare software company. Creating dashboards and data models to empower decision-making and improve healthcare outcomes.
Lead Web Analytics Developer driving the architecture and implementation of web analytics tools at Hostinger. Collaborating with teams to enhance user experience and improve conversion rates.
Principal managing analytics engineering projects for private equity, leveraging AI technology. Leading cross-functional teams and contributing to business development in a hybrid work environment.
Prototyping analytical tools with the engineering and analytics teams at Clir Renewables. Collaborating on automation scripts and methods for renewable energy analysts.