AI Data Pipeline Engineer designing and operating high-throughput systems for petabyte-scale data delivery. Collaborating across teams to ensure data flows into AI workloads efficiently.
Responsibilities
Design and build high-performance, scalable data pipelines to support diverse AI and Machine Learning initiatives across the organization.
Architect and implement multi-region data infrastructure to ensure global data availability and seamless synchronization.
Develop flexible pipeline architectures that allow for complex branching and logic isolation to support multiple concurrent AI projects.
Optimize large-scale data processing workloads using Databricks and Spark to maximize throughput and minimize processing costs.
Maintain and evolve the containerized data environment on Kubernetes, ensuring robust and reliable execution of data workloads.
Collaborate with AI researchers and platform teams to streamline the flow of high-quality data into training and evaluation pipelines.
Requirements
Extensive professional experience in building and operating production-grade data pipelines for massive-scale AI/ML datasets.
Strong proficiency in distributed processing frameworks, particularly Apache Spark and the Databricks ecosystem.
Deep hands-on experience with workflow orchestration tools like Apache Airflow for managing complex dependency graphs.
Solid understanding of Kubernetes and containerization for deploying and scaling data processing components.
Proficiency in distributed messaging systems such as Apache Kafka for high-throughput data ingestion and event-driven architectures.
Expert-level programming skills in Python for system-level optimizations.
Strong knowledge of cloud-native services and best practices for building secure and scalable data infrastructure.
Logical approach to problem-solving with the persistence to identify and resolve root causes in complex, large-scale systems.
Strong communication skills to effectively collaborate with cross-functional teams and external partners.
Benefits
When submitting your résumé, please exclude information that is prohibited from being requested under the Fair Hiring Procedure Act, such as resident registration number, family relations, marital status, salary, photo, physical attributes, and region of origin.
Please upload all files as PDFs no larger than 30MB. (If you encounter a problem while uploading your résumé, please send it to [email protected] along with the URL of the position you are applying for.)