Hybrid Software Engineer, Data Acquisition

Posted last week


About the role

  • Develop and maintain web crawlers and scrapers in Python, using libraries such as Beautiful Soup to extract data from target websites.
  • Utilize headless browsing techniques, such as headless Chrome driven through the DevTools Protocol, to automate and optimize data collection processes.
  • Collaborate with cross-functional teams to identify, scrape, and integrate data from APIs to support business objectives.
  • Create and implement efficient parsing patterns using regular expressions, XPath expressions, and CSS selectors to ensure accurate data extraction (see the brief sketch after this list).
  • Design and manage distributed job queues using technologies such as Redis, Kubernetes, and Postgres to handle large-scale data processing tasks.
  • Develop strategies to monitor and ensure data quality, accuracy, and integrity throughout the crawling and indexing process.
  • Continuously improve and optimize existing web crawling infrastructure to maximize efficiency and adapt to new challenges.
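
The following is a minimal, illustrative sketch of the kind of extraction work described above, using Beautiful Soup with CSS selectors and a regular expression. The URL, selectors, and field names are placeholders chosen for illustration, not part of the role's actual stack.

    import re

    import requests
    from bs4 import BeautifulSoup

    PRICE_RE = re.compile(r"\d+(?:\.\d{2})?")  # e.g. pull "19.99" out of "$19.99"


    def scrape_listings(url: str) -> list:
        """Fetch a page and extract structured records with CSS selectors."""
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        soup = BeautifulSoup(response.text, "html.parser")

        records = []
        for item in soup.select("div.listing"):        # hypothetical container selector
            title = item.select_one("h2.title")
            price = item.select_one("span.price")
            match = PRICE_RE.search(price.get_text()) if price else None
            records.append({
                "title": title.get_text(strip=True) if title else None,
                "price": float(match.group()) if match else None,
            })
        return records


    if __name__ == "__main__":
        print(scrape_listings("https://example.com/listings"))  # placeholder URL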

Requirements

  • Proficiency in Python, Java, or C++.
  • Strong understanding of HTTP/HTTPS protocols and web communication.
  • Knowledge of HTML, CSS, and JavaScript for parsing and navigating web content.
  • Mastery of queues, stacks, hash maps, and other data structures for efficient data handling.
  • Ability to design and optimize algorithms for large-scale web crawling.
  • Hands-on experience with web scraping libraries/frameworks (e.g., Scrapy, BeautifulSoup, Selenium, Playwright); a short Scrapy-style sketch follows this list.
  • Understanding of how search engines work and best practices for web crawling optimization.
  • Experience with SQL and/or NoSQL databases (e.g., PostgreSQL, MongoDB) for storing and managing crawled data.
  • Familiarity with data warehousing and scalable storage solutions.
  • Knowledge of distributed systems (e.g., Hadoop, Spark) for processing large datasets.
  • Proficiency in Pandas, NumPy, and Matplotlib for analyzing and visualizing scraped data.
  • Experience applying Machine Learning to improve crawling efficiency or accuracy.
  • Familiarity with cloud platforms (AWS, GCP) and containerization (Docker) for deployment.
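
For illustration only, here is a short Scrapy-style spider in the spirit of the scraping-framework requirement above; the site, selectors, and field names are hypothetical.

    import scrapy


    class ListingSpider(scrapy.Spider):
        """Toy spider: crawl a listings page, extract fields, and follow pagination."""
        name = "listings"
        start_urls = ["https://example.com/listings"]  # placeholder start URL

        def parse(self, response):
            # Extract one record per listing container (selectors are hypothetical).
            for item in response.css("div.listing"):
                yield {
                    "title": item.css("h2.title::text").get(),
                    "price": item.css("span.price::text").get(),
                }
            # Follow the "next page" link, if any, and parse it the same way.
            next_page = response.css("a.next::attr(href)").get()
            if next_page:
                yield response.follow(next_page, callback=self.parse)

A spider like this could be run with scrapy runspider spider.py -o items.json to write the scraped records to a file.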

Benefits

  • 💰 Competitive salary and equity
  • 🧑‍⚕️ Health insurance
  • 🚴 Transportation allowance
  • 🥎 Sport allowance
  • 🥕 Meal vouchers
  • 💰 Private pension plan
  • 🍼 Generous parental leave policy
  • 🌎 Visa sponsorship

Job title

Software Engineer, Data Acquisition

Job type

Experience level

Mid level, Senior

Salary

Not specified

Degree requirement

Bachelor's Degree

Location requirements
