Tool or service

Scrapy

Scrapy is an open source framework for web crawling and for extracting text and data from web pages, written in Python. You can call Scrapy from within your own scripts and applications to automate the extraction of data from websites.

You would typically use Scrapy to automate visiting one or more web pages on a website to which you have access. Alternatively, you could use it to call web-based Application Programming Interfaces (APIs).

Scrapy can download everything it encounters or only selected information. It can also extract structured data from web pages, such as individual pieces of text from specific locations on the page, or all of the data in a table.


The SSH Open Marketplace is maintained and will be further developed by three European Research Infrastructures (DARIAH, CLARIN and CESSDA) and their national partners. It was developed as part of the "Social Sciences and Humanities Open Cloud" (SSHOC) project, under the European Union's Horizon 2020 project call H2020-INFRAEOSC-04-2018, grant agreement #823782.
