
Using GoTriple data for your SSH data science tasks
This workflow explains how to use GoTriple data on Social Sciences and Humanities (SSH) for data analytics tasks.
GoTriple is the discovery platform for the SSH: it currently indexes metadata of almost 19.5 million documents, over 25,000 projects and about 22.5 million author profiles. The diversity of SSH is well represented in the platform. First, GoTriple’s documents cover 27 disciplines within SSH, from History to Literature, from Management to Gender Studies.
Moreover, GoTriple is inherently multilingual: it indexes documents in over 20 languages and provides a user interface localised in 10 European languages. Finally, diversity also applies to the nature of its documents, with a high representation of textual content (articles, theses, reports, books, ...) but also a significant number of datasets, multimedia assets and software artefacts, a number expected to increase in the coming months.
This workflow explains in practice, with code examples, how GoTriple data can be used by Digital Humanists for data analytics tasks.
There are two ways to do so:
- by using the data dumps created and published in Zenodo (last update May 18, 2025);
- by using GoTriple APIs, to extract data from the online platform.
What to use: If you just need a subset of GoTriple publications that link to their PDF, the data dumps are the quickest way to get started. They are also the suggested option if you don’t want to write code to retrieve GoTriple data.
On the other hand, use the APIs if: a) you need access to the most recent version of the data; b) you need to include publications without a full text; c) you need a very specific subset of the data and want to apply custom filtering conditions.
Workflow Code
A Jupyter Notebook with ready-to-use examples is available and is the basis for this workflow: [gotriple_stats-with-zenodo.ipynb](https://github.com/odoma-ch/gotriple-data-utils/blob/main/gotriple_stats-with-zenodo.ipynb). It provides hands-on examples of how to process GoTriple data dumps and integrate analytics workflows via Python or similar tooling. Familiarize yourself with its structure, required dependencies (e.g., pandas, requests) and typical processing pipelines (such as metadata loading, parsing, filtering, aggregation and visualization). Also familiarize yourself with the GoTriple data model, which is fully described in the TRIPLE Project deliverable D2.5: https://zenodo.org/records/7359654.
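As a taste of the kind of processing the notebook performs, the sketch below filters and aggregates dump-like records with pandas. The field names (`id`, `language`, `disciplines`, `has_fulltext`) and the inline sample records are illustrative assumptions, not the exact GoTriple dump schema; consult deliverable D2.5 for the real data model.

```python
import pandas as pd

# In practice you would load a downloaded Zenodo dump, for example:
#   df = pd.read_json("gotriple_dump.jsonl", lines=True)
# Here we use a tiny invented sample so the sketch is self-contained.
records = [
    {"id": "doc1", "language": "en", "disciplines": ["hist"], "has_fulltext": True},
    {"id": "doc2", "language": "fr", "disciplines": ["litt", "hist"], "has_fulltext": True},
    {"id": "doc3", "language": "en", "disciplines": ["mana"], "has_fulltext": False},
]
df = pd.DataFrame(records)

# Keep only documents that come with a full-text link,
# then count how many remain per language.
with_pdf = df[df["has_fulltext"]]
per_language = with_pdf["language"].value_counts().to_dict()
print(per_language)
```

The same filter-then-aggregate pattern scales to the full dumps; only the loading step (e.g. `pd.read_json` on the Zenodo file) changes.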
Workflow steps (6)
1 Using data dumps: 1. Locate and download data dumps
2 Using data dumps: 2. Load and prepare data
3 Using data dumps: 3. Process data
4 Using APIs: 1. Get familiar with the GoTriple APIs
5 Using APIs: 2. Process and analyze data
6 Combined use
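For the API track, a minimal sketch of querying the platform with requests follows. The endpoint `https://api.gotriple.eu/documents` and the parameter names `q`, `page` and `size` are assumptions modelled on typical REST search APIs; check the official GoTriple API documentation for the actual contract before using this in production.

```python
from urllib.parse import urlencode

import requests

# Assumed endpoint; verify against the GoTriple API documentation.
BASE_URL = "https://api.gotriple.eu/documents"

def build_search_url(query: str, page: int = 1, size: int = 20) -> str:
    """Build a paginated search URL (parameter names are assumptions)."""
    params = {"q": query, "page": page, "size": size}
    return f"{BASE_URL}?{urlencode(params)}"

def fetch_documents(query: str, page: int = 1, size: int = 20) -> dict:
    """Fetch one page of search results and return the parsed JSON body."""
    resp = requests.get(
        BASE_URL,
        params={"q": query, "page": page, "size": size},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

url = build_search_url("gender studies")
print(url)
```

Paginating with such a helper (incrementing `page` until the result set is empty) is the usual way to extract a custom subset of the live data, which can then feed the same pandas pipelines used for the dumps.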
The SSH Open Marketplace is maintained and will be further developed by three European Research Infrastructures - DARIAH, CLARIN and CESSDA - and their national partners. It was developed as part of the "Social Sciences and Humanities Open Cloud" SSHOC project, European Union's Horizon 2020 project call H2020-INFRAEOSC-04-2018, grant agreement #823782.


