
A workflow to publish Collections as Data: the case of Cultural Heritage data spaces

This is version 1 of the workflow, currently under community review for version 2.

Cultural heritage institutions have been making their digital collections available to the public for several decades. Advances in technologies such as Artificial Intelligence and Machine Learning have created a new context in which digital collections can be analysed using computational methods. Initiatives such as Collections as data and the FAIR data principles have emerged to provide best practices and guidelines for publishing digital collections in a form suitable for computational use. These initiatives are complemented by the CARE principles, which strengthen ethical considerations in data governance and reuse. In parallel, experimental Labs have been set up in Galleries, Libraries, Archives and Museums (GLAM) to reuse their digital collections.

Data spaces have emerged as an innovative way to publish and reuse digital collections. Building on previous work undertaken in the GLAM Labs Community, this workflow provides a set of steps for publishing Collections as data. It aims to guide and encourage cultural heritage institutions, step by step, to publish their collections in a form suitable for computational use. The checklist items are iterative and do not need to be applied in any particular order: each institution can choose which items to use. Priorities depend on the purpose, context and content of the dataset, and on its intended use or target users.

This workflow has been developed in the context of the common European data space for cultural heritage. It builds on the International GLAM Labs’ Checklist to Publish Collections as Data in GLAM Institutions.



Workflow steps (10)

  1. Provide a clear license and terms of use allowing reuse of the dataset without restrictions

  2. Provide a suggested citation for the dataset so reusers are aware of how to cite it

  3. Include documentation about the dataset

  4. Use a public platform to make available the dataset for the public

  5. Share examples of use to demonstrate how the dataset can be reused

  6. Think about a structure for the dataset for a better understanding of how to reuse the content

  7. Include machine-readable metadata about the content provided in the dataset

  8. Use an existing collaborative-edition platform to include the information about the dataset

  9. Provide the dataset by means of an existing API

  10. Create a website to present and describe the dataset to encourage its reuse
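
As an illustration of how the steps on licensing, suggested citation and machine-readable metadata might fit together, the following is a minimal sketch in Python. It is not part of the workflow itself. The field names, the function names (`build_dataset_record`, `missing_fields`), the example institution and the example.org URL are all hypothetical, and the record does not follow any particular metadata vocabulary. An institution would normally map these fields onto an established schema of its choice.

```python
import json

# Fields this sketch treats as essential (an assumption, not a rule of the workflow).
REQUIRED_FIELDS = ("title", "description", "license", "suggested_citation")


def build_dataset_record(title, description, license_url, suggested_citation,
                         download_url, file_format):
    """Return a plain dict describing a dataset (hypothetical field names)."""
    return {
        "title": title,
        "description": description,
        "license": license_url,
        "suggested_citation": suggested_citation,
        "distribution": {
            "download_url": download_url,
            "format": file_format,
        },
    }


def missing_fields(record):
    """List the required fields that are absent or empty in the record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


# Hypothetical example values, for illustration only.
record = build_dataset_record(
    title="Example digitised newspapers",
    description="Plain-text transcriptions of an example newspaper collection.",
    license_url="https://creativecommons.org/publicdomain/zero/1.0/",
    suggested_citation="Example Library. Example digitised newspapers [dataset].",
    download_url="https://example.org/datasets/newspapers.zip",
    file_format="application/zip",
)

# Serialise to JSON so that the description is machine-readable.
record_json = json.dumps(record, indent=2, ensure_ascii=False)
print(record_json)
print("Missing fields:", missing_fields(record))
```

Publishing a file like this alongside the dataset gives reusers the license, the citation and the download location in one place, and the `missing_fields` check offers a simple way to spot incomplete descriptions before publication.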


The SSH Open Marketplace is maintained and will be further developed by three European Research Infrastructures - DARIAH, CLARIN and CESSDA - and their national partners. It was developed as part of the "Social Sciences and Humanities Open Cloud" SSHOC project, European Union's Horizon 2020 project call H2020-INFRAEOSC-04-2018, grant agreement #823782.