
Metadata crosswalk for citation data production in OpenCitations

This workflow outlines the process of generating citation data within the OpenCitations infrastructure. It encompasses several steps, starting with a metadata crosswalk applied to the data provided by a specific source. This is followed by data curation and validation, which ensure the quality of the data and its adherence to standards. The data is then adjusted to conform to the OpenCitations Data Model, and new citation data is created, ready for integration into OpenCitations and dissemination via its services.
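As a rough illustration of the crosswalk and curation phases, the following Python sketch maps one record from a hypothetical source into a row of a metadata collection plus a list of citing/cited pairs. The source record shape, the function names (`normalize_doi`, `crosswalk`, `to_meta_csv`) and the simplified DOI check are assumptions made for this example; the column names are modelled on the CSV input format of OpenCitations Meta and should be checked against its documentation before use.

```python
import csv
import io
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical record as delivered by a data source (assumed shape).
SOURCE_RECORDS = [
    {
        "doi": "10.1000/XYZ123",
        "title": "A study of citations",
        "year": "2021",
        "authors": [("Doe", "Jane")],
        "refs": ["10.1000/abc456", "not-a-doi"],
    },
]

# Columns modelled on the OpenCitations Meta input CSV (assumption: verify).
META_COLUMNS = ["id", "title", "author", "pub_date", "venue", "volume",
                "issue", "page", "type", "publisher", "editor"]

# Simplified DOI pattern; real-world DOI validation is stricter.
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")


def normalize_doi(raw: str) -> Optional[str]:
    """Curation: trim, lowercase (DOIs are case-insensitive) and validate."""
    doi = raw.strip().lower()
    return doi if DOI_RE.match(doi) else None


def crosswalk(record: dict) -> Tuple[Optional[Dict[str, str]], List[Tuple[str, str]]]:
    """Map one source record to a metadata row plus (citing, cited) pairs."""
    citing = normalize_doi(record.get("doi", ""))
    if citing is None:
        return None, []  # invalid citing DOI: discard the whole record
    row = dict.fromkeys(META_COLUMNS, "")
    row["id"] = f"doi:{citing}"
    row["title"] = record.get("title", "")
    row["pub_date"] = record.get("year", "")
    row["author"] = "; ".join(f"{fam}, {giv}" for fam, giv in record.get("authors", []))
    # Keep only references whose DOI survives validation.
    cited = [d for d in map(normalize_doi, record.get("refs", [])) if d]
    pairs = [(row["id"], f"doi:{c}") for c in cited]
    return row, pairs


def to_meta_csv(rows: List[Dict[str, str]]) -> str:
    """Serialize metadata rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=META_COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    rows, citations = [], []
    for rec in SOURCE_RECORDS:
        row, pairs = crosswalk(rec)
        if row:
            rows.append(row)
            citations.extend(pairs)
    print(to_meta_csv(rows))
    print(citations)
```

In this sketch, an invalid citing identifier discards the whole record, while invalid references are silently dropped; a real plug-in would more likely log such cases for later inspection.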

The workflow comprises five sequential steps, each representing a critical phase in the data transformation process. These steps not only provide a theoretical framework for understanding the process but also offer practical guidance on using the open-source software tools and datasets made available by OpenCitations. The steps are presented in a logical order, with clearly defined input and output data for each stage.

All the software tools referenced in this workflow are open source and freely accessible, while the data used and generated throughout the workflow is released under a CC0 public domain waiver, ensuring its unrestricted availability and reuse.

The ideal target audience of this workflow is users directly involved in the development of the OpenCitations infrastructure, who can follow a standard approach to produce structured citation data, ready to be ingested into OpenCitations, from a new data source. However, external researchers can also use the workflow, provided they replicate the infrastructure on their own machines.


Workflow steps (5)

  1. Data source selection

  2. Development of a software plug-in for data conversion

  3. Production of metadata and citation data collections

  4. Ingestion of metadata collection in OpenCitations META

  5. Production of OpenCitations citation data
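Steps 4 and 5 can be pictured with a similarly hedged Python sketch. Here a plain dictionary stands in for OpenCitations META: ingesting the metadata collection assigns each external identifier an internal one, and the citation collection is then rewritten in terms of those internal identifiers, dropping pairs whose entities are missing from the metadata collection and removing duplicates. The `tmp:br/N` strings are placeholders, not real OMIDs (which OpenCitations Meta assigns itself), and `ingest_into_meta` and `produce_citations` are hypothetical functions that do not correspond to OpenCitations' actual software. The sketch also assumes that the metadata collection produced in step 3 lists cited entities as well as citing ones.

```python
from itertools import count
from typing import Dict, Iterable, Iterator, List, Tuple


def ingest_into_meta(meta_rows: Iterable[Dict[str, str]],
                     registry: Dict[str, str],
                     counter: Iterator[int]) -> Dict[str, str]:
    """Hypothetical stand-in for step 4: assign an internal id to each unseen external id."""
    for row in meta_rows:
        ext_id = row["id"]
        if ext_id not in registry:
            # Placeholder identifier; real OMIDs have a different structure.
            registry[ext_id] = f"tmp:br/{next(counter)}"
    return registry


def produce_citations(pairs: Iterable[Tuple[str, str]],
                      registry: Dict[str, str]) -> List[Tuple[str, str]]:
    """Step 5 sketch: express citations between internal ids, deduplicated."""
    seen = set()
    citations = []
    for citing, cited in pairs:
        if citing not in registry or cited not in registry:
            continue  # entity missing from the metadata collection
        link = (registry[citing], registry[cited])
        if link not in seen:
            seen.add(link)
            citations.append(link)
    return citations


if __name__ == "__main__":
    registry: Dict[str, str] = {}
    ids = count(1)
    ingest_into_meta([{"id": "doi:10.1000/a"}, {"id": "doi:10.1000/b"}], registry, ids)
    print(produce_citations([("doi:10.1000/a", "doi:10.1000/b")], registry))
```

Keeping the registry (and the counter) between runs is what lets a later ingestion reuse existing internal identifiers instead of minting duplicates for entities that are already known.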


The SSH Open Marketplace is maintained and will be further developed by three European Research Infrastructures - DARIAH, CLARIN and CESSDA - and their national partners. It was developed as part of the "Social Sciences and Humanities Open Cloud" SSHOC project, European Union's Horizon 2020 project call H2020-INFRAEOSC-04-2018, grant agreement #823782.