About the data population

During the development phase of the SSH Open Marketplace, we identified and prioritised trusted sources from which to gather information to populate it. Over the 3 years of the SSHOC project, 15 sources have been ingested, for a total of over 5,000 individual items. The sources are listed below.

TAPoR

TAPoR is a long-standing gateway to tools used for text analysis and retrieval.

Programming Historian

The Programming Historian publishes novice-friendly, peer-reviewed tutorials that help humanists learn a wide range of digital tools, techniques, and workflows to facilitate research and teaching.

Standardization Survival Kit

The Standardization Survival Kit presents a collection of research use case scenarios illustrating best practices in Digital Humanities and Heritage research.

Language Resource Switchboard

The Language Resource Switchboard helps you find tools that can process your data.

dblp computer science bibliography

The dblp computer science bibliography provides open bibliographic information on major computer science journals and proceedings. Only a subset of publications related to digital humanities is being ingested in the SSH Open Marketplace.

EOSC

The EOSC Portal catalogue & marketplace is an integrated platform that provides easy access to a wide range of services and resources for various research domains, along with integrated data analytics tools. Only a subset of resources relevant for the Social Sciences and Humanities is being ingested in the SSH Open Marketplace.

Humanities Data

The Humanities Data website collects and presents datasets and recipes stemming from Digital Humanities projects.

CESSDA Training

The CESSDA Training Working Group, implementing one of the four strategic pillars of CESSDA, offers a wide variety of training in Research Data Management and data archiving to both researchers and data curators.

CLARIN Resource Families

The CLARIN Resource Families are a number of curated collections of corpora and tools. They are manually put together by CLARIN with the aim of providing a user-friendly overview of existing resources inside and outside the CLARIN infrastructure.

DARIAH-Campus

DARIAH-Campus is both a discovery framework and a hosting platform for DARIAH and DARIAH-affiliated training and education materials.

DARIAH Contribution Tool

DARIAH member states contribute to the DARIAH distributed infrastructure with a diverse range of resources and services, and declare these in-kind contributions via the DARIAH contribution tool. A selected set of these contributions has been added to the SSH Open Marketplace.

SSHOC service catalogue

The SSHOC service catalogue is the result of an effort by all SSHOC Work Packages to collect and consolidate the services (or resources) offered by SSHOC. As the implementation of the project progresses, the resources referenced in the catalogue are the most visible outputs of the SSHOC project.

SSHOC training materials

Training materials produced as part of the SSHOC training events.

SSHOC Conversion Hub

The SSH Conversion Hub is an outcome of SSHOC WP3. It allows users to search for tools that convert from one (meta-)data/file format to another, e.g. from CSV (comma-separated values) to TEI (Text Encoding Initiative).

SSH Training Discovery Toolkit

The SSH Training Discovery Toolkit acts as an overview of relevant sources that hold (digital) material for trainers. For each such source, selected items that are useful for the SSH community are described in more detail and give a hint about what training material to expect from the source.

If you don’t see some of these sources in the Marketplace, it is because they are still in the process of being ingested. New sources will also be added regularly: between 1 and 4 sources per year are expected, starting mid-2022. As there are a few criteria to comply with, please check the Contribute section if you would like to suggest new sources.

Types of content

The conceptual approach used to structure and classify the content of the SSH Open Marketplace led to the identification of 5 main content types.

Tools & services

Materials or products used to perform activities such as:

  • Desktop client solutions (to be installed locally)
  • Browser-based or command-line-based resources
  • Mobile apps
  • Programming libraries or APIs
  • Data catalogues

Example: Gephi is a visualisation and exploration software for all kinds of graphs and networks.

Training materials

Tutorials, lessons or didactic resources explaining how to perform an action or highlighting the learning outcomes one would gain from engaging with the material.

Example: The lesson “Beginner’s Guide to Twitter Data” (Programming Historian website).

Workflows

Sequences of operations or steps performed on research data during their lifecycle. Workflows can be achieved by using diverse tools and facilities, and useful resources are connected to each step.

Example: “Extract textual content from images” is a workflow composed of 13 steps coming from the Standardization Survival Kit.

Datasets

Single digital objects or collections of data, records, or information that are kept as a persistent unit of information in the knowledge generation process. Datasets are used as evidence for some phenomena.

Example: ParlaMint 2.0

Publications

Research results published in academic journals or repositories. The SSH Open Marketplace references only publications that can be connected to other resources and is not an exhaustive collection of papers.

Example: “PoetryLab. An Open Source Toolkit for the Analysis of Spanish Poetry Corpora” is a conference paper presented at the DH2020 Conference, in which you can find an example of the use of the spaCy library (referenced as a tool in the SSH Open Marketplace).

Inclusion criteria

To ensure high-quality data in the SSH Open Marketplace, we have established inclusion criteria for entry into the database. These apply both to individual items and to mass ingestions of entire sources (more on that below).

The Marketplace comes with good coverage of resources relevant to academics, scholars and students from the Social Sciences and Humanities, but research is dynamic, and you may point us to new and upcoming items or even entire collections of resources. You may suggest such collections as new sources to the Marketplace.

An item or source has to match the following selection criteria:

  • The relevance of the data: Above all, the SSH Open Marketplace serves researchers in the social sciences and the humanities. Therefore, there must be an established link between the content added into the Marketplace and the SSH. The broad scope of the SSH Open Marketplace means that, to be selected, any resource must fulfill two criteria: (1) relevance for SSH research and researchers and (2) pertinence to the digital methodologies used within the SSH landscape.
  • From a technical point of view, the data must also be current, supported, and ideally open. The SSH Open Marketplace favours the uptake of Open Science workflows and open research practices. Software resources are preferably built upon open source solutions. Note: Given that the SSH Open Marketplace seeks to mirror actual research practices, commercial or non-current resources are also referenced where these are relevant for the scientific community.
  • Contextualization is one of the key pillars of the SSH Open Marketplace. It is meant to provide a discovery portal for tools and services, while placing these tools and services in context via publications, training materials, datasets, and workflows. As such, these last four categories are indexed in the SSH Open Marketplace insofar as they can be placed in relation with tools and services, and if no relation to a tool or service exists, they should not be accepted.
  • the quality of the (meta)data: higher quality data will increase the quality of the Marketplace. The more of the metadata fields in the Marketplace data model an item covers, the better: items with just a title and a link are inferior to ones containing a long description and perhaps even a list of contributors.
  • the uniqueness of the data: if the data is already in the Marketplace, there is no need to add it again, either as an individual item or as part of a source. Even though duplicate entries are dealt with within the Marketplace, ingesting identical (or, worse, almost identical) entries multiple times uses up valuable human resources that would be better put to use elsewhere, considering the limited timeframe and personnel of the project.
  • the expected usefulness for the SSH community: the items in a source should be of potential use to SSH researchers, whether because they are already widely used or because they offer new and interesting possibilities.
  • the representativity of the source for the SSH domain: with the various partners in the SSHOC project coming from different parts of the wide field of SSH (e.g. CESSDA being mostly concerned with social sciences, while CLARIN is more linguistics/language focused), considerable effort was made to find sources that represent all the different facets of the SSH landscape; it also made sense to prioritize sources coming from the SSHOC project itself, assuming that they will have this kind of representativity baked in already.
  • Finally, in terms of legal and ethical requirements, the SSH Open Marketplace maximises the findability and re-use of data in line with the FAIR data principles – Findable, Accessible, Interoperable and Re-usable – for research data, and guides users towards tools, services or training materials that can help them in their FAIRification of workflows. Of necessity, the SSH Open Marketplace is GDPR compliant. This is reflected in the manner of presentation of the resources and – more obviously – in the management of users.
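The metadata-quality and uniqueness criteria above can be illustrated with a minimal sketch. The field names (`label`, `description`, `accessible_at`, `contributors`), the similarity threshold, and both helper functions are hypothetical and chosen for illustration only; they are not the Marketplace data model or its actual deduplication logic.

```python
from difflib import SequenceMatcher

# Hypothetical metadata fields; the real Marketplace data model may differ.
FIELDS = ("label", "description", "accessible_at", "contributors")


def coverage(item: dict) -> float:
    """Return the fraction of FIELDS that carry a non-empty value."""
    filled = sum(1 for field in FIELDS if item.get(field))
    return filled / len(FIELDS)


def looks_like_duplicate(a: dict, b: dict, threshold: float = 0.9) -> bool:
    """Flag two items whose URLs match or whose labels are nearly identical."""
    url_a, url_b = a.get("accessible_at"), b.get("accessible_at")
    if url_a and url_a == url_b:
        return True
    label_a = (a.get("label") or "").strip().lower()
    label_b = (b.get("label") or "").strip().lower()
    if not label_a or not label_b:
        # Without labels there is nothing to compare, so do not flag.
        return False
    # The threshold is an assumed value, not a project setting.
    return SequenceMatcher(None, label_a, label_b).ratio() >= threshold
```

An item with only a title and a link would score 0.5 under this assumed field list, while one that also has a description and contributors would score 1.0.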

Additionally, when suggesting a source for the Marketplace, please keep in mind the following additional criteria:

  • the technical interface: it is easier to import a source into the Marketplace if it offers a well-documented API; considering the limited human resources of the project, sources which needed a lot of additional work before they could be fed into the ingestion pipeline were placed very low on the priority list
  • primary sources: we decided to focus on primary sources and exclude sources that only aggregate information that can be found elsewhere; such “2nd hand metadata” introduces additional points of failure, and using the primary source instead will likely result in higher quality metadata

Because we believe that accurate and up-to-date content is key to making the SSH Open Marketplace a rich and useful discovery portal, data population and curation are at the heart of our work. Once (meta)data have been ingested in the Marketplace, we curate and enrich them using a hybrid approach: automated checks are run on the ingested data, followed by manual review of the identified problems as well as of aspects that cannot be checked automatically. In this process, contributors and moderators play an essential role (see “How to contribute” below).
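As a rough illustration of this hybrid approach, the sketch below runs a few automated checks and collects the flagged items for a human moderator. The checks, the field names (`label`, `description`, `accessible_at`) and the 20-character minimum are assumptions made for this example, not the Marketplace's actual curation rules.

```python
from urllib.parse import urlparse


def automated_checks(item: dict) -> list[str]:
    """Return human-readable problems found in one ingested item."""
    problems = []
    if not item.get("label"):
        problems.append("missing label")
    # The 20-character minimum is an arbitrary, assumed cut-off.
    if len(item.get("description") or "") < 20:
        problems.append("description missing or very short")
    parsed = urlparse(item.get("accessible_at") or "")
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("accessible_at is not a valid http(s) URL")
    return problems


def review_queue(items: list[dict]) -> list[tuple[dict, list[str]]]:
    """Pair each problematic item with its problems for manual review."""
    queue = []
    for item in items:
        problems = automated_checks(item)
        if problems:
            queue.append((item, problems))
    return queue
```

Items that pass every check never reach the queue, so moderators only see entries that need a decision; aspects that cannot be checked automatically would still need a separate manual pass.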

For further questions, please contact us.

The SSH Open Marketplace is maintained and will be further developed by three European Research Infrastructures - DARIAH, CLARIN and CESSDA - and their national partners. It was developed as part of the "Social Sciences and Humanities Open Cloud" SSHOC project, European Union's Horizon 2020 project call H2020-INFRAEOSC-04-2018, grant agreement #823782.
