Technical aspects
Data model
The data model of the SSH Open Marketplace was designed to be generic and flexible enough to support a variety of sources. The data model v1.5 is presented in the Marketplace – Implementation report.
Data ingestion workflow
The SSH Open Marketplace provides a REST API at the backend where data can not only be read but also inserted or updated. This approach allows technology-agnostic ingestion. During the development phase, two main ingestion pipelines - covering the gathering of the data at the source (e.g. via an API or a Git repository), the mapping of the data to the SSH Open Marketplace data model including transformation of parts of the data (e.g. connecting terms from the source to a vocabulary like TaDiRAH) and the integration of the processed data into the SSH Open Marketplace via the API - were used:
- PoolParty UnifiedViews provided by Semantic Web Company (SWC).
- Data Aggregation and proCessing Engine (DACE) developed by the Poznan Supercomputing and Networking Center (PSNC)
Deployment of the website & configuration
The development of the SSH Open Marketplace is managed via GitHub. If you want to know more, pay a visit to our GitHub organization: https://github.com/SSHOC
The system consists of multiple components deployed as separate containers in a dockerized environment:
- backend component consists of the core application implemented in Java and
- a Solr index and a PostgreSQL database deployed as separate containers
- frontend component
The system is deployed in three instances:
- development - used by developers for testing new functionality
- staging - instance based on the latest version of the code and data model
- production - stable instance publicly available at marketplace.sshopencloud.eu
A continuous integration and deployment setup is in place, where pushing changes to corresponding branches in the git repository trigger a set of tests to be run and upon successful completion a new version of the application is automatically deployed.
There are a number of dynamic parts of the data model, which require configuration or initial population upon installation of a new naked instance. These “system data” are maintained as YAML-configuration files as part of the backend source code and are applied during initialisation of the system. This includes:
- Dynamic properties or property-types, e.g. activity-type, keyword
- Actor roles, e.g. contributor, author, funder
- Item relation types, e.g. is-documented-by, mentions
- Concept relation types, e.g. related, broader, sameAs
- Sources, e.g. TAPoR, Programming Historian
- Actor sources, e.g. ORCID, DBLP, Wikidata
Vocabularies, though also part of the system data, are not loaded automatically on application startup and need to be ingested separately. Similarly, if changes to the system data are needed for an existing system with a populated database, corresponding API calls can be applied by the Administrator.
To ensure data consistency and prevent data loss due to evolution of the data model, the Liquibase tool is used to manage the database schema migrations. Corresponding configuration files for the schema migration are also maintained as YAML files and are executed only once during application startup, before loading of the system data.
The SSH Open Marketplace is maintained and will be further developed by three European Research Infrastructures - DARIAH, CLARIN and CESSDA - and their national partners. It was developed as part of the "Social Sciences and Humanities Open Cloud" SSHOC project, European Union's Horizon 2020 project call H2020-INFRAEOSC-04-2018, grant agreement #823782.