TextGrid - Import Workflow for Data aka The Fluffy Import
How do I import textual research data into the TextGrid Repository?
Researchers repeatedly express the need for usable tools and infrastructure that make it easy to create and publish text-based research data. This also applies to Text+ and is exemplified by the Text+ User Stories. The partners involved in Text+, including SUB Göttingen, GWDG, and TU Dresden, contribute to this, for example with the new developments tested at the 2024 Code Sprint for Humanities Data.
Thanks to its extensive, searchable, and reusable collection of texts and images, TextGridRep is an important and relevant repository for humanities research data.
For researchers, it offers a sustainable, permanent, and secure way to publish their research data in citable form, together with a clear description of the data through required and extended metadata, so that a publication in TextGridRep can be presented and reused.
Results were achieved on several levels and were presented and used at the Code Sprint. TextGridRep was further developed with regard to authentication, the presentation of unpublished projects, and data intake. A new tool was developed to generate the required TextGrid metadata files; it detects metadata in the teiHeader elements of the input files and also allows values, and the rules for discovering them, to be set manually. A web-based tool that makes the workflow convenient to run was implemented as a Jupyter Notebook. To make this tool easy to use, the Text+ JupyterHub is available; it was adapted for the occasion so that the tools mentioned are preinstalled and ready to use.
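To illustrate the idea of detecting metadata in teiHeader elements with configurable rules and manual values, here is a minimal sketch using only the Python standard library. It is not the code or API of the TextGrid tools. The function name detect_metadata, the rule dictionary, the field names, and the sample document are all hypothetical and chosen for illustration only.

```python
import xml.etree.ElementTree as ET

# TEI namespace as defined by the TEI Guidelines.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical discovery rules: metadata field -> path relative to the TEI root.
# These are illustrative assumptions, not the rules used by the TextGrid tools.
DEFAULT_RULES = {
    "title": "tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title",
    "author": "tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:author",
}


def detect_metadata(tei_xml, rules=DEFAULT_RULES, overrides=None):
    """Collect metadata from a TEI document; manual overrides win over detected values."""
    root = ET.fromstring(tei_xml)
    found = {}
    for field, path in rules.items():
        node = root.find(path, TEI_NS)
        if node is not None:
            # Join nested text and normalise whitespace.
            text = " ".join("".join(node.itertext()).split())
            if text:
                found[field] = text
    # Manually set values replace or extend what was detected.
    found.update(overrides or {})
    return found


# Made-up sample input for demonstration purposes.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Effi Briest</title>
        <author>Theodor Fontane</author>
      </titleStmt>
    </fileDesc>
  </teiHeader>
  <text><body><p>...</p></body></text>
</TEI>"""

if __name__ == "__main__":
    print(detect_metadata(SAMPLE, overrides={"language": "deu"}))
```

Keeping the rules in a plain dictionary means a project whose headers are structured differently could supply its own paths, while the overrides argument covers values that cannot be found in the header at all.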
This workflow describes the new fluffy import into the TextGrid Repository.
Workflow steps (5)
1 The TextGrid Repository Documentation
2 TextGrid Import UI
3 tg-model - TextGrid Import Modeller
4 tgadmin - TextGrid repository administration cli tool, based on tgclients
5 tgclients - TextGrid Python clients
The SSH Open Marketplace is maintained and will be further developed by three European Research Infrastructures - DARIAH, CLARIN and CESSDA - and their national partners. It was developed as part of the "Social Sciences and Humanities Open Cloud" SSHOC project, European Union's Horizon 2020 project call H2020-INFRAEOSC-04-2018, grant agreement #823782.