tg-model - TextGrid Import Modeller
What's the aim? This project explores ways to simplify the import of text corpora (encoded in XML/TEI) into the TextGrid Repository by modeling the required metadata file structure.
The tool tg-model was developed to generate the TextGrid metadata files; it can be used as a Python library or from the command line. The metadata files matter for the searchability and presentation of the imported data: the more detailed and precise the metadata, the more easily the data can be found in the portal and the better search results can be faceted. TEI files typically already contain metadata in the teiHeader element, which can serve as the content of the metadata files. However, because TEI allows many ways to encode the same information, automatically mapping TEI metadata to the corresponding TextGrid metadata is a challenge.
The tg-model tool supports this task in several steps. First, it generates configuration files that contain settings for the TextGrid metadata and state either their values or where those values can be found in the teiHeader (given as an XPath expression). Where the same information sits at the same location in the teiHeader across several files of a corpus, the corresponding XPath expression is written into the configuration files. The configuration files can then be adjusted. Finally, the metadata files are generated from the configuration files and the input data. At that point everything needed to import the data into TextGridRep is available.
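The idea of keeping only those XPath expressions that resolve in every file of a corpus can be illustrated with a minimal sketch. It does not use tg-model's own API: the function shared_xpaths, the CANDIDATES table and the two sample documents are hypothetical and serve only as an illustration, built on Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

# Namespace prefix mapping for TEI documents.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical candidate locations per metadata field (not tg-model's real list).
CANDIDATES = {
    "title": [
        ".//tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title",
        ".//tei:teiHeader/tei:fileDesc/tei:sourceDesc/tei:bibl/tei:title",
    ],
    "author": [
        ".//tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:author",
    ],
}


def _has_text(root, xpath):
    """Return True if the XPath matches an element with non-empty text."""
    node = root.find(xpath, TEI_NS)
    return node is not None and bool((node.text or "").strip())


def shared_xpaths(documents, candidates=CANDIDATES):
    """Map each field to the first candidate XPath that resolves in all documents."""
    roots = [ET.fromstring(doc) for doc in documents]
    result = {}
    for field, xpaths in candidates.items():
        for xpath in xpaths:
            if all(_has_text(root, xpath) for root in roots):
                result[field] = xpath
                break  # keep the first location shared by the whole corpus
    return result


# Two tiny sample TEI documents (illustrative only); DOC_B has no author.
DOC_A = (
    '<TEI xmlns="http://www.tei-c.org/ns/1.0"><teiHeader><fileDesc><titleStmt>'
    "<title>First text</title><author>Jane Doe</author>"
    "</titleStmt></fileDesc></teiHeader></TEI>"
)
DOC_B = (
    '<TEI xmlns="http://www.tei-c.org/ns/1.0"><teiHeader><fileDesc><titleStmt>'
    "<title>Second text</title>"
    "</titleStmt></fileDesc></teiHeader></TEI>"
)

if __name__ == "__main__":
    print(shared_xpaths([DOC_A, DOC_B]))
```

In this sketch only the title location is reported for the two sample documents, because the author element is missing from one of them. A field without a shared location would have to be filled in manually, which corresponds to the step in which the configuration files are adjusted.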