Building Textual Corpora for Under-resourced Languages
How to build textual corpora for research into under-resourced languages. This workflow was designed during the workshop "Creating, Managing and Archiving Textual Corpora in Under-resourced Languages". The workshop was conceived by the DARIAH Working Groups Research Data Management and Multilingual DH, financed by the DARIAH-EU Funding Scheme for Working Group Activities 2023-25, and hosted by the University of Hamburg from 28 to 30 August 2024.
Workflow steps (9)
1 Data discovery 1
2 Data discovery 2
3 Ethical and legal considerations
4 Metadata documentation
5 Digitization
6 Character encoding
7 Data cleaning
8 Create archival package
9 Curate your corpus
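As a rough illustration of what steps 6 and 7 (character encoding and data cleaning) can involve in practice, the sketch below decodes raw bytes, applies Unicode NFC normalization, drops control characters, and collapses repeated spaces. It is not taken from the workflow itself: the function name normalize_text, the default UTF-8 encoding, and the specific cleaning rules are all assumptions chosen for the example, and a real corpus may need different choices.

```python
import re
import unicodedata


def normalize_text(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode bytes and return NFC-normalized, lightly cleaned text.

    Hypothetical helper for illustration; the encoding default and the
    cleaning rules are assumptions, not part of the workflow.
    """
    # Fail loudly on bytes that do not match the declared encoding.
    text = raw.decode(encoding, errors="strict")
    # Compose base letters and combining marks into one canonical form.
    text = unicodedata.normalize("NFC", text)
    # Drop control characters (category Cc) except newline and tab.
    # Format characters (category Cf, e.g. zero-width joiners) are kept,
    # because many scripts rely on them.
    text = "".join(
        ch for ch in text
        if ch in "\n\t" or unicodedata.category(ch) != "Cc"
    )
    # Collapse runs of spaces and tabs, and trim each line.
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in text.split("\n")]
    return "\n".join(lines)
```

Whether NFC or another normalization form is appropriate depends on the language and on the tools used downstream, so it is worth recording the chosen form in the corpus metadata.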
The SSH Open Marketplace is maintained and will be further developed by three European Research Infrastructures (DARIAH, CLARIN and CESSDA) and their national partners. It was developed as part of the "Social Sciences and Humanities Open Cloud" (SSHOC) project, the European Union's Horizon 2020 project call H2020-INFRAEOSC-04-2018, grant agreement #823782.