
Creation of a TEI-based corpus

This scenario describes the steps needed to create a corpus based on the TEI tagset. The TEI Guidelines have become a de facto standard for text annotation, covering a wide variety of text and phrase structures, content types, linguistic information on words or phrases, and more. Many digital text collections and digital edition projects base their annotation on the TEI. Linguistic corpora based on the TEI can therefore be reused in projects from other disciplines, and can themselves benefit from the wide range of existing resources.



Workflow steps (9)

  1. Corpus Composition
  2. Verification and Cleanup
  3. Conversion to TEI
  4. Verification and Cleanup
  5. Create Workbench
  6. Verification and Cleanup
  7. Linguistic Annotation
  8. Verification and Cleanup
  9. Finalize
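
To make the "Conversion to TEI" step more concrete, the following is a minimal sketch, not a method prescribed by this workflow. It assumes the cleaned source is plain text with paragraphs separated by blank lines, and it wraps that text in the smallest TEI document structure: a teiHeader with a fileDesc, and a text body made of p elements. The helper names `tei` and `plain_text_to_tei` and the placeholder header texts are hypothetical, introduced only for illustration.

```python
import xml.etree.ElementTree as ET

# Official TEI namespace; registered with an empty prefix so it is
# serialized as the default namespace.
TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)


def tei(tag: str) -> str:
    """Return the namespace-qualified name of a TEI element."""
    return f"{{{TEI_NS}}}{tag}"


def plain_text_to_tei(title: str, text: str) -> ET.Element:
    """Wrap plain text in a minimal TEI document (hypothetical helper).

    Assumption: paragraphs in `text` are separated by blank lines.
    """
    root = ET.Element(tei("TEI"))

    # teiHeader with the three required parts of fileDesc.
    header = ET.SubElement(root, tei("teiHeader"))
    file_desc = ET.SubElement(header, tei("fileDesc"))
    title_stmt = ET.SubElement(file_desc, tei("titleStmt"))
    ET.SubElement(title_stmt, tei("title")).text = title
    pub_stmt = ET.SubElement(file_desc, tei("publicationStmt"))
    ET.SubElement(pub_stmt, tei("p")).text = "Unpublished working copy."
    source_desc = ET.SubElement(file_desc, tei("sourceDesc"))
    ET.SubElement(source_desc, tei("p")).text = "Converted from plain text."

    # One <p> per non-empty paragraph, with internal whitespace normalized.
    body = ET.SubElement(ET.SubElement(root, tei("text")), tei("body"))
    for block in text.split("\n\n"):
        paragraph = " ".join(block.split())
        if paragraph:
            ET.SubElement(body, tei("p")).text = paragraph
    return root
```

The resulting element can be serialized with `ET.tostring(root, encoding="unicode")`. A real conversion would normally also record structure such as divisions and headings, and the output should be validated against a TEI schema in the following "Verification and Cleanup" step.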


The SSH Open Marketplace is maintained and will be further developed by three European Research Infrastructures - DARIAH, CLARIN and CESSDA - and their national partners. It was developed as part of the "Social Sciences and Humanities Open Cloud" SSHOC project, European Union's Horizon 2020 project call H2020-INFRAEOSC-04-2018, grant agreement #823782.