Workflow

Transkribus Text Tagging

This workflow describes the strategic development and deployment of a text tagging scheme for transcribed historical prints and/or handwritten resources. It is aimed primarily at larger collections that consist of relatively homogeneous resource types, that is, resources that resemble one another in historical context, structure, extent, language, etc.

If you have already produced your transcriptions with Transkribus, it may be the ideal platform for implementing your tagging scheme, but the approach is transferable.

Key objectives of the workflow are:

  • manage quality and consistency for basic text analysis given the limited resources of academic research projects
  • make large collections more manageable
  • iteratively build up a comprehensible and consistent dataset
  • adapt to the specific and generic characteristics of the source material on the one hand, and
  • adapt early to key research questions and analytical dimensions on the other
  • provide the foundation for further data enrichment and higher levels of analysis, such as normalized collection dictionaries, named entity recognition (NER), georeferencing, etc.
  • create a model tagging scheme adapted to your needs for further scalability
  • create a ground-truth dataset for domain-specific AI text tagging

Please note:

  1. The workflow was developed in the context of a specific research project at the University of Basel. Please visit "VOCAL POWER" for more information.
  2. For the Open Marketplace, it was shaped to address challenges that are very likely to arise in similar projects. However, we strongly suggest discussing all steps thoroughly with key stakeholders and project staff so that you can adapt the parameters to your own needs on the ground.
  3. Work on data projects can easily get out of hand, so plan your steps carefully, test and iterate on small portions early, and finish one step at a time.

Workflow steps (5)

  1. Know Your Data
  2. Know Your Resources
  3. Know Your Progress
  4. Develop Tagging Scheme
  5. Deploy Tagging Scheme

The SSH Open Marketplace is maintained and will be further developed by three European Research Infrastructures (DARIAH, CLARIN and CESSDA) and their national partners. It was developed as part of the "Social Sciences and Humanities Open Cloud" (SSHOC) project, European Union Horizon 2020 project call H2020-INFRAEOSC-04-2018, grant agreement #823782.
