Workflow

Automatic Text Recognition Roadmap

Work-in-progress: this workflow is not finalised yet.

Automatic Text Recognition (ATR) uses Artificial Intelligence (AI), in particular machine learning (ML), to extract text from a scanned image. It encompasses two main techniques: Optical Character Recognition (OCR), which extracts text from printed documents, and Handwritten Text Recognition (HTR), which extracts text from manuscripts.
This workflow presents the main steps of an ATR process and explains how to integrate ATR into your research project.

Workflow steps (9)

  1. Needs and Objective(s)

  2. Resources

  3. Integrating ATR in your workflow

  4. Image Acquisition

  5. Image Pre-Processing

  6. Layout Analysis

  7. Text Recognition and Model Training

  8. Quality Assurance and Metrics

  9. Endformat and Reusability

The SSH Open Marketplace is maintained and will be further developed by three European Research Infrastructures - DARIAH, CLARIN and CESSDA - and their national partners. It was developed as part of the "Social Sciences and Humanities Open Cloud" SSHOC project, European Union's Horizon 2020 project call H2020-INFRAEOSC-04-2018, grant agreement #823782.
