Extract textual content from images


Workflow steps (13)

  1. Define the characteristics of the outcome

  2. Define the characteristics of the image

  3. Survey existing experiences

  4. Choose engine based on the type of the content

  5. Layout analysis

  6. Create manual transcriptions

  7. Training the model

  8. Test on a subset and assess quality

  9. Correct output

  10. Re-train the model with corrected output

  11. Produce OCR output in standardized format

  12. Extract the structure information from recognized blocks

  13. A few visualization options
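
As an illustration of the "Test on a subset and assess quality" step, the sketch below computes a character error rate (CER) between a manual transcription and the recognized text. The workflow does not prescribe a metric; CER is assumed here only because it is a common choice for OCR evaluation, and the function names `levenshtein` and `character_error_rate` are hypothetical helpers, not part of any particular OCR engine.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions and substitutions."""
    # Keep the shorter string in the inner loop to limit memory use.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance divided by the length of the reference text."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Assumed example strings: a manual transcription vs. an OCR result.
    manual = "Extract textual content"
    recognized = "Extract textual contcnt"
    print(f"CER: {character_error_rate(manual, recognized):.3f}")
```

Averaging this value over the test subset gives one number to compare before and after re-training; whether it is low enough depends on the outcome characteristics defined in step 1.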

The SSH Open Marketplace is maintained and will be further developed by three European Research Infrastructures (DARIAH, CLARIN and CESSDA) and their national partners. It was developed as part of the "Social Sciences and Humanities Open Cloud" (SSHOC) project, funded under the European Union's Horizon 2020 project call H2020-INFRAEOSC-04-2018, grant agreement #823782.