Scripts for Optical Character Recognition in batches
This repository contains scripts and tools for preparing PDFs (bursting, converting, renaming) and running OCR on them with Tesseract OCR. It also contains an OCR program based on Pytesseract, a Python wrapper for Tesseract. It includes language models to improve OCR performance.
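As an illustration of the batch workflow described above, here is a minimal Python sketch. It is not taken from the repository's scripts. It assumes that the PDF pages have already been converted to image files, and the names `ocr_batch`, `default_ocr`, and the folder layout are hypothetical. The only Pytesseract call used is `pytesseract.image_to_string`, which needs Pytesseract, Pillow, and a local Tesseract installation.

```python
from pathlib import Path

# Image types treated as OCR input (an assumption for this sketch).
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def default_ocr(image_path, lang="eng"):
    """Run Tesseract on one image through Pytesseract and return the text."""
    # Imported lazily so the batch logic can be used without these packages.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img, lang=lang)


def ocr_batch(input_dir, output_dir, lang="eng", ocr=default_ocr):
    """OCR every image in input_dir and write one .txt file per image.

    Returns the list of text files written, in sorted input order.
    """
    input_dir = Path(input_dir)
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    written = []
    for image_path in sorted(input_dir.iterdir()):
        # Skip subfolders and anything that is not a recognised image type.
        if not image_path.is_file():
            continue
        if image_path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        text = ocr(image_path, lang)
        target = output_dir / (image_path.stem + ".txt")
        target.write_text(text, encoding="utf-8")
        written.append(target)
    return written
```

A call such as `ocr_batch("pages", "text", lang="swe")` would then process a folder of page images, provided the Tesseract language data for the chosen code is installed. Passing a different `ocr` callable makes it possible to try the batch loop without Tesseract.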