Tool or service

Scripts for Optical Character Recognition in batches

This repository contains scripts and tools for preparing PDFs (bursting, converting, renaming) and running OCR on them with Tesseract OCR. It also provides an OCR program based on Pytesseract, a Python wrapper for Tesseract, along with language models to improve OCR performance.
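As a rough illustration of the kind of batch workflow described above, the following minimal sketch runs Pytesseract over a directory of page images and writes one text file per image. It is not the repository's own code: the function names `output_path_for` and `ocr_batch`, the directory layout, the PNG input format, and the default language `eng` are all assumptions made for this example. It also assumes the PDFs have already been burst and converted to images.

```python
from pathlib import Path


def output_path_for(src: Path, output_dir: Path) -> Path:
    """Map a page image such as scans/page_001.png to <output_dir>/page_001.txt.

    Hypothetical helper for this sketch, not part of the repository.
    """
    return output_dir / (src.stem + ".txt")


def ocr_batch(input_dir: Path, output_dir: Path, lang: str = "eng") -> list[Path]:
    """OCR every PNG in input_dir and write one UTF-8 text file per image.

    Hypothetical helper for this sketch; the PNG format and the "eng"
    default are assumptions.
    """
    # Imported here so the path helper above can be used without the OCR stack.
    import pytesseract
    from PIL import Image

    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    # Sort so pages are processed in a stable, predictable order.
    for src in sorted(input_dir.glob("*.png")):
        with Image.open(src) as image:
            text = pytesseract.image_to_string(image, lang=lang)
        dst = output_path_for(src, output_dir)
        dst.write_text(text, encoding="utf-8")
        written.append(dst)
    return written
```

A call such as `ocr_batch(Path("scans"), Path("text"), lang="eng")` would then process every PNG in a hypothetical `scans` directory. This requires the Tesseract binary and the trained data for the chosen language to be installed on the system.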


The SSH Open Marketplace is maintained and will be further developed by three European Research Infrastructures (DARIAH, CLARIN and CESSDA) and their national partners. It was developed as part of the "Social Sciences and Humanities Open Cloud" (SSHOC) project, funded under the European Union's Horizon 2020 project call H2020-INFRAEOSC-04-2018, grant agreement #823782.
