
UDPipe: tool for lemmatization, morphological analysis, POS tagging and dependency parsing in multiple languages

UDPipe is a software tool and service that analyzes plain text in about 100 natural languages, up to the level of dependency syntax. Users specify the desired function (tokenization, segmentation, morphological analysis, lemmatization, POS tagging, dependency parsing), the output format, and the input text (one or more files). The resulting analysis can be used to index and search documents by lemma instead of by multiple word forms, to extract syntactic dependencies with POS information in order to obtain relations between words or lemmas, or to obtain grammatical information for every word in the text. In many cases the results can be used directly, for example for statistical analysis of parts of speech, lemmas and words. In many other applications the output of UDPipe serves as intermediate input to more sophisticated analysis, such as information extraction, knowledge representation or term extraction. The UDPipe software can also be trained on any language for which a treebank is available in the CoNLL-U format, such as the Universal Dependencies corpora.
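As a rough illustration of the indexing and relation-extraction uses described above, the following Python sketch reads text in the CoNLL-U layout (ten tab-separated columns per token: ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC) and builds a lemma index and a list of head-dependent relations. It does not call UDPipe itself. The sample sentence is hand-written for illustration rather than actual UDPipe output, and the function names `parse_conllu`, `lemma_index` and `dependency_pairs` are hypothetical helpers, not part of any UDPipe API.

```python
from collections import defaultdict

# Hand-written illustrative sample in CoNLL-U layout (assumption: not real UDPipe output).
SAMPLE = (
    "# text = Dogs bark.\n"
    "1\tDogs\tdog\tNOUN\t_\tNumber=Plur\t2\tnsubj\t_\t_\n"
    "2\tbark\tbark\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No\n"
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
    "\n"
)

# The ten CoNLL-U columns, in order.
FIELDS = ("id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc")


def parse_conllu(text):
    """Return a list of sentences, each a list of token dicts."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():          # a blank line ends a sentence
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):      # sentence-level comment
            continue
        cols = line.split("\t")
        # Keep only plain word lines; skip multiword ranges (1-2) and empty nodes (1.1).
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        current.append(dict(zip(FIELDS, cols)))
    if current:
        sentences.append(current)
    return sentences


def lemma_index(sentences):
    """Map each lemma to the (sentence number, token id) positions where it occurs."""
    index = defaultdict(list)
    for s_no, sent in enumerate(sentences):
        for tok in sent:
            index[tok["lemma"]].append((s_no, int(tok["id"])))
    return dict(index)


def dependency_pairs(sentences):
    """Return (head lemma, relation, dependent lemma) triples; the root has no head."""
    pairs = []
    for sent in sentences:
        by_id = {tok["id"]: tok for tok in sent}
        for tok in sent:
            head = by_id.get(tok["head"])   # HEAD "0" marks the root and is not in by_id
            if head is not None:
                pairs.append((head["lemma"], tok["deprel"], tok["lemma"]))
    return pairs


if __name__ == "__main__":
    sents = parse_conllu(SAMPLE)
    print(lemma_index(sents))
    print(dependency_pairs(sents))
```

Searching the index for the lemma "dog" would find the surface form "Dogs", which is the practical benefit of indexing by lemma rather than by word form.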



The SSH Open Marketplace is maintained and will be further developed by three European Research Infrastructures (DARIAH, CLARIN and CESSDA) and their national partners. It was developed as part of the "Social Sciences and Humanities Open Cloud" (SSHOC) project, under the European Union's Horizon 2020 call H2020-INFRAEOSC-04-2018, grant agreement #823782.