Workflow

Multilingual analysis and visualization of bibliographic metadata and texts with AVOBMAT

The multilingual AVOBMAT (Analysis and Visualization of Bibliographic Metadata and Texts) digital tool enables researchers to critically analyse bibliographic data and texts at scale, using data-driven methods supported by natural language processing techniques. The AVOBMAT toolkit has four distinctive features: (i) it can preprocess, analyse and (semantically) enrich large volumes of texts and metadata in a number of languages; (ii) its analytical and visualization tools provide interactive close and distant reading of texts and bibliographic data; (iii) it combines bibliographic data and natural language processing research methods in one integrated, interactive and user-friendly web application, which enables users to ask complex research questions; (iv) it fosters critical analysis, for example, by identifying data gaps and missing metadata values.
In the cleaning and preprocessing phase, you can set nine optional parameters, such as lemmatization and stopword filtering, and test the validity of the configuration on a small number of texts before uploading large databases. The vertically and horizontally scalable architecture makes such large uploads possible. You can create different configurations for the different analyses and visualizations and save these settings in templates for future use. The metadata enrichment includes automatic identification of the authors' gender and automatic language detection.
You can search and filter the uploaded and enriched bibliographic data and preprocessed texts in faceted, advanced and command-line modes. Having filtered the uploaded databases and selected the metadata field(s), you can (i) analyse and visualize the bibliographic data chronologically in line and area charts in normalized and aggregated formats; (ii) create an interactive network analysis; (iii) create pie charts and horizontal and vertical bar charts of the bibliographic data.
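
The idea of trying out a preprocessing configuration on a few texts before committing to a large upload can be illustrated with a minimal Python sketch. This is not AVOBMAT's implementation: the `PreprocessConfig` class, its field names and the tiny stopword list are hypothetical, and lemmatization is left out because it would need an external NLP library.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class PreprocessConfig:
    # Hypothetical settings; AVOBMAT's real parameter names are not given here.
    lowercase: bool = True
    remove_stopwords: bool = True
    min_token_length: int = 2
    stopwords: frozenset = frozenset({"the", "a", "an", "of", "and", "in", "to"})


def preprocess(text: str, config: PreprocessConfig) -> list[str]:
    """Tokenize a text and apply the optional filters set in the config."""
    if config.lowercase:
        text = text.lower()
    tokens = re.findall(r"\w+", text)
    if config.remove_stopwords:
        tokens = [t for t in tokens if t.lower() not in config.stopwords]
    return [t for t in tokens if len(t) >= config.min_token_length]


# Check the configuration on a small sample before processing a full corpus.
sample = [
    "The history of the printing press in Europe",
    "A study of an early novel",
]
config = PreprocessConfig()
for doc in sample:
    print(preprocess(doc, config))
```

Inspecting the output for a handful of documents shows quickly whether a setting removes too much or too little, which is cheaper than discovering the problem after a full upload.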
As for content analysis, the diachronic analysis of texts is supported by the N-gram viewer. Two types of frequency analysis are implemented: the significant text function shows what differentiates a subset of documents from the other texts in the corpus, and TagSpheres enables users to investigate the context of a word. Close reading is also supported by the Keyword in Context tool. AVOBMAT has a Latent Dirichlet Allocation function to calculate and visualize topic models. It semantically enriches the texts and metadata using named entity recognition and part-of-speech tagging, currently in 16 languages, and creates statistics from these results. It also has a built-in disambiguation and named entity linking (e.g. Wikidata, VIAF) function for English. AVOBMAT helps users recognize the epistemological challenges, limitations and strengths of computational text analysis and visual representation of digital texts and datasets.
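
To make the Keyword in Context idea concrete, here is a minimal Python sketch of a concordance lookup. It only illustrates the general technique and does not reflect how AVOBMAT implements it. The function name `keyword_in_context` and the `window` parameter are hypothetical.

```python
import re


def keyword_in_context(text: str, keyword: str, window: int = 3) -> list[tuple[str, str, str]]:
    """Return (left context, match, right context) for each occurrence of keyword."""
    tokens = re.findall(r"\w+", text)
    hits = []
    for i, token in enumerate(tokens):
        # Case-insensitive match on whole tokens only.
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, token, right))
    return hits


text = "The press changed Europe. Before the press, books were copied by hand."
for left, match, right in keyword_in_context(text, "press", window=2):
    # Right-align the left context so the keyword lines up in a column.
    print(f"{left:>20} [{match}] {right}")
```

Aligning the keyword in a column, as the loop does, is the conventional concordance layout: it lets a reader scan the words around each occurrence at a glance.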


Workflow steps (9)

  1. Upload your corpus

  2. Clean your corpus

  3. Configure the preprocessing parameters for each analysis

  4. Validate your configuration settings

  5. Select, search and refine your corpus

  6. Interactive metadata analysis

  7. Interactive content analysis 1

  8. Interactive content analysis 2

  9. Export results, configurations and publicize databases


The SSH Open Marketplace is maintained and will be further developed by three European Research Infrastructures - DARIAH, CLARIN and CESSDA - and their national partners. It was developed as part of the "Social Sciences and Humanities Open Cloud" SSHOC project, European Union's Horizon 2020 project call H2020-INFRAEOSC-04-2018, grant agreement #823782.
