Jelena Andonovski
andonovski@unilib.rs
Nataša Dakić
dakic@unilib.rs
Aleksandra Trtovac
aleksandra@unilib.rs
Univerzitetska biblioteka „Svetozar Marković“, Beograd
doi: 10.19090/cit.2020.37.35-46
No. 37 (November 2020), p. 35-46
Searchable Digitized Manuscript Collections: An Opportunity to Read Serbian Cyrillic
Summary
The READ (Recognition and Enrichment of Archival Documents) project has the potential to revolutionise access
to historical collections held by cultural institutions all over Europe. The project ran from 2016 to 2019,
was funded by the European Commission, and involved 13 partners from the European Union. The
overall objective of READ was to build a virtual research environment where archivists, humanities scholars, IT
specialists and volunteers would collaborate with the ultimate goal of boosting research, innovation, development
and usage of cutting-edge technology for the automated recognition, transcription, indexing and enrichment of
handwritten archival documents.
From its launch in 2016, in line with its concept of creating a virtual research environment, the READ project
developed advanced text recognition technology based on artificial neural networks. Research in pattern
recognition, computer vision, document image analysis, language modelling, but also in digital humanities, archival
research and related fields has seen unprecedented progress in recent years, and European research groups are
at the forefront of this specific field. Newly developed technologies and tools are integrated via publicly available
infrastructure – the Transkribus platform.
The primary goal of Transkribus is to support users who transcribe printed or handwritten documents. Only a few
years ago, it was still in the realm of fantasy that computers would become able to read historical scripts and to
automatically recognise and transcribe the handwritten text of documents from past centuries. Today,
users of Transkribus can extract data from handwritten and printed texts via HTR (Handwritten Text
Recognition) technology and search digitized text without retyping it, using a technique known as
KWS (Keyword Spotting). In doing so, they simultaneously help improve that same technology through
machine learning. The automated recognition of a wide variety of historical texts has significant
implications for the accessibility of the written records of global cultural heritage.
Keywords:
libraries, archives, manuscripts, READ project, Transkribus, transcription, neural networks, virtual
research environment, Handwritten Text Recognition (HTR), Keyword Spotting (KWS)
Submitted: 24th August 2020
Correction to the manuscript: 7th October 2020
Accepted for publication: 12th October 2020
Searchable Digitized Manuscript Collections: An Opportunity to Read Serbian Cyrillic by Jelena Andonovski, Nataša Dakić, Aleksandra Trtovac is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.