Reiter
Objects can only be analysed with digital methods if they are available in digital form. An object of investigation that was not created digitally (that is, one that is not born-digital) must first be digitised. Various standards and best practices have been established for digitisation. In this unit, we will get to know two tools for text digitisation: OCR4all and Transkribus.
Transkribus
Transkribus is a platform for machine text recognition (OCR) and handwritten text recognition (HTR), developed by the University of Innsbruck and the EU research project READ. It enables the automatic transcription of handwritten texts, using neural networks and machine learning to recognise them. Users can train and customise models to improve recognition accuracy.
Who? ... is currently responsible for the development of the tool | READ-COOP & University of Innsbruck |
How? ... do I cite the tool? | Kahle, P., Colutto, S., Hackl, G., Mühlberger, G., 2017. Transkribus - A Service Platform for Transcription, Recognition and Retrieval of Historical Documents, in: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR). pp. 19–24. https://doi.org/10.1109/ICDAR.2017.307 |
Where? ... will my data be processed? | On the READ-COOP servers in Innsbruck |
What do I use it for? | To convert handwritten texts into machine-readable texts. |
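Recognition accuracy, which model training aims to improve, is commonly reported as the character error rate (CER): the number of character edits needed to turn the automatic transcription into the correct one, divided by the length of the correct transcription. The following is a minimal sketch of that calculation in Python. It is an illustration only, not the code Transkribus itself runs, and the function names and the two example strings are made up for this purpose.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance between the correct transcription (reference) and the
    automatic transcription (hypothesis), relative to the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Hypothetical example: the model misread one letter out of eleven characters.
reference = "dancing men"
hypothesis = "dancinq men"
print(f"CER: {character_error_rate(reference, hypothesis):.2%}")
```

A lower CER means fewer corrections are needed by hand, which is why it is a useful figure to compare before and after training a model on your own material.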
Web Client
You do not have to install Transkribus. Instead, go to https://www.transkribus.org, click on Open App, and register for an account. You can use the Getting Started Guide to familiarise yourself with the tool.
OCR4all
OCR4all is open-source software for optical character recognition (OCR) and handwriting recognition of historical texts. It is tailored particularly to the needs of researchers and cultural institutions. OCR4all combines machine learning with traditional OCR technology to recognise printed texts and handwritten notes. The software is modular and lets users create and customise workflows for specific OCR requirements.
Who? ... is currently responsible for the development of the tool | University of Würzburg |
How? ... do I cite the tool? | Wehner, M., Dahnke, M., Landes, F., Nasarek, R., & Reul, C. (2020, February 20). OCR4all – Eine semi-automatische Open-Source-Software für die OCR historischer Drucke. DHd 2020 Spielräume: Digital Humanities zwischen Modellierung und Interpretation. 7. Tagung des Verbands "Digital Humanities im deutschsprachigen Raum" (DHd 2020), Paderborn. https://doi.org/10.5281/zenodo.4621738 |
Where? ... will my data be processed? | Depending on how the tool is set up, either on your own machine or on a server |
What do I use it for? | To convert printed texts into machine-readable texts. |
Installation
Installing OCR4all requires some familiarity with the terminal and the file system of your machine. In this course, you will therefore only need to use Transkribus. If you are interested, you are welcome to try installing OCR4all.
You can find a detailed installation guide at https://www.ocr4all.org/guide/setup-guide/quickstart
Task
The Manuscript of the Dancing Men is a handwritten version of the short story ‘The Adventure of the Dancing Men’, which was first published on 5 December 1903. In the following task, you will digitise the manuscript with Transkribus. Below this task you will find an interactive graphic: click on the parts of the Transkribus interface to see explanations of the programme's functions.
[Image: Manuscript of the Dancing Men]