
To analyse objects with digital methods, they must be available in digital form. An object of investigation that was not created digitally (that is, one that is not born-digital) must first be digitised. Various standards and best practices have been established for digitisation. In this unit, we will get to know two tools for text digitisation: OCR4all and Transkribus.

Transkribus

Transkribus is a platform for machine text recognition (OCR) and handwritten text recognition (HTR) developed by the University of Innsbruck and the EU research project READ. It enables the automatic transcription of handwritten texts. The platform uses neural networks, a form of machine learning, to recognise and transcribe the text. Users can train and customise their own models to improve recognition accuracy.
Toolcard - Transkribus
Who? ... is currently responsible for the development of the tool
READ-COOP SCE & University of Innsbruck
How? ... do I cite the tool?
Kahle, P., Colutto, S., Hackl, G., Mühlberger, G., 2017. Transkribus - A Service Platform for Transcription, Recognition and Retrieval of Historical Documents, in: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR). pp. 19–24. https://doi.org/10.1109/ICDAR.2017.307
Where? ...will my data be processed?
On the READ-COOP servers in Innsbruck
What do I use it for?
To convert handwritten texts into machine-readable texts.

Web Client

You do not have to install Transkribus. Instead, go to https://www.transkribus.org, click on Open App and register for an account. You can use the Getting Started Guide to familiarise yourself with the tool.

OCR4all

OCR4all is open-source software for optical character recognition (OCR) and handwriting recognition of historical texts. It is tailored to the needs of researchers and cultural institutions. OCR4all combines machine learning with traditional OCR technology to recognise printed texts and handwritten notes. The software is modular, so users can create and customise workflows for specific OCR requirements.
Toolcard - OCR4all
Who? ... is currently responsible for the development of the tool
University of Würzburg
How? ... do I cite the tool?
Wehner, M., Dahnke, M., Landes, F., Nasarek, R., & Reul, C. (2020, February 20). OCR4all – Eine semi-automatische Open-Source-Software für die OCR historischer Drucke. DHd 2020 Spielräume: Digital Humanities zwischen Modellierung und Interpretation. 7. Tagung des Verbands "Digital Humanities im deutschsprachigen Raum" (DHd 2020), Paderborn. https://doi.org/10.5281/zenodo.4621738
Where? ...will my data be processed?
Depending on how the tool is set up, either on your own machine or on a server of your choice.
What do I use it for?
To convert printed texts into machine-readable texts.

Installation

Installing OCR4all requires some knowledge of the terminal and the file system of your machine. In this course you will therefore only need to use Transkribus. If you are interested, you are welcome to try installing OCR4all.
You will find a detailed installation guide at https://www.ocr4all.org/guide/setup-guide/quickstart

Task 

The Manuscript of the Dancing Men is a handwritten version of the short story ‘The Adventure of the Dancing Men’, which was first published on 5 December 1903. In the following task you will digitise the manuscript with Transkribus. Below this task you will find a clickable graphic: click on the parts of the Transkribus interface to get explanations of the programme functions.
  1. Start Transkribus and log in.
    You will find the login function in the menu bar at the top of the screen.
  2. Import the manuscript.
    Save the image Manuscript-the-dancing-men-p06.jpg to your PC.
  3. Create a new collection in Transkribus.
    Import the image into the new collection.
  4. Apply a pre-trained layout model to the manuscript.
    Select the [T] icon to open the model selection.
  5. Select ‘Layout’ in the tab bar at the top.
    Then select the ‘Universal Lines’ model.
    Now click on ‘Start recognition’ in the top right-hand corner of the screen. Transkribus will start analysing the layout. When the calculation is complete, you will see the recognised lines in your digital copy.
  6. Apply a pre-trained transcription model.
    Select the [T] icon again. This time, stay in the transcription model selection. Find a model that you can apply to English texts.
    Start the recognition.
  7. Find an error in the transcription and describe it.
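
If you would like to describe transcription errors more systematically, you can compare the recognised text with your own corrected version. The following is a minimal Python sketch, not part of Transkribus: it assumes you have copied one recognised line and your corrected reading into two strings. The function names and the example strings are hypothetical and chosen for illustration only. It computes the character error rate (edit distance divided by the length of the corrected text) and lists the places where the two strings differ.

```python
from difflib import SequenceMatcher


def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits that turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def character_error_rate(recognised: str, reference: str) -> float:
    """Edit distance between the two strings, relative to the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(recognised, reference) / len(reference)


def list_differences(recognised: str, reference: str) -> list[tuple[str, str, str]]:
    """Return (kind, reference part, recognised part) for every differing span."""
    matcher = SequenceMatcher(None, reference, recognised, autojunk=False)
    return [
        (tag, reference[i1:i2], recognised[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


if __name__ == "__main__":
    # Hypothetical example strings, not taken from the actual manuscript.
    reference = "the dancing men"    # your corrected reading
    recognised = "the dancmg men"    # what the model produced
    print(f"CER: {character_error_rate(recognised, reference):.1%}")
    for kind, expected, found in list_differences(recognised, reference):
        print(f"{kind}: expected {expected!r}, found {found!r}")
```

A listing like this can help you put an error into words, for example that the model read two narrow letters as one wide letter.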