
Describe a Digital Edition

Please visit the Catalogue of Digital Editions of the University of Wuppertal and find one Digital Edition from your field of interest. Open that edition and try to answer the following questions.
1. In what way does the edition reproduce the tradition of the text or object it is about?
2. Which features of the text or object are represented in the Digital Edition, and how?

Machine Readability

Once we have learned to read, we humans interpret text directly and automatically. That makes it difficult to understand how a machine perceives a text, i.e. a character string. The following visualization is a text.
To be more precise, it is a famous quote from Sherlock Holmes. Each character - letters, spaces and punctuation marks - has been assigned a specific color. However, you will not be able to read the text. All you see is a pattern and you have no immediate way of recognizing the meaning of the pattern. This is exactly what happens to a machine that receives text as input. If we want a machine to interpret strings of characters meaningfully, we first have to teach it to do so.
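The same idea can be sketched in a few lines of Python. This is only an illustration: the function name `encode_characters` is hypothetical, and the sample sentence is an arbitrary example, not necessarily the quote shown in the visualization. Each distinct character receives a number instead of a color, and the result is just as unreadable as the colored pattern.

```python
def encode_characters(text):
    """Assign each distinct character an integer ID in order of first appearance."""
    ids = {}       # character -> ID (the "color" of that character)
    encoded = []   # the text as a sequence of IDs
    for ch in text:
        if ch not in ids:
            ids[ch] = len(ids)
        encoded.append(ids[ch])
    return encoded, ids


# Arbitrary sample sentence, chosen only for illustration.
sample = "You see, but you do not observe."
encoded, ids = encode_characters(sample)
print(encoded)
```

The printed list contains only numbers. Without the mapping in `ids`, and without knowing what words and sentences are, nothing in it reveals the meaning of the text.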
How language models have been trained can be understood in the following way. Traditionally, parts of a digital text representation were annotated by humans; in this case, for example, all spaces and punctuation marks are annotated. Given a large body of text, you could then train a model to predict where a sentence starts and where it ends based on those features. This may work just fine, but keep in mind that the model has not understood what a sentence is; it has simply learned to predict the start and end of a sentence based on patterns.
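A deliberately tiny sketch of this kind of pattern learning, assuming hand-annotated training sentences in which the positions of sentence ends are marked. The names `train` and `predict_sentence_ends` and the training sentences are invented for this illustration; real language models are far more complex. The "model" only counts how often each character was annotated as a sentence end.

```python
from collections import Counter


def train(examples):
    """examples: list of (text, set of character indices annotated as sentence ends)."""
    seen = Counter()   # how often each character occurs
    ends = Counter()   # how often it was annotated as a sentence end
    for text, end_positions in examples:
        for i, ch in enumerate(text):
            seen[ch] += 1
            if i in end_positions:
                ends[ch] += 1
    # The "model" is just a table: character -> share of occurrences that ended a sentence.
    return {ch: ends[ch] / seen[ch] for ch in seen}


def predict_sentence_ends(model, text, threshold=0.5):
    """Return the indices where the model predicts a sentence end."""
    return [i for i, ch in enumerate(text) if model.get(ch, 0.0) > threshold]


# Invented, hand-annotated training data (indices of the sentence-final characters).
training = [
    ("It is late. We go home.", {10, 22}),
    ("Who is there? Nobody, sir.", {12, 25}),
]
model = train(training)
print(predict_sentence_ends(model, "Mr. Holmes smiled. Why?"))
```

On the test sentence, this sketch also marks the period in "Mr." as a sentence end. The pattern "a period ends a sentence" was learned, but the concept of a sentence was not.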

CATMA

CATMA (Computer Assisted Text Markup and Analysis) is an annotation tool developed specifically for analyzing texts. It lets researchers annotate and analyze texts by creating and customizing their own annotation schemes. CATMA supports annotation at different levels, including morphological, syntactic, semantic and pragmatic analysis, and encourages collaboration between researchers. This versatile tool helps to identify patterns and linguistic features in texts and is a valuable instrument for in-depth text analysis and research in various scientific disciplines.

Task

  1. Create an account for CATMA at https://catma.de. Run CATMA 7 and create a new project.
  2. Download the opening of "The Adventure of the Dancing Men" (see below). Then use the "Plus" button in Documents & Annotations to add a document.
  3. Add the author name to the metadata.
  4. The import may take some time. Wait and keep your browser open. You can then select a naming scheme for the parts of your project - depending on which annotation you want to add to the document.
  5. Now you can create a tagset yourself. First create a tagset for the classification of Named Entities.
  6. You can add tags via "Add Tag". You must always select which tagset your new tag should belong to.
  7. You will now find the tagset and the text in the annotation tab of the project. You may go to the annotation view by clicking on "Annotate" in the navigation on the left.
You add annotations by selecting the text and then clicking on the corresponding tag. If you want to delete annotations, select them with a left click and then click on the trash can icon under "selected annotations".
Try annotating Named Entities. What are Named Entities? Any entity in the text that carries an individual value of information, e.g. names of people or organisations, but also numerical values and dates such as "the fifth of November" or "five golden rings". In some cases it is not easy to distinguish an entity from a non-entity.
Once you are finished, take a screenshot of your annotation and post it in the forum. Did you encounter any difficulties?
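To make the idea of an annotation more concrete: conceptually, each annotation you create can be thought of as a character range in the text plus a tag. The following Python sketch illustrates this under stated assumptions. The class `Annotation`, the helper `annotate`, the tag names and the sample sentence are all invented for this illustration; this is not CATMA's data format or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: int  # index of the first annotated character
    end: int    # index just past the last annotated character
    tag: str    # tag from a (hypothetical) Named Entity tagset


def annotate(text, phrase, tag):
    """Annotate the first occurrence of phrase in text with tag."""
    start = text.index(phrase)
    return Annotation(start, start + len(phrase), tag)


# Invented sample sentence, not taken from the story.
text = "Holmes met Watson on the fifth of November."
annotations = [
    annotate(text, "Holmes", "Person"),
    annotate(text, "Watson", "Person"),
    annotate(text, "the fifth of November", "Date"),
]

for a in annotations:
    print(a.tag, repr(text[a.start:a.end]))
```

Because the annotations only point into the text, the text itself stays unchanged, and several tagsets can be applied to the same passage side by side.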