Metadata: How data is described by humans and machines
Exercise: The Text-Encoding Initiative
The TEI Consortium has been developing standards for machine-readable text markup since 1994. They are published as guidelines at https://tei-c.org. Why are such standards necessary?
Look carefully at the above scanned page from Strand Magazine. Try to reproduce the text up to the end of the first paragraph in a forum post.
What problems do you encounter?

How is text expressed digitally?
A computer works with bits, i.e. two states that are written as 0 and 1. For a computer to interpret these bits as text, a character encoding is needed. Today this is usually UTF-8.
The character string "Sherlock", encoded as UTF-8, looks like this when written out as bits:
01010011 01101000 01100101 01110010 01101100 01101111 01100011 01101011
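If you want to check this yourself, the following Python sketch encodes a string as UTF-8 and prints every byte as eight bits, then turns the bits back into text. It is only an illustration; the function names utf8_bits and bits_to_text are made up for this example.

```python
def utf8_bits(text: str) -> str:
    """Encode text as UTF-8 and write each byte as a group of 8 bits."""
    return " ".join(f"{byte:08b}" for byte in text.encode("utf-8"))


def bits_to_text(bits: str) -> str:
    """Reverse direction: read space-separated 8-bit groups as UTF-8 text."""
    data = bytes(int(group, 2) for group in bits.split())
    return data.decode("utf-8")


bits = utf8_bits("Sherlock")
print(bits)                # the bit string shown above
print(bits_to_text(bits))  # Sherlock
```

Letters such as S or h need one byte each in UTF-8. Characters outside the ASCII range need more: utf8_bits("é") returns "11000011 10101001", i.e. two bytes for a single character.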
However, as we found out in the previous task, a lot of information cannot be expressed with plain digital text alone.
This includes information such as:
- "THE STRAND MAGAZINE." is a headline and the name of a magazine.
- "III. - The Adventure of the Dancing Men." is written in a different font and centered.
- The H in Holmes is much larger than the rest of the word and ornamented. Such a letter is called an initial.
To record this kind of information, text markup standards such as TEI are used, for example in digital editions.
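To give a first impression of what such markup can look like, the Python sketch below builds small XML fragments for the three observations listed above, using the standard library module xml.etree.ElementTree. The element names head, title, hi and p and the attributes level and rend exist in TEI, but this is a simplified, assumed encoding: the rend values "center" and "initial" are labels chosen for this example (TEI does not fix a list of rend values), the TEI namespace is left out, and the paragraph text after "Holmes" is not reproduced.

```python
import xml.etree.ElementTree as ET

# "THE STRAND MAGAZINE." is a heading (head) and the title of a journal (level="j").
masthead = ET.Element("head")
magazine = ET.SubElement(masthead, "title", level="j")
magazine.text = "THE STRAND MAGAZINE."

# The story title is centered; rend describes its appearance (assumed value "center").
# This sketch does not record which font was used.
story_title = ET.Element("head", rend="center")
story_title.text = "III. - The Adventure of the Dancing Men."

# The enlarged, ornamented H is marked as an initial (assumed value "initial").
paragraph = ET.Element("p")
initial = ET.SubElement(paragraph, "hi", rend="initial")
initial.text = "H"
initial.tail = "olmes"  # text that follows the <hi> element inside <p>

for element in (masthead, story_title, paragraph):
    print(ET.tostring(element, encoding="unicode"))
```

The printed markup keeps the characters themselves unchanged and adds the extra information as elements and attributes around them, for example <p><hi rend="initial">H</hi>olmes</p>.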
Basic features of TEI
As the TEI guidelines are very comprehensive, we will not be able to cover all aspects of the TEI standard in one learning unit. Therefore, by the end of this unit you should be able to answer three basic questions about TEI:
- What is a well-formed XML document?
- What is the difference between the header and the body?
- How do you learn TEI?
Why XML?
What does XML have to do with TEI? XML is the data format on which TEI is based: every TEI file is an XML file, but not every XML file follows the TEI standard. XML stands for Extensible Markup Language, a file format built from so-called elements that are arranged hierarchically, i.e. nested inside one another.
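That every TEI file is also an XML file, while an arbitrary XML file is not necessarily TEI, can be tried out in Python. The sketch below assumes a very small TEI document: the root element TEI in the TEI namespace (http://www.tei-c.org/ns/1.0), a teiHeader containing the fileDesc parts titleStmt, publicationStmt and sourceDesc, and a text element with a body. The helper looks_like_tei is hypothetical and only checks the root element; it is a rough heuristic, not a validation against the TEI schema. print_tree shows how the elements are nested.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# A minimal example document; the placeholder texts are not real metadata.
tei_doc = f"""<TEI xmlns="{TEI_NS}">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>The Adventure of the Dancing Men</title></titleStmt>
      <publicationStmt><p>Example only</p></publicationStmt>
      <sourceDesc><p>Example only</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body><p>Text of the story</p></body>
  </text>
</TEI>"""

# Well-formed XML, but not TEI.
plain_doc = "<recipe><ingredient>Tea</ingredient></recipe>"


def looks_like_tei(xml_text: str) -> bool:
    """Parse as XML, then check for a TEI root element (no schema validation)."""
    root = ET.fromstring(xml_text)  # raises ET.ParseError if not well-formed
    return root.tag == f"{{{TEI_NS}}}TEI"


def print_tree(element: ET.Element, depth: int = 0) -> None:
    """Print each element's local name, indented by its nesting depth."""
    local_name = element.tag.split("}")[-1]  # drop the "{namespace}" prefix
    print("  " * depth + local_name)
    for child in element:
        print_tree(child, depth + 1)


print(looks_like_tei(tei_doc))    # True
print(looks_like_tei(plain_doc))  # False
print_tree(ET.fromstring(tei_doc))
```

The printed tree starts with the root element TEI, which has exactly two children: teiHeader, which holds information about the text (metadata), and text, which holds the text itself inside body.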
XML Structure

XML Syntax

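A well-formed XML document has exactly one root element that contains all other elements. It usually begins with an XML declaration such as <?xml version="1.0" encoding="UTF-8"?>, which is recommended but optional in XML 1.0. Every element has a start tag and a matching end tag (or is written as a single empty-element tag such as <lb/>), and elements are nested without overlapping. No element has two attributes with the same name, the characters & and < never appear unescaped in the character data (they are written as &amp; and &lt;), and comments are never placed inside tags.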
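These rules can be tested with any XML parser, because a parser refuses documents that are not well-formed. The Python sketch below uses xml.etree.ElementTree from the standard library; is_well_formed is a hypothetical helper name, and each broken example violates one of the rules above. Note that a parser like this only checks well-formedness: a document can be well-formed XML and still not be valid TEI.

```python
import xml.etree.ElementTree as ET

DECL = '<?xml version="1.0" encoding="UTF-8"?>\n'


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts xml_text as well-formed XML."""
    try:
        # Pass bytes so the parser reads the encoding from the XML declaration.
        ET.fromstring(xml_text.encode("utf-8"))
        return True
    except ET.ParseError:
        return False


examples = {
    "well-formed": DECL + '<text><p n="1">Holmes &amp; Watson</p></text>',
    "two root elements": DECL + "<p>One</p><p>Two</p>",
    "overlapping tags": DECL + "<p><hi>Holmes</p></hi>",
    "duplicate attribute": DECL + '<p n="1" n="2">Holmes</p>',
    "unescaped &": DECL + "<p>Holmes & Watson</p>",
    "unescaped <": DECL + "<p>1 < 2</p>",
    "comment inside a tag": DECL + "<p <!-- note -->>Holmes</p>",
}

for label, xml_text in examples.items():
    print(f"{label}: {is_well_formed(xml_text)}")
```

Only the first example prints True; all the others are rejected by the parser.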
Task
At this link: https://tei-c.org/release/doc/tei-p5-doc/en/html/examples-docTitle.html you will find, under "Grouped texts", a TEI/XML markup example provided by the TEI Consortium. Find the following in this markup:
- A start tag
- An end tag
- A parent element
- A child element
- The root element
Post your examples in the forum and explain them. If you cannot find one of the items, explain why it is missing from the markup.