Overview
AUTHOR
François ROLE: Library curator - Research Fellow, University of Paris 8
INTRODUCTION
Since antiquity, it has been common practice to mark and annotate texts in order to facilitate their study or criticism (consider medieval annotation systems, or the set of critical signs devised as early as the 3rd century BC by the Alexandrian philologists).
In the digital world, electronic markup (defined here as the insertion into an electronic file of marks that are linked to, but not part of, the text) has long been used almost exclusively to drive printing or display devices (phototypesetters, printers, screens). It is this kind of markup that most researchers in the humanities and social sciences use implicitly (*) through commercial desktop publishing (DTP) tools.
(*) "Implicitly" in the sense that the actions users perform with the keyboard or pointing device generate, behind the scenes, the physical markup on which the DTP software relies to carry out the operations requested of it.
Whatever its merits, this markup is, as noted above, geared to producing or displaying text, and is therefore not designed to support the intellectual exploration of documents. Hence the gradual emergence of the idea that a level of markup was needed that would be less dependent on production constraints and, because it describes the logical structure of texts, better suited to higher-level processing.
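To make the contrast between physical and logical markup concrete, here is a minimal sketch using only Python's standard `xml.etree.ElementTree`. The sample sentence and the `RENDER` mapping are hypothetical; the element names `title` and `foreign` are borrowed from the TEI for illustration, and the fragments use XML syntax because Python has no built-in SGML parser. It shows that display markup can be derived mechanically from logical markup, whereas a purely typographic encoding no longer records which italic span is a title.

```python
import xml.etree.ElementTree as ET

# Presentational (physical) markup: a book title and a Latin phrase are
# both simply "italic", so the distinction between them is lost.
PRESENTATIONAL = "<p>He reread <i>Madame Bovary</i>, <i>in extenso</i>.</p>"

# Logical markup: the same spans, each labelled with what it is.
LOGICAL = ('<p>He reread <title>Madame Bovary</title>, '
           '<foreign xml:lang="la">in extenso</foreign>.</p>')

# Hypothetical style sheet: how each logical element should be displayed.
RENDER = {"title": "i", "foreign": "i"}


def titles(xml_text):
    """Return the text of every <title> element (a 'higher-level' query)."""
    return [elem.text for elem in ET.fromstring(xml_text).iter("title")]


def to_presentational(xml_text):
    """Derive display markup from logical markup by renaming elements."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if elem.tag in RENDER:
            elem.tag = RENDER[elem.tag]
            elem.attrib.clear()  # display markup keeps no logical attributes
    return ET.tostring(root, encoding="unicode")


print(titles(LOGICAL))          # ['Madame Bovary']
print(titles(PRESENTATIONAL))   # []
print(to_presentational(LOGICAL) == PRESENTATIONAL)  # True
```

Going the other way, from `<i>` back to `title` or `foreign`, would require human judgement, which is the case for encoding the logical structure first.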
SGML (Standard Generalized Markup Language) is currently the most widely used standard for the logical markup of texts. It allows any user to define a logical markup language suited to their needs by writing a DTD (Document Type Definition).
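A DTD declares, for each element type, a content model stating which child elements may appear and in what order. The connectors `,` (sequence) and `|` (choice) and the occurrence indicators `?`, `*` and `+` make each model a kind of regular expression over element names, and the Python sketch below relies on that observation. The anthology DTD is invented for illustration; only element-content models without nested groups or the `&` connector are handled; and since Python's standard library has no SGML parser, the document instance is written in XML-compatible syntax with every tag present. A real SGML parser does far more (tag omission, entities, attributes, mixed content).

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical DTD in SGML declaration syntax. "- -" means that neither the
# start tag nor the end tag may be omitted (SGML tag minimization).
DTD = """
<!ELEMENT anthology - - (poem+)>
<!ELEMENT poem      - - (title?, stanza+)>
<!ELEMENT stanza    - - (line+)>
"""

# Captures the element name and its content model from each declaration.
DECL = re.compile(r"<!ELEMENT\s+(\w+)\s+[-oO]\s+[-oO]\s+([^>]+?)\s*>")


def model_to_regex(model):
    """Translate a content model using , | ? * + and parentheses into a regex
    that matches a space-terminated sequence of child element names."""
    parts = []
    for tok in re.findall(r"\w+|[(),|?*+]", model):
        if tok == "(":
            parts.append("(?:")
        elif tok == ")":
            parts.append(")")
        elif tok == ",":
            continue  # sequence connector: plain concatenation in a regex
        elif tok in {"|", "?", "*", "+"}:
            parts.append(tok)  # same meaning in SGML models and in regexes
        else:
            parts.append(f"(?:{re.escape(tok)} )")  # one child element name
    return re.compile("".join(parts))


MODELS = {name: model_to_regex(model) for name, model in DECL.findall(DTD)}


def validate(elem, models):
    """Check elem and its descendants against the declared content models.
    Elements with no declaration here (title, line) are not checked."""
    errors = []
    model = models.get(elem.tag)
    if model is not None:
        children = "".join(f"{child.tag} " for child in elem)
        if not model.fullmatch(children):
            errors.append(f"<{elem.tag}> contains [{children.strip()}], "
                          "which its content model does not allow")
    for child in elem:
        errors.extend(validate(child, models))
    return errors


doc = ET.fromstring(
    "<anthology>"
    "<poem><title>First</title><stanza><line>...</line></stanza></poem>"
    "<poem><stanza><line>...</line></stanza><title>Misplaced</title></poem>"
    "</anthology>"
)
for message in validate(doc, MODELS):
    print(message)  # one error: the second poem puts its title last
```

Treating content models as regular expressions is what lets a parser check conformance to the DTD without any knowledge of what a "poem" means.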
The Text Encoding Initiative (TEI) is an SGML DTD accompanied by a volume of recommendations, the TEI Guidelines, which explain how the DTD is to be used. It is tailored primarily to the needs of the research community in the humanities and social sciences (or, more generally, of any researcher wishing to explore large textual corpora in electronic form). It enables linguists to annotate corpora syntactically, historians to mark up dates, place names or people in a text, literary scholars to study the style or genesis of a text, and so on.
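As a rough illustration of how a historian might exploit such tagging, the following Python sketch collects the people, places and dates marked in a TEI text. It uses the element names `persName`, `placeName` and `date` (with its `when` attribute) and the namespace of the later XML version of the TEI (P5), not the SGML DTD described in this article; the letter is invented, and the function name `index_entities` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by TEI P5 (the XML version of the TEI).
NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented sample text, tagged as a historian might tag it.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample letter</title></titleStmt>
      <publicationStmt><p>Unpublished example</p></publicationStmt>
      <sourceDesc><p>Invented for illustration</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p><persName>Jeanne Martin</persName> wrote from
        <placeName>Lyon</placeName> on
        <date when="1848-02-24">24 February 1848</date>.</p>
    </body>
  </text>
</TEI>"""


def index_entities(xml_text):
    """Collect the persons, places and normalized dates tagged in a TEI text."""
    root = ET.fromstring(xml_text)
    return {
        "persons": [e.text for e in root.iterfind(".//tei:persName", NS)],
        "places": [e.text for e in root.iterfind(".//tei:placeName", NS)],
        # @when holds an ISO 8601 value, so dates can be sorted and compared
        "dates": [e.get("when") for e in root.iterfind(".//tei:date", NS)],
    }


print(index_entities(SAMPLE))
# {'persons': ['Jeanne Martin'], 'places': ['Lyon'], 'dates': ['1848-02-24']}
```

Because the date is normalized in an attribute while the running text keeps its original wording, the same file can serve both a human reader and a program that orders events chronologically.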
After some historical background and an informal presentation of the structure of a TEI text, we describe the mechanisms implemented in writing the TEI DTD (modularity, inheritance, extensibility).
This part is more technical than the others, and requires a good knowledge of SGML.
At the end of this article, we present a few examples of TEI tagging.
SGML concepts and techniques are described in the "SGML" article.
TEI (Text Encoding Initiative)