
This article presents the automatic text processing techniques currently used to manage the information contained in documents more relevantly and efficiently. After outlining the current professional need for refined and varied modes of access to document content, it presents the applications, methods and linguistic resources used to carry out such textual analysis efficiently.
AUTHOR
Cécile FABRE: Professor of Language Sciences, Université Toulouse 2 - Le Mirail, and CLLE-ERSS laboratory (UMR 5263)
Documents available in electronic form are a major source of information, and are driving the development of applications designed to facilitate their management and use. This textual data is very diverse in nature:
documentation produced by the company, its partners and customers (technical reports, maintenance documentation, contracts, meeting minutes, e-mail messages, etc.);
information of a technological and economic nature that companies need to collect and exploit from a wide and diversified documentary environment (patents, research reports, grey literature, commercial and technical news available on the web, etc.).
The bulk of information flows through these documents, so it is crucial for organizations to have techniques for accessing the business knowledge they contain. Most strategic information is in fact textual in nature. Understanding and analyzing it is essential for:
scientific and technological monitoring, knowledge management and transfer;
decision-making support, risk identification, etc.
And yet this data is voluminous, unstructured and highly heterogeneous. It is rarely written to explicit standards and may be produced under time pressure (reports, notes, minutes, letters). These characteristics make the material very difficult to process: relevant information has to be extracted from the flow of text, and this extraction is complicated by the ambiguity and variability inherent in linguistic expression. Exploiting these "off-the-shelf" texts has therefore become a major technological challenge. New technical solutions, often described as "semantic" and "intelligent", are being offered to companies to:
master the profusion of electronic documents (procedures for classifying, selecting, synthesizing and structuring documents);
extract and organize the information they contain.
These solutions are based on natural language processing (NLP) techniques. The aim of this dossier is to provide an overview of these techniques and, by making them easier to understand, to enable a reasoned choice among the solutions on offer in the field of information processing.
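The variability mentioned above can be illustrated with a minimal Python sketch. It is not taken from the article: the sample texts, the function names and the crude suffix-stripping rule are all assumptions made for illustration. It counts terms after a naive normalization and shows that such a rule merges simple plural variants but still treats related forms such as "fail", "failed" and "failure" as unrelated, which is where linguistic resources such as lemmatizers become necessary.

```python
import re
from collections import Counter

# Hypothetical sample texts, invented for this illustration only.
DOCUMENTS = [
    "The pump failed on site. Pumps of that type fail often.",
    "Meeting minutes: pump failure discussed; replacement pumps ordered.",
]


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def naive_normalize(token):
    """Strip a final 's' from tokens longer than three letters.

    This is a deliberately crude stand-in for real lemmatization.
    """
    if len(token) > 3 and token.endswith("s"):
        return token[:-1]
    return token


def term_counts(documents):
    """Count normalized tokens over a collection of documents."""
    counts = Counter()
    for doc in documents:
        counts.update(naive_normalize(tok) for tok in tokenize(doc))
    return counts


counts = term_counts(DOCUMENTS)
# "pump" and "pumps" are merged into a single term.
print("pump:", counts["pump"])
# "fail", "failed" and "failure" remain three separate terms.
print({form: counts[form] for form in ("fail", "failed", "failure")})
```

In this sketch the plural variants are grouped, while the verb and noun forms of the same concept are not: a simple string rule does not capture morphological variation, let alone synonymy or ambiguity.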
This article is included in
Software technologies and System architectures
Automatic text processing
Software tools
References for tools and resources cited in the article:
TreeTagger: part-of-speech tagging and lemmatization. University of Stuttgart http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/
Portals on language technologies and language resources
CNRTL Centre National de Ressources Textuelles et Lexicales http://www.cnrtl.fr/
ELRA – European Language Resources Association http://www.elra.info/
Language Technology World, DFKI...
Standards and norms
TEI (Text Encoding Initiative): consortium created in 1987 to produce recommendations for standardizing the encoding of digital documents http://www.tei-c.org
TALN (Traitement Automatique des Langues Naturelles): international French-language conference organized annually since 1994 by ATALA (Association pour le Traitement Automatique des Langues).
Associations (non-exhaustive list)
ACL Association for Computational Linguistics http://www.aclweb.org/
ATALA Association for Automatic Language Processing http://www.atala.org/
APIL Association des Professionnels des Industries de la...