4. Resources for automatic text processing
NLP systems rely on several types of resources:
textual resources: textual data, known as corpora, are used to build test benches, train learning systems, extract lexical data, and so on;
lexical resources: lexicons form the core of the linguistic information used by a system. They vary in nature depending on the application and carry information of varying complexity, from simple word lists to structured semantic resources. Because building a lexicon for a particular application is costly, the trend is towards reusability and automatic acquisition of lexical data;
software resources: lemmatizers, segmenters and taggers are the basic building blocks of text processing. The complexity of an NLP application calls for the reuse of existing components; this...
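The way the three resource types fit together can be illustrated with a minimal sketch. It assumes a toy two-sentence corpus, a tiny hand-written lemma lexicon, and a naive regular-expression segmenter; all names (`CORPUS`, `LEMMA_LEXICON`, `segment`, `lemmatize`, `build_frequency_lexicon`) are hypothetical and are not taken from the article or from any existing tool. A real system would rely on reusable components and far larger resources.

```python
import re
from collections import Counter

# Textual resource: a hypothetical toy corpus (real corpora are far larger).
CORPUS = [
    "The taggers label the words.",
    "A tagger labels each word of the text.",
]

# Lexical resource: a hypothetical hand-written lexicon mapping
# inflected forms to their lemmas.
LEMMA_LEXICON = {"taggers": "tagger", "labels": "label", "words": "word"}


def segment(text):
    """Software resource: a very naive segmenter (lowercase ASCII words only)."""
    return re.findall(r"[a-z]+", text.lower())


def lemmatize(token):
    """Software resource: lexicon lookup, falling back to the token itself."""
    return LEMMA_LEXICON.get(token, token)


def build_frequency_lexicon(corpus):
    """Acquire lexical data automatically: count lemma frequencies in the corpus."""
    counts = Counter()
    for sentence in corpus:
        counts.update(lemmatize(token) for token in segment(sentence))
    return counts


lexicon = build_frequency_lexicon(CORPUS)
print(lexicon.most_common(3))
```

The segmenter and lemmatizer are deliberately separate functions, so either could be swapped for an existing component without touching the acquisition step.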
This article is included in
Software technologies and System architectures
Resources for automatic text processing
Bibliography
Software tools
References for tools and resources cited in the article:
TreeTagger: morpho-syntactic labeling and lemmatization. University of Stuttgart http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/
Websites
Portals on language technologies and language resources
CNRTL Centre National de Ressources Textuelles et Lexicales http://www.cnrtl.fr/
ELRA – European Language Resources Association http://www.elra.info/
Language Technology World, DFKI...
Standards and norms
TEI (Text Encoding Initiative): consortium created in 1987 to produce recommendations for standardizing the encoding of digital documents http://www.tei-c.org
Events
TALN (Traitement Automatique des Langues Naturelles): international French-language conference organized annually since 1994 by ATALA (Association pour le Traitement Automatique des Langues).
Directory
Associations (non-exhaustive list)
ACL Association for Computational Linguistics http://www.aclweb.org/
ATALA Association for Automatic Language Processing http://www.atala.org/
APIL Association des Professionnels des Industries de la...