Overview
AUTHORS
- Claude CHRISMENT: Doctor of Science - Professor of Computer Science at Toulouse III Paul-Sabatier University
- Jacques LE MAITRE: Qualified to direct research - Professor of Computer Science at the University of Toulon and Var
- Florence SÈDES: Qualified to direct research - Senior lecturer in computer science at Toulouse II University
INTRODUCTION
Document applications are built on a storage function, which must be integrated with other functions that support exploration, partial reuse of stored document content, and sometimes restructuring. Examples include all IT applications involved in testing, integrating and maintaining structured objects (assemblies of components), whether in software engineering (software components), the space industry (satellite components), aeronautics (aircraft components), and so on. Typically, components are described in specification manuals, which must be reused and adapted during integration, testing and maintenance. The problems posed by multiple heterogeneous data sources have become even more acute with the rise of the Web. Integration tools and models are needed to provide an abstract, synthetic view of these large volumes of data and to make them accessible and easy to manipulate.
Implementing such electronic document management systems generally requires database management systems to perform the interdependent functions of storing and accessing information. Electronic documents are generally accessed and searched in three ways. The first, used mainly for textual data, consists of searching for a string (more generally, a pattern) in a text: it is found in information retrieval systems that implement "full-text" indexing and text-matching mechanisms. The second relies on a priori knowledge of a complete structure defined over the data being manipulated: it is found in database management systems, where it is implemented through the database schema and a query language based on a finite set of operators. The third implements browsing and navigation mechanisms over weakly structured information: it is found in hypertext systems and, in particular, on the Web. Any electronic document management system must support all three approaches.
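The three access modes described above can be contrasted on a toy corpus. The following is a minimal sketch, not a description of any particular system: the `DOCS` corpus, its field names, and the functions `pattern_search`, `schema_query` and `navigate` are all hypothetical, introduced only for illustration.

```python
import re
import sqlite3

# Hypothetical mini-corpus: every name and value here is an assumption.
DOCS = {
    "d1": {"title": "Antenna spec", "text": "The antenna component weighs 12 kg.", "links": ["d2"]},
    "d2": {"title": "Power spec", "text": "The power unit feeds the antenna.", "links": ["d3"]},
    "d3": {"title": "Test plan", "text": "Integration tests cover every component.", "links": []},
}

def pattern_search(pattern):
    """Mode 1: search the raw text of each document for a pattern."""
    rx = re.compile(pattern, re.IGNORECASE)
    return sorted(doc_id for doc_id, doc in DOCS.items() if rx.search(doc["text"]))

def schema_query(title_suffix):
    """Mode 2: query data whose structure is known in advance (a SQL schema)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE doc (id TEXT PRIMARY KEY, title TEXT)")
    con.executemany(
        "INSERT INTO doc VALUES (?, ?)",
        [(doc_id, doc["title"]) for doc_id, doc in DOCS.items()],
    )
    rows = con.execute(
        "SELECT id FROM doc WHERE title LIKE ? ORDER BY id",
        ("%" + title_suffix,),
    ).fetchall()
    con.close()
    return [row[0] for row in rows]

def navigate(start):
    """Mode 3: follow hypertext-style links breadth-first from a start document."""
    seen, queue = [start], [start]
    while queue:
        current = queue.pop(0)
        for target in DOCS[current]["links"]:
            if target not in seen:
                seen.append(target)
                queue.append(target)
    return seen
```

Here `pattern_search("antenna")` needs no knowledge of structure, `schema_query("spec")` only works because the table layout is fixed beforehand, and `navigate("d1")` discovers documents by following links rather than by formulating a query.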
The concept of a document is associated with that of semi-structured information, which is characterized by a total or partial absence of structure, ranging from completely unstructured to partially structured information, and by its heterogeneity: multiple formats, formalisms, structures, types, media, etc. Documents are stored in a warehouse, or document base, which can be queried and manipulated using indexing, filtering and retrieval operators. The model of any document base must be generic, scalable, and independent of both the granularity of document units and representation standards.
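As a rough illustration of a document base with indexing, filtering and retrieval operators, the sketch below stores units of any granularity with free-form metadata. The `Unit` and `DocumentBase` classes, their methods, and the sample data are assumptions made for this example, not a model taken from the article.

```python
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class Unit:
    """A document unit of any granularity (whole document, section, ...)."""
    uid: str
    text: str = ""
    meta: dict = field(default_factory=dict)  # free-form: keys may be absent

class DocumentBase:
    """Hypothetical document base exposing three operators."""

    def __init__(self):
        self.units = {}
        self.inverted = defaultdict(set)  # term -> set of unit ids

    def index(self, unit):
        """Indexing: store the unit and record each of its terms."""
        self.units[unit.uid] = unit
        for word in unit.text.lower().split():
            self.inverted[word.strip(".,;:")].add(unit.uid)

    def filter(self, **criteria):
        """Filtering: keep units whose metadata match every criterion."""
        return sorted(
            uid for uid, unit in self.units.items()
            if all(unit.meta.get(key) == value for key, value in criteria.items())
        )

    def retrieve(self, term):
        """Retrieval: look up the units containing a term."""
        return sorted(self.inverted.get(term.lower(), set()))

# Heterogeneous sample data (assumed): units differ in format, level and media.
base = DocumentBase()
base.index(Unit("manual", "Satellite integration manual.", {"format": "SGML"}))
base.index(Unit("manual/s1", "Antenna integration procedure", {"format": "SGML", "level": "section"}))
base.index(Unit("note", "Scanned maintenance note", {"media": "image"}))
```

Because metadata is an open dictionary and a unit can be a whole manual or a single section, the same operators apply whatever the granularity or the representation format, which is the kind of independence the paragraph above asks for.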
The first part of this article presents...
The Ultimate Scientific and Technical Reference
This article is included in
Digital documents and content management
Documentary databases
Appendix: Sample queries
This illustration is based on the formulation of queries on an example document in SgmlQL.
- Example document (figure 1 and corresponding locator)
Note:
Characters...
References
In Techniques de l'Ingénieur, Computer Science section
Some reference sites for query languages
SgmlQL, an extension of the OQL language for querying SGML documents:
http://www.univ-tln.fr/~gect/simm/SgmlQL
LOREL, an OQL extension with construction of generalized access paths in a graph:
http://www-db.stanford.edu/lore
XML-QL, proposal submitted to the W3 consortium: query language for XML documents,...