Overview
AUTHORS
- Djamel Abdelkader ZIGHED: Professor, Lumière University (Lyon-II), Knowledge Engineering Research Laboratory (ERIC)
- Ricco RAKOTOMALALA: Associate Professor, Lumière University (Lyon-II), ERIC Laboratory
INTRODUCTION
Data mining, in its current form and understanding as both a scientific and industrial field, emerged in the early 1990s. This emergence was no accident, but the result of a combination of technological, economic and even socio-political factors.
Data mining can be seen as a necessity imposed by the need for companies to add value to the data they accumulate in their databases. Indeed, the development of storage capacities and network transmission speeds has led users to accumulate more and more data. Some experts estimate that the volume of data doubles every year. What should we do with this data, which is costly to collect and store?
The contours
There is still some confusion between data mining and knowledge discovery in databases (KDD). Data mining is one link in the processing chain for discovering knowledge from data. To put it another way, KDD is a vehicle with data mining as its engine.
Data mining is the art of extracting knowledge from data. Data can be stored in data warehouses, distributed databases or on the Internet ("web mining"). Data mining is not limited to the processing of structured data in the form of numerical tables; it also offers ways of approaching corpora in natural language ("text mining"), images ("image mining"), sound ("sound mining") or video ("multimedia mining").
KDD, through data mining, is thus seen as an engineering discipline for extracting knowledge from data.
The approach
KDD is a complex process that follows a sequence of operations. Pre-processing stages take place before the actual data mining. Pre-processing covers accessing the data in order to build "datamarts", bodies of data assembled for a specific purpose, and formatting the input data according to its type (numeric, symbolic, image, text, sound), as well as data cleansing, missing-data handling, attribute selection and instance selection. This first phase is crucial, because the choice of descriptors and precise knowledge of the population determine the quality of the prediction models. The information needed to build a good prediction model may well be present in the data, yet an inappropriate choice of variables or of learning samples can cause the operation to fail.
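Two of the pre-processing steps mentioned above, missing-data handling and attribute selection, can be sketched in a few lines. The following is a minimal illustration, not the authors' method: the customer records, the mean-imputation strategy and the variance threshold are all hypothetical choices made for the example.

```python
def impute_missing(rows, key):
    """Replace None values of attribute `key` with the mean of the observed values."""
    observed = [r[key] for r in rows if r[key] is not None]
    mean = sum(observed) / len(observed)
    for r in rows:
        if r[key] is None:
            r[key] = mean

def select_attributes(rows, keys, min_variance=0.01):
    """Keep only the attributes whose variance exceeds a threshold;
    a (near-)constant attribute carries no information for prediction."""
    kept = []
    for key in keys:
        values = [r[key] for r in rows]
        mean = sum(values) / len(values)
        variance = sum((v - mean) ** 2 for v in values) / len(values)
        if variance > min_variance:
            kept.append(key)
    return kept

# Hypothetical datamart rows: one record per customer.
data = [
    {"age": 25.0, "income": 30.0, "region_code": 1.0},
    {"age": None, "income": 45.0, "region_code": 1.0},
    {"age": 35.0, "income": 45.0, "region_code": 1.0},
]
impute_missing(data, "age")  # fills the missing age with the mean, 30.0
kept = select_attributes(data, ["age", "income", "region_code"])
print(kept)                  # the constant attribute region_code is dropped
```

In a real KDD chain these decisions (imputation strategy, selection criterion) are exactly the ones the text warns about: chosen badly, they can doom the modelling phase before it starts.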
The tools
Data mining, in its restricted definition, operates on two-dimensional tables, called "datamarts", and calls on three major families of methods derived from statistics, data analysis, pattern recognition or machine learning. These methods are commonly used or presented as part of the data miner's arsenal:
...
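As an illustrative sketch of a method from the machine-learning family operating on such a two-dimensional table (rows are instances, columns are attributes), here is a 1-nearest-neighbour classifier. This is not an example from the article; the training points and labels are invented for the demonstration.

```python
def euclidean(a, b):
    """Euclidean distance between two attribute vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def predict_1nn(train, labels, query):
    """Return the label of the training instance closest to `query`."""
    best = min(range(len(train)), key=lambda i: euclidean(train[i], query))
    return labels[best]

# Hypothetical two-dimensional table: each row is an instance,
# each column an attribute, plus a class label per row.
train = [(1.0, 1.0), (1.2, 0.8), (8.0, 9.0), (9.1, 8.7)]
labels = ["low", "low", "high", "high"]
print(predict_1nn(train, labels, (8.5, 9.2)))  # falls in the "high" cluster
```

Even this toy method shows why the pre-processing phase matters: the distance computation is only meaningful if the attributes were selected and cleaned sensibly beforehand.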
Bibliography
- This bibliography essentially lists foundational books. Journal and conference articles have been deliberately left out. Fairly extensive bibliographies on the various topics can be found on the Internet.