Overview
AUTHORS
- Djamel Abdelkader ZIGHED: Professor, Lumière University (Lyon-II), Knowledge Engineering Research Laboratory (ERIC)
- Ricco RAKOTOMALALA: Associate Professor, Lumière University (Lyon-II), ERIC Laboratory
INTRODUCTION
Data mining, in its current form and understanding as both a scientific and industrial field, emerged in the early 1990s. This emergence was no accident, but the result of a combination of technological, economic and even socio-political factors.
Data mining can be seen as a necessity imposed by the need for companies to add value to the data they accumulate in their databases. Indeed, the development of storage capacities and network transmission speeds has led users to accumulate more and more data. Some experts estimate that the volume of data doubles every year. What should we do with this data, which is costly to collect and store?
The contours
There is still some confusion between data mining and knowledge discovery in databases (KDD). Data mining is one link in the processing chain for discovering knowledge from data. To put it another way, KDD is a vehicle with data mining as its engine.
Data mining is the art of extracting knowledge from data. Data can be stored in data warehouses, distributed databases or on the Internet ("web mining"). Data mining is not limited to the processing of structured data in the form of numerical tables; it also offers ways of approaching corpora in natural language ("text mining"), images ("image mining"), sound ("sound mining") or video ("multimedia mining").
KDD, through data mining, is thus seen as an engineering discipline for extracting knowledge from data.
The approach
KDD is a complex process that follows a sequence of operations. Pre-processing stages take place before the actual data mining. Pre-processing covers accessing the data in order to build "datamarts", bodies of data assembled for a specific purpose, and formatting the input data according to its type (numeric, symbolic, image, text, sound), as well as data cleansing, missing-data handling, attribute selection and instance selection. This first phase is crucial, because the choice of descriptors and precise knowledge of the population determine the quality of the prediction models. The information needed to build a good prediction model may well be present in the data, yet an inappropriate choice of variables or of learning samples can cause the operation to fail.
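Two of the pre-processing steps mentioned above, missing-data handling and attribute selection, can be sketched in a few lines. The following is a minimal illustration, not the authors' method: the customer records, the mean-imputation strategy and the variance threshold are all hypothetical choices made for the example.

```python
def impute_missing(rows, key):
    """Replace None values of attribute `key` with the mean of the observed values."""
    observed = [r[key] for r in rows if r[key] is not None]
    mean = sum(observed) / len(observed)
    for r in rows:
        if r[key] is None:
            r[key] = mean

def select_attributes(rows, keys, min_variance=0.01):
    """Keep only the attributes whose variance exceeds a threshold;
    a (near-)constant attribute carries no information for prediction."""
    kept = []
    for key in keys:
        values = [r[key] for r in rows]
        mean = sum(values) / len(values)
        variance = sum((v - mean) ** 2 for v in values) / len(values)
        if variance > min_variance:
            kept.append(key)
    return kept

# Hypothetical datamart rows: one record per customer.
data = [
    {"age": 25.0, "income": 30.0, "region_code": 1.0},
    {"age": None, "income": 45.0, "region_code": 1.0},
    {"age": 35.0, "income": 45.0, "region_code": 1.0},
]
impute_missing(data, "age")  # fills the missing age with the mean, 30.0
kept = select_attributes(data, ["age", "income", "region_code"])
print(kept)                  # the constant attribute region_code is dropped
```

In a real KDD chain these decisions (imputation strategy, selection criterion) are exactly the ones the text warns about: chosen badly, they can doom the modelling phase before it starts.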
The tools
Data mining, in its restricted definition, operates on two-dimensional tables, called "datamarts", and calls on three major families of methods derived from statistics, data analysis, pattern recognition or machine learning. These methods are commonly used or presented as part of the data miner's arsenal:
...
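As an illustrative sketch of a method from the machine-learning family operating on such a two-dimensional table (rows are instances, columns are attributes), here is a 1-nearest-neighbour classifier. This is not an example from the article; the training points and labels are invented for the demonstration.

```python
def euclidean(a, b):
    """Euclidean distance between two attribute vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def predict_1nn(train, labels, query):
    """Return the label of the training instance closest to `query`."""
    best = min(range(len(train)), key=lambda i: euclidean(train[i], query))
    return labels[best]

# Hypothetical two-dimensional table: each row is an instance,
# each column an attribute, plus a class label per row.
train = [(1.0, 1.0), (1.2, 0.8), (8.0, 9.0), (9.1, 8.7)]
labels = ["low", "low", "high", "high"]
print(predict_1nn(train, labels, (8.5, 9.2)))  # falls in the "high" cluster
```

Even this toy method shows why the pre-processing phase matters: the distance computation is only meaningful if the attributes were selected and cleaned sensibly beforehand.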
Bibliography
- This bibliography essentially lists foundational books. Journal and conference articles have been deliberately left out. Fairly extensive bibliographies on the various topics can be found on the Internet.