AUTHORS
- Philippe BESSE: Professor at INSA Toulouse - Toulouse Institute of Mathematics
- Alain BACCINI: Former professor at Paul Sabatier University (Toulouse 3) - Institut de Mathématiques de Toulouse
INTRODUCTION
Data analysis techniques, or more precisely multidimensional exploratory statistics, are aimed at the descriptive study of large tables with n rows (individuals, or statistical units), where n ranges from a few tens to a few thousand or even millions, and p columns (statistical variables), where p ranges from a few tens to a few thousand. This objective is achieved by producing synthetic graphs and indicators that summarize the structure and main characteristics of these large tables. The methods proposed are therefore descriptive techniques for studying a large number of variables and individuals; they complement elementary one- and two-dimensional statistical tools and are often a prerequisite for modeling or for an inferential, decision-making or predictive analysis of the data studied.
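To make the n × p layout concrete, here is a minimal sketch in Python (standard library only) that stores a small, made-up table with individuals as rows and variables as columns, and computes the elementary one- and two-dimensional indicators mentioned above: the mean and standard deviation of each variable, and the p × p matrix of pairwise correlations. The data values and the function names `column_summaries` and `correlation_matrix` are hypothetical, chosen for illustration only, and the sketch assumes no column is constant (a zero standard deviation would make the correlation undefined). When p reaches the hundreds or thousands, the correlation matrix alone has p² entries, which illustrates why summaries of the whole table become useful.

```python
from math import sqrt

# Hypothetical n x p table: n = 5 individuals (rows), p = 3 variables (columns).
table = [
    [1.0, 2.0, 10.0],
    [2.0, 4.1, 9.0],
    [3.0, 6.2, 7.5],
    [4.0, 7.9, 6.0],
    [5.0, 10.1, 5.0],
]


def column_summaries(x):
    """Return (means, standard deviations) of each column, using divisor n."""
    n, p = len(x), len(x[0])
    means = [sum(row[j] for row in x) / n for j in range(p)]
    stds = [sqrt(sum((row[j] - means[j]) ** 2 for row in x) / n) for j in range(p)]
    return means, stds


def correlation_matrix(x):
    """Return the p x p matrix of Pearson correlations between columns.

    Assumes no column is constant (its standard deviation must be non-zero).
    """
    n, p = len(x), len(x[0])
    means, stds = column_summaries(x)
    # Standardize each entry: subtract the column mean, divide by the column std.
    z = [[(row[j] - means[j]) / stds[j] for j in range(p)] for row in x]
    # With divisor-n standard deviations, the correlation of columns j and k
    # is the average product of their standardized values.
    return [[sum(z[i][j] * z[i][k] for i in range(n)) / n for k in range(p)]
            for j in range(p)]


means, stds = column_summaries(table)
corr = correlation_matrix(table)
print("means:", [round(m, 3) for m in means])
for row in corr:
    print([round(r, 3) for r in row])
```

Each of these indicators only looks at one or two variables at a time, which is exactly the limitation the multidimensional methods are meant to overcome.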
Advances in measurement technology produce ever-growing data flows, which the parallel development of computing resources makes it possible to store and analyze. The objectives and fields of application of statistical data mining are many and varied. Here are a few examples of sectors where this kind of exploration is useful:
in the industrial sector (agri-food, microelectronics, mechanical engineering, etc.), where process monitoring and product traceability automatically generate considerable data flows. Statistical exploration is a prerequisite for any modeling effort, for example when implementing statistical process control (SPC) or fault detection;
upstream, in research and development, where needs are just as great: virtual screening of molecules in the pharmaceutical industry, sensometrics in the agri-food industry, not to mention the considerable boom in post-genomic biotechnologies with transcriptomic and proteomic data;
in the service sector (banking, insurance, mail order, telecom operators, etc.), where huge customer databases are mined (data mining) for marketing purposes, in order to personalize customer relationship management.
This article is included in: Software technologies and System architectures
Data analysis or multidimensional exploratory statistics
Bibliography
Websites
Other resources (handouts, practical exercises, functions written in R) are available on the website:
https://www.math.univ-toulouse.fr/
R Development Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing.
Find out more
The most useful general and introductory references on this topic are: Bouroche & Saporta (1980), Jobson (1992), Lebart, Morineau & Piron (2006), Mardia, Kent & Bibby (1979), Saporta (2006). Complements and more recent developments can be found in: Droesbeke, Fichet & Tassi (1992), Govaert (2003).