ABSTRACT
This article presents the challenging field of evaluating and improving the quality of data. It describes solutions from research and the main approaches implemented in practice to manage data quality problems such as incorrect or erroneous data, missing or incomplete data, duplicate records, and obsolete, inconsistent or outlying data. The main techniques for diagnosis and correction are presented to enable modeling, measurement, control and improvement of data quality in structured data management and warehousing systems.
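The diagnosis step can be illustrated with a few simple indicators. The sketch below is not the author's method. It is a minimal standard-library Python example that assumes records are held as dictionaries, with `None` marking a missing value. It measures the completeness of an attribute, flags exact duplicate records, and flags outlying numeric values using Tukey's interquartile-range fences. The function names and sample records are hypothetical.

```python
from statistics import quantiles

def completeness(rows, attr):
    """Fraction of rows in which `attr` is present (not None)."""
    if not rows:
        return 1.0
    return sum(r.get(attr) is not None for r in rows) / len(rows)

def exact_duplicates(rows, attrs):
    """Rows whose values on `attrs` exactly repeat an earlier row."""
    seen, dups = set(), []
    for r in rows:
        key = tuple(r.get(a) for a in attrs)
        if key in seen:
            dups.append(r)
        else:
            seen.add(key)
    return dups

def iqr_outliers(values, k=1.5):
    """Values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]; None is ignored."""
    vals = [v for v in values if v is not None]
    if len(vals) < 2:
        return []  # quantiles() needs at least two data points
    q1, _, q3 = quantiles(vals, n=4)  # three cut points: Q1, median, Q3
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in vals if v < low or v > high]

# Hypothetical sample records (None = missing value).
rows = [
    {"name": "Alice", "city": "Paris", "age": 34},
    {"name": "Bob", "city": "Lyon", "age": None},    # missing age
    {"name": "Alice", "city": "Paris", "age": 34},   # duplicate of the first row
    {"name": "Chloe", "city": "Nice", "age": 29},
    {"name": "David", "city": "Lille", "age": 31},
    {"name": "Emma", "city": "Paris", "age": 240},   # implausible age
    {"name": "Farid", "city": "Lyon", "age": 33},
    {"name": "Gina", "city": "Nantes", "age": 35},
]

ages = [r["age"] for r in rows]
print(completeness(rows, "age"))                             # 0.875
print(len(exact_duplicates(rows, ["name", "city", "age"])))  # 1
print(iqr_outliers(ages))                                    # [240]
```

Exact-match deduplication only catches verbatim copies. Near-duplicates, such as spelling variants of a name, require similarity-based record linkage. A flagged outlier is a candidate for review rather than a confirmed error.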
AUTHOR
Laure BERTI-ÉQUILLE: Professor - Computer Science and Systems Laboratory - Aix-Marseille University - Marseille, France
INTRODUCTION
Problems with the quality of data stored in databases and data warehouses are endemic to all types of data (structured or unstructured) and to all fields of application: Web, government, commercial, industrial and scientific data. These problems include data errors, duplicates, inconsistencies, and missing, incomplete, uncertain, outdated, aberrant or unreliable values. The consequences of non-quality (or poor-quality) data on decision-making, and the financial costs involved, are considerable: the Gartner Institute estimates that over 25% of critical data in the world's largest companies is inaccurate. Data quality problems cost the global economy millions every year. As a result, in most development projects focused on data use and analysis, between 30% and 80% of the overall budget and development time is spent on data cleansing and improving data quality, rather than on building the system or software itself.
In the era of "Big Data", the number of available information sources is multiplying, and the volume of potentially accessible data is growing exponentially. Data quality and, more broadly, the quality and veracity of information have become a major concern, not only for companies and the academic world, but also for the general public, the primary consumer and producer of online information. It has become essential to know the quality of the data produced and used, both to adapt how the data are used and to prevent their quality from deteriorating.
Continuously assessing the quality of data stored in information systems, databases and data warehouses, or on the Web, has become crucial, as it makes it possible to:
- provide users with objective measures and critical expert assessments of data quality that can be used for decision-making;
- enable users to put the trust they place in the data into perspective, so that they use or analyze it with care.
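One way to turn consistency into an objective, reportable measure is to check a functional dependency. In this minimal sketch, the rule `zip -> city` and the sample rows are hypothetical assumptions, not taken from the article. The code reports each postal code mapped to more than one city, and it computes the share of rows not involved in any violation. It only detects violations. Repairing them, which means choosing which value to keep, is a separate step.

```python
from collections import defaultdict

def fd_violations(rows, lhs, rhs):
    """Return {lhs value: set of rhs values} for every lhs value that maps
    to more than one rhs value, i.e. violates the dependency lhs -> rhs."""
    groups = defaultdict(set)
    for r in rows:
        groups[r[lhs]].add(r[rhs])
    return {k: v for k, v in groups.items() if len(v) > 1}

def consistency_ratio(rows, lhs, rhs):
    """Share of rows that are not involved in any violation of lhs -> rhs."""
    if not rows:
        return 1.0
    bad_keys = fd_violations(rows, lhs, rhs)
    bad = sum(1 for r in rows if r[lhs] in bad_keys)
    return 1 - bad / len(rows)

# Hypothetical rows; the typo "Marseile" breaks the rule zip -> city.
rows = [
    {"zip": "13001", "city": "Marseille"},
    {"zip": "13001", "city": "Marseille"},
    {"zip": "13001", "city": "Marseile"},
    {"zip": "75001", "city": "Paris"},
]

print(sorted(fd_violations(rows, "zip", "city")["13001"]))  # ['Marseile', 'Marseille']
print(consistency_ratio(rows, "zip", "city"))               # 0.25
```

All three "13001" rows are counted as involved in the violation, because the check alone cannot tell which value is wrong. This is why the resulting ratio is a conservative indicator.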
If analyses and decisions are based on inaccurate, incomplete, ambiguous or otherwise poor-quality data, the meaning of their results is questionable, and so, rightly, is the quality of the knowledge thus "elaborated".
This article presents a range of solutions to the many problems associated with data quality. The aim of this summary is...
KEYWORDS
metadata | data quality | data science | anomaly detection | data cleaning | database repairing | data quality indicators
This article is included in
Software technologies and System architectures
Data quality
Bibliography
Events
International conferences:
- Very Large Data Bases (VLDB) Conference: http://vldb.org/conference.html
- ACM Special Interest Group on Management of Data (SIGMOD): https://dl.acm.org/event.cfm?id=RE227
...
Standards and norms
- Data quality – Part 1: Overview https://www.iso.org/standard/50798.html - ISO/TS 8000-1 - 2011
- Data quality – Part 2: Vocabulary https://www.iso.org/standard/73456.html - ISO 8000-2 - 2017
- Data quality – Part 8: Information and data quality: Concepts and measuring https://www.iso.org/standard/60805.html - ISO 8000-8 - 2015
- Data quality – Part 61: Data quality management: Process reference model...