Overview
ABSTRACT
The purpose of this paper is to define the term big data and to present the technologies and issues associated with it. We begin by characterizing what big data is and describing its uses in various fields. Next, the various solutions for storing big data are presented, from SQL and NoSQL databases to cloud computing. The second part is devoted to the analysis and mining of big data, particularly in light of the latest advances in machine learning and artificial intelligence.
AUTHORS
Bernard ESPINASSE: University Professor - Aix-Marseille University, CNRS (LIS UMR 7020) - Ecole Polytechnique Universitaire de Marseille, Marseille, France.
INTRODUCTION
The aim of this article is to define the term Big Data, to explain the economic and societal issues involved, and to introduce the main methods and techniques associated with it. Although the origin of the term Big Data is disputed, it is believed to have appeared in 1997. Its recommended official French translation is "mégadonnées", although "données massives" is also used. Recently, the spread of approaches and applications based on neural networks has tended to replace this term in the collective imagination with Data Science, Artificial Intelligence or Machine Learning. Although each of these refers to a different field, they share many engineering and research questions: these concern the storage, management, processing, analysis and exploitation (uses) of very large quantities of data, and the opportunities and risks associated with them.
For at least the last thirty years, the amount of data generated has grown continuously. Currently, we produce an estimated 74 zettabytes of data per year, equivalent to more than 1 GB per hour per inhabitant of the planet. By 2025, this amount is set to more than double. This increase in data affects all sectors, whether scientific, cultural, industrial or financial.
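As a rough sanity check of the per-inhabitant figure above, the conversion can be sketched in a few lines of Python. The world population of 8 billion and the decimal (SI) definitions of zettabyte and gigabyte are assumptions made here for illustration; the article does not state which values it used.

```python
# Back-of-the-envelope check: 74 ZB per year expressed per person and per hour.
ZETTABYTE = 10**21  # bytes, decimal (SI) definition (assumption)
GIGABYTE = 10**9    # bytes, decimal (SI) definition (assumption)

annual_volume_bytes = 74 * ZETTABYTE  # figure quoted in the text
world_population = 8_000_000_000      # assumption, not given in the article
hours_per_year = 365 * 24


def gb_per_person_per_hour(total_bytes, population, hours):
    """Spread a yearly data volume evenly over people and hours, in GB."""
    return total_bytes / population / hours / GIGABYTE


rate = gb_per_person_per_hour(annual_volume_bytes, world_population, hours_per_year)
print(f"{rate:.2f} GB per person per hour")
```

Under these assumptions the result comes out slightly above 1 GB per person per hour, which is consistent with the order of magnitude given in the text.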
The global average annual growth rate of the market for Big Data technology and services was estimated at over 30% for the period 2011-2016, and has remained at around 20% ever since. According to an IDC study, this market reached $23.8 billion in 2016 and $90 billion in 2021 for Big Data software and other cloud services alone. IDC also estimates that spending on Big Data and business analytics (BDA) solutions in Europe will reach $50 billion by 2022.
In this article, we focus on two major issues associated with megadata: on the one hand, its storage, and on the other, its analysis and exploitation using statistical and machine learning approaches, while identifying the limitations of traditional and historical approaches. Megadata goes hand in hand with the development of analytical or predictive applications, which process data to make sense of it, to classify, search or filter it, or to estimate future states or values. These analyses are generally referred to as "Big Analytics" and rely on distributed and parallel computing methods, which are often costly in terms of computing time and energy, and...
KEYWORDS
big data | cloud computing | analytics | machine learning | storage | data mining | NoSQL
Introduction to Big-Data — megadata storage, analysis and mining
Bibliography
- (1) - TAYLOR (P.) - Volume of data/information created, captured, copied, and consumed worldwide from 2010 to 2020, with forecasts from 2021 to 2025. Statista, Nov 2023. https://www.statista.com/statistics/871513/worldwide-data-created/ ...
Websites
Mahout https://mahout.apache.org
BERTopic https://maartengr.github.io/BERTopic
Gargantext https://gargantext.org