Article | REF: H6040 V2

Introduction to Big Data: storage, big analytics and data mining

Authors: Patrice BELLOT, Bernard ESPINASSE

Publication date: February 10, 2024


ABSTRACT

The purpose of this article is to define the term big data and to present the technologies and issues associated with it. We begin by characterizing what big data is and describing its uses in various fields. Next, we present the various solutions for storing big data, from SQL and NoSQL databases to cloud computing. The second part is devoted to the analysis and mining of big data, particularly through the prism of the latest advances in machine learning and artificial intelligence.


AUTHORS

  • Patrice BELLOT

  • Bernard ESPINASSE: University Professors, Aix-Marseille University, CNRS (LIS UMR 7020), Ecole Polytechnique Universitaire de Marseille, Marseille, France.

INTRODUCTION

The aim of this article is to define the term Big Data, to explain the economic and societal issues involved, and to introduce the main methods and techniques associated with it. Although the origin of the term Big Data is disputed, it is believed to have appeared in 1997. Its recommended official French translation is "mégadonnées", although "données massives" is also used. Recently, the spread of approaches and applications exploiting neural networks has tended to replace this term in the collective imagination with Data Science, Artificial Intelligence or Machine Learning. Even though each of these terms refers to a different field, they share many engineering and research questions: the storage, management, processing, analysis and exploitation (uses) of very large quantities of data, and the opportunities and risks associated with them.

For at least the last thirty years, the amount of data generated has grown continuously. Currently, we produce an estimated 74 zettabytes of data per year, equivalent to more than 1 GB per hour per inhabitant of the planet. By 2025, this amount is set to more than double. This increase in data affects all sectors, whether scientific, cultural, industrial or financial.
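The per-inhabitant figure can be checked with a short back-of-the-envelope calculation. The sketch below is only illustrative: the world population of roughly 8 billion is an assumption not stated in the article, decimal (SI) units are assumed for zettabytes and gigabytes, and the function name is hypothetical.

```python
# Back-of-the-envelope check of the per-inhabitant data rate quoted in the text.
ZETTABYTE = 10**21  # bytes, decimal SI prefix (assumption)
GIGABYTE = 10**9    # bytes, decimal SI prefix (assumption)

annual_volume_bytes = 74 * ZETTABYTE  # yearly volume quoted in the text
world_population = 8_000_000_000      # assumption: roughly 8 billion people
hours_per_year = 365 * 24             # 8760 hours, ignoring leap years


def gb_per_person_per_hour(total_bytes, population, hours):
    """Average data volume per person per hour, expressed in gigabytes."""
    return total_bytes / population / hours / GIGABYTE


rate = gb_per_person_per_hour(annual_volume_bytes, world_population, hours_per_year)
print(f"{rate:.2f} GB per person per hour")
```

Under these assumptions the result is slightly above 1 GB per person per hour, which is consistent with the order of magnitude given in the text. A different population estimate shifts the figure proportionally.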

The global average annual growth rate of the market for Big Data technology and services was estimated at over 30% for the period 2011-2016, and has remained at around 20% since then. According to an IDC study, this market reached $23.8 billion in 2016 and $90 billion in 2021 for Big Data software and other cloud services alone. IDC also estimated that spending on Big Data and business analytics (BDA) solutions in Europe would reach $50 billion by 2022.

In this article, we focus on two major issues associated with megadata: on the one hand, its storage, and on the other, its analysis and exploitation using statistical and machine learning approaches, while identifying the limitations of traditional and historical approaches. Megadata is mainly accompanied by the development of analytical or predictive applications, which process data to make sense of it, classify, search or filter it, or make estimates of future states or values. These analyses are generally referred to as "Big Analytics", and rely on distributed and parallel computing methods, which are often costly in terms of computing time and energy, and...


This article is included in

Digital documents and content management
