Article | REF: BIO7055 V1

Querying and managing bioinformatics data for molecular biology

Authors: Sarah COHEN-BOULAKIA, Patrick VALDURIEZ

Publication date: November 10, 2015

ABSTRACT

The volume of bioinformatics data for molecular biology available on the Web is growing constantly. Accessing and jointly exploiting these data is essential for new discoveries in biology. This article gives the reader the pointers needed to identify the reference bioinformatics databases for molecular biology. It also explains the problems raised by the joint use of these distributed and highly heterogeneous data, sketches a panorama of the systems that offer unified data access, and guides users in choosing a system that meets their needs.

AUTHORS

  • Sarah COHEN-BOULAKIA: Senior Lecturer HDR - Doctorate from Université Paris Sud - Inria, Institute of Computational Biology, Montpellier, France - Laboratoire de recherche en informatique, CNRS UMR 8623 Université Paris Sud, Orsay, France

  • Patrick VALDURIEZ: Research Director - Doctorate from Paris 6 University - Inria, LIRMM, Institute of Computational Biology, Montpellier, France

INTRODUCTION

Molecular biology studies the mechanisms of living organisms at the molecular level. Typical goals include understanding the mechanisms that govern cell activity, determining the functional role of a group of proteins, and identifying a set of genes involved in a disease. Advances in molecular biology are closely linked to progress in several other fields: biology, chemistry, physics, electronics, mathematics and computer science.

Since the early 1990s, new technologies have emerged, such as high-throughput analysis techniques, and they generate extremely large amounts of data. Genome size, that is, the quantity of DNA contained in one copy of the genome, is measured in nucleotides, usually expressed in megabases (one megabase = one million nucleotides). In 2015, a single sequencing machine could sequence 200 human genomes in a week at a cost of $0.03 per megabase. By comparison, the Human Genome Project took 12 years and involved hundreds of laboratories to sequence the first human genome, at an estimated cost of $10,000 per megabase.
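The per-megabase figures quoted above lend themselves to a quick back-of-the-envelope comparison. The Python sketch below is illustrative only. The two cost rates come from the text. The genome size of about 3,000 megabases and the single-pass (1x) coverage are assumptions that the text does not state. Real sequencing projects read each base many times, so actual per-genome costs are higher by the coverage factor.

```python
# Cost rates quoted in the text, in US dollars per megabase of sequence.
HGP_COST_PER_MB = 10_000.0   # Human Genome Project, estimated
NGS_COST_PER_MB = 0.03       # single high-throughput machine, 2015

# Assumption (not from the text): a human genome of roughly 3,000 megabases.
HUMAN_GENOME_MB = 3_000


def fold_reduction(old_cost: float, new_cost: float) -> float:
    """How many times cheaper the new per-megabase cost is than the old one."""
    return old_cost / new_cost


def genome_cost(cost_per_mb: float, genome_mb: float, coverage: float = 1.0) -> float:
    """Cost of producing `coverage` x `genome_mb` megabases at a given rate.

    coverage=1.0 means each base is read once, which is an idealisation.
    """
    return cost_per_mb * genome_mb * coverage


if __name__ == "__main__":
    ratio = fold_reduction(HGP_COST_PER_MB, NGS_COST_PER_MB)
    print(f"Per-megabase cost fell about {ratio:,.0f}-fold")
    print(f"1x genome at the HGP-era rate: ${genome_cost(HGP_COST_PER_MB, HUMAN_GENOME_MB):,.0f}")
    print(f"1x genome at the 2015 rate:    ${genome_cost(NGS_COST_PER_MB, HUMAN_GENOME_MB):,.2f}")
```

At the quoted rates, the cost per megabase drops by a factor of roughly 333,000. The per-genome figures depend entirely on the assumed genome size and coverage.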

Since the early 2010s, many laboratories have acquired machines of this type. As a result, between 2010 and 2015 the volume of sequencing data generated doubled every five months.
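A doubling time of five months implies exponential growth. Over the 60 months from 2010 to 2015, the volume doubles 12 times, which is a 2^12 = 4,096-fold increase. The small helper below computes this. Treating the period as exactly 60 months is an assumption about how the interval is counted.

```python
def growth_factor(months: float, doubling_time_months: float) -> float:
    """Multiplicative growth after `months` when the volume doubles every
    `doubling_time_months` months (simple exponential model)."""
    return 2 ** (months / doubling_time_months)


if __name__ == "__main__":
    # 2010 to 2015 taken as 60 months (assumption).
    print(growth_factor(60, 5))   # → 4096.0
    # Equivalent growth over a single year.
    print(round(growth_factor(12, 5), 2))
```

Under the same model, a five-month doubling time corresponds to a little over fivefold growth per year.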

Moreover, the data generated in this way, referred to as "raw data", are not sufficient on their own to understand the mechanisms of living organisms. They must be complemented by further analyses. These include conventional experimental biology and also computational analyses, which in turn generate very large volumes of bioinformatics data.

All the raw data and the results of their analysis are stored in biological databases, most of which are available on the Web. These databases are growing considerably in both number and content. They evolve rapidly, are distributed across the Web and are highly heterogeneous. Each database has its own data format and structure, the data they contain reflect different areas of expertise, and the scientific terms used to describe the data often differ from one database to another. Nevertheless, they contain a wealth of information and are highly complementary.

The ability to query, compare and reconcile bioinformatics data is essential to advancing knowledge in molecular biology. Exploiting this volume and diversity of distributed, highly heterogeneous and constantly evolving information is a real challenge.
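To illustrate what reconciling heterogeneous sources involves, the sketch below takes two records about the same gene from two sources and maps each into a common structure. The sources differ in format (a semicolon-separated text line versus a dictionary) and in vocabulary ("Homo sapiens" versus "human"). The sketch then merges the two records. Both source layouts, the field names, the synonym table and the `UnifiedGene` type are hypothetical and do not reflect any real database. Matching on gene symbol and organism is also a deliberate simplification, since gene symbols can be ambiguous.

```python
from dataclasses import dataclass

# Two hypothetical source records describing the same gene.
# Layouts and field names are invented for illustration only.
record_from_db_a = "GENE=TP53; ORGANISM=Homo sapiens; FUNCTION=tumour suppressor"
record_from_db_b = {"symbol": "tp53", "species": "human", "keywords": ["apoptotic process"]}

# Hypothetical vocabulary mapping: each source names the organism differently.
SPECIES_SYNONYMS = {"human": "Homo sapiens", "homo sapiens": "Homo sapiens"}


@dataclass
class UnifiedGene:
    """Hypothetical common (mediated) representation of a gene record."""
    symbol: str
    organism: str
    annotations: list[str]


def parse_db_a(line: str) -> UnifiedGene:
    """Parse source A's semicolon-separated KEY=VALUE layout."""
    fields = dict(part.strip().split("=", 1) for part in line.split(";"))
    return UnifiedGene(
        symbol=fields["GENE"].upper(),
        organism=SPECIES_SYNONYMS[fields["ORGANISM"].lower()],
        annotations=[fields["FUNCTION"]],
    )


def parse_db_b(rec: dict) -> UnifiedGene:
    """Map source B's dictionary layout onto the common representation."""
    return UnifiedGene(
        symbol=rec["symbol"].upper(),
        organism=SPECIES_SYNONYMS[rec["species"].lower()],
        annotations=list(rec["keywords"]),
    )


def reconcile(a: UnifiedGene, b: UnifiedGene) -> UnifiedGene:
    """Merge two records that describe the same gene, keeping each annotation once."""
    if (a.symbol, a.organism) != (b.symbol, b.organism):
        raise ValueError("records describe different genes")
    merged_annotations = a.annotations + [x for x in b.annotations if x not in a.annotations]
    return UnifiedGene(a.symbol, a.organism, merged_annotations)


merged = reconcile(parse_db_a(record_from_db_a), parse_db_b(record_from_db_b))
print(merged)
```

Translating each source into one shared representation before comparing records is only one possible design. The systems surveyed in this article may handle schema and vocabulary differences quite differently.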

In this article, our aim is to provide an overview of the current state of the art in bioinformatics databases for molecular biology, and above all to offer guidance on how to choose the right solution...

KEYWORDS

Information retrieval   |   Public bioinformatics databases   |   Standards and systems for managing and querying bioinformatics data


This article is included in

Bioprocesses and bioproductions
