Overview
ABSTRACT
After a brief review of the steps that led from the graphics boards of the 1980s to fully programmable graphics processing units (GPUs) in 2007, we present the main features of GPUs. The initial release of CUDA (2007) and the sharp rise in the number of GPU-accelerated scientific codes have driven major technological advances in these processors. We detail these advances under four headings: software, hardware, memory hierarchy, and techniques for exploiting parallelism. Together they explain the growing importance of GPUs in numerous applications (scientific computing, neural networks, imaging, bio-computing, cryptocurrency mining, etc.).
AUTHORS
- Daniel ETIEMBLE: Engineer from INSA Lyon - Professor Emeritus, Université Paris Sud
- David DEFOUR: Doctorate in Computer Science from ENS Lyon - Senior Lecturer at the University of Perpignan
INTRODUCTION
The year 2007 saw the birth of NVIDIA's CUDA ecosystem, and the period 2007-2017 saw an explosion in the number of scientific computing codes accelerated by graphics processing units (GPUs). There are currently three major suppliers of graphics processors, AMD, NVIDIA and Intel, serving different segments: GPUs for workstations and PCs, GPUs for mobile systems, and APUs (Accelerated Processing Units), in which the CPU and GPU are integrated on the same chip.
We briefly review the various stages that led from the graphics card pipeline of the 1980s to the first unified, fully programmable graphics processors in 2007. The operating principle of a GPU is detailed, using the Fermi architecture as an example. The implementation of the SIMT (Single Instruction Multiple Thread) approach is explained. We then look at the various aspects of ten years of technological advances in general-purpose computing on GPUs (GPGPU).
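As a rough illustration of the SIMT approach, the sketch below launches many threads that all run the same kernel, each on its own array element. It is not taken from the article: the kernel name `scale_or_negate` and the host helper `run_simt_demo` are hypothetical, and the code assumes a CUDA-capable device and the CUDA runtime API.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Every thread executes this same instruction stream on a different element.
__global__ void scale_or_negate(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i >= n) return;                             // guard the last, partial block

    // Threads of one warp that take different sides of this branch diverge:
    // the hardware runs the two paths one after the other, masking inactive threads.
    if (i % 2 == 0) out[i] = 2.0f * in[i];
    else            out[i] = -in[i];
}

// Hypothetical helper: returns the number of wrong results, or -1 on a CUDA error.
int run_simt_demo(int n)
{
    std::vector<float> h_in(n), h_out(n);
    for (int i = 0; i < n; ++i) h_in[i] = static_cast<float>(i);

    size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float* d_in = nullptr;
    float* d_out = nullptr;
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return -1;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) { cudaFree(d_in); return -1; }
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 256;                         // threads per block
    int blocks = (n + threads - 1) / threads;  // enough blocks to cover n elements
    scale_or_negate<<<blocks, threads>>>(d_in, d_out, n);

    // This blocking copy on the default stream also waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) return -1;

    int errors = 0;
    for (int i = 0; i < n; ++i) {
        float expected = (i % 2 == 0) ? 2.0f * h_in[i] : -h_in[i];
        if (h_out[i] != expected) ++errors;
    }
    return errors;
}

int main()
{
    int errors = run_simt_demo(1 << 20);
    printf("mismatches: %d\n", errors);
    return errors == 0 ? 0 : 1;
}
```

The even/odd branch is deliberately the worst case for divergence, since both paths are taken inside every warp; a branch on a per-warp condition would avoid the serialization.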
The evolution of market share, GPGPU applications and software developments are presented, including details of the ecosystem providing high-level APIs (close to C) and low-level APIs (close to hardware).
The evolution of hardware is explained, covering the successive micro-architectural generations, power consumption problems, and the contribution of specialized computing units and instructions.
The memory hierarchy and its evolution are detailed, with technological contributions and the simplification introduced by the "unified memory" approach.
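To give a feel for the simplification that unified memory brings, here is a minimal sketch in which a single pointer is used by both the host and the device, with no explicit copies. It is an illustration under stated assumptions, not the article's own code: the names `add` and `run_managed_add` are hypothetical, and the code assumes a device and CUDA toolkit that support `cudaMallocManaged`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void add(const float* a, const float* b, float* c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

// Hypothetical helper: returns the number of wrong results, or -1 on a CUDA error.
int run_managed_add(int n)
{
    size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float* a = nullptr;
    float* b = nullptr;
    float* c = nullptr;

    // One allocation per array, visible from both CPU and GPU.
    if (cudaMallocManaged(&a, bytes) != cudaSuccess) return -1;
    if (cudaMallocManaged(&b, bytes) != cudaSuccess) { cudaFree(a); return -1; }
    if (cudaMallocManaged(&c, bytes) != cudaSuccess) { cudaFree(a); cudaFree(b); return -1; }

    // The host writes directly through the managed pointers.
    for (int i = 0; i < n; ++i) {
        a[i] = static_cast<float>(i);
        b[i] = 1.0f;
    }

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    add<<<blocks, threads>>>(a, b, c, n);

    // Kernel launches are asynchronous: wait before the host reads c.
    cudaError_t err = cudaDeviceSynchronize();

    int errors = (err == cudaSuccess) ? 0 : -1;
    for (int i = 0; errors >= 0 && i < n; ++i) {
        if (c[i] != a[i] + b[i]) ++errors;
    }

    cudaFree(a);
    cudaFree(b);
    cudaFree(c);
    return errors;
}

int main()
{
    int errors = run_managed_add(1 << 20);
    printf("mismatches: %d\n", errors);
    return errors == 0 ? 0 : 1;
}
```

With explicit memory management, the same program would need separate host and device buffers and a `cudaMemcpy` in each direction; here the runtime migrates the data on demand.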
Various techniques improve the exploitation of parallelism, notably in the schedulers and in the hardware mechanisms that manage parallelism (synchronization and atomic operations).
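The role of synchronization and atomic operations can be sketched with a small histogram, in which many threads update the same counters. This example is an assumption-laden illustration rather than anything from the article: the names `histogram`, `run_histogram_demo` and `NUM_BINS` are hypothetical, the per-block shared-memory buffer is one common design choice among several, and managed memory is used only to keep the host code short.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int NUM_BINS = 16;  // hypothetical bin count for this sketch

__global__ void histogram(const unsigned char* data, int n, unsigned int* bins)
{
    __shared__ unsigned int local[NUM_BINS];  // per-block partial histogram

    for (int b = threadIdx.x; b < NUM_BINS; b += blockDim.x) local[b] = 0;
    __syncthreads();  // barrier: local[] is fully zeroed before anyone counts

    // Grid-stride loop; atomicAdd makes concurrent increments of one bin safe.
    int stride = blockDim.x * gridDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        atomicAdd(&local[data[i] % NUM_BINS], 1u);
    __syncthreads();  // barrier: all counts of this block are in local[]

    // Merge the block's partial result into the global histogram.
    for (int b = threadIdx.x; b < NUM_BINS; b += blockDim.x)
        atomicAdd(&bins[b], local[b]);
}

// Hypothetical helper: returns the number of wrong bins, or -1 on a CUDA error.
int run_histogram_demo(int n)
{
    unsigned char* data = nullptr;
    unsigned int* bins = nullptr;
    if (cudaMallocManaged(&data, n) != cudaSuccess) return -1;
    if (cudaMallocManaged(&bins, NUM_BINS * sizeof(unsigned int)) != cudaSuccess) {
        cudaFree(data);
        return -1;
    }

    unsigned int expected[NUM_BINS] = {0};  // reference result computed on the CPU
    for (int i = 0; i < n; ++i) {
        data[i] = static_cast<unsigned char>(i % 251);
        ++expected[data[i] % NUM_BINS];
    }
    for (int b = 0; b < NUM_BINS; ++b) bins[b] = 0;

    histogram<<<64, 256>>>(data, n, bins);
    cudaError_t err = cudaDeviceSynchronize();  // wait before reading bins on the host

    int errors = (err == cudaSuccess) ? 0 : -1;
    for (int b = 0; errors >= 0 && b < NUM_BINS; ++b) {
        if (bins[b] != expected[b]) ++errors;
    }

    cudaFree(data);
    cudaFree(bins);
    return errors;
}

int main()
{
    int errors = run_histogram_demo(1 << 20);
    printf("wrong bins: %d\n", errors);
    return errors == 0 ? 0 : 1;
}
```

Without the atomic operations, two threads incrementing the same bin could each read the old value and lose an update; without the barriers, a thread could start counting before the buffer is zeroed or merge it before the block has finished.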
While retaining their original role as graphics display units, GPUs have become key players in massively parallel computing. They exploit the fine-grained data parallelism found in a wide range of applications, from high-performance computing to neural networks and genomics. The SIMT execution model gives them a significant advantage over CPUs for massive data parallelism.
KEYWORDS
CPU | GPU | CUDA | NVIDIA
CAN ALSO BE FOUND IN:
This article is included in: Software technologies and System architectures
Fully programmable graphics processors (GPUs)