Overview
ABSTRACT
The huge data sets of many modern applications and hardware techniques such as 3D chip stacking in HBM DRAMs have given new momentum to in-memory and near-memory computing. The article presents the corresponding issues: where to compute, how much computation to perform, and how to coordinate the CPU with the in-memory or near-memory coprocessor. Five significant recent examples are presented and discussed: the Untether AI Boqueria chip, the Cerebras WSE-2 chip, the Ambit project, the UPMEM PIM chip and the Samsung Aquabolt-XL chip.
AUTHOR
Daniel ETIEMBLE: Engineer from INSA Lyon - Professor Emeritus, Université Paris Saclay
INTRODUCTION
For several decades now, the gap between processor and DRAM memory performance, known as the "memory wall", has been growing steadily. Various techniques are used to limit the growth of this gap:
cache hierarchies, to bring instructions and data closer to the processor;
hardware multithreading, to limit the time spent waiting on memory;
increasing DRAM throughput with successive generations: DDR, GDDR, HBM.
Bringing calculations closer to memory data is a technique that has been studied since the 1960s. Implementations such as Vector IRAM were proposed in the 1990s. In-memory or near-memory computing is becoming more topical as a result of two phenomena:
Many modern applications use huge data sets. Minimizing transfers between CPU and DRAM main memory is becoming a must.
Hardware circuit design techniques, such as the 3D stacking of chips in HBM (High Bandwidth Memory) DRAM, facilitate computing close to DRAM memories.
Calculating near or in memory raises a number of questions:
Where to calculate?
How much calculation is required?
How do you organize coordination between the master CPU and the hardware accelerator in or near memory?
These questions are examined in detail in the article.
Five recent examples are discussed:
The Untether AI Boqueria architecture is an accelerator for neural network inference. It consists of a 2D grid of 729 SRAM blocks, each block comprising 512 SRAMs of 640 bytes and 512 elementary processors. Calculations are close to SRAM.
The Cerebras WSE-2 circuit is a wafer-scale deep learning chip with 850,000 cores (2.6 × 10^12 transistors). The cores, interconnected in a 2D grid at wafer level, have a 50:50 ratio of logic (computation) and SRAM memory.
The Ambit project modifies the internal structure of a DRAM to perform a number of basic operations: copy, NOT, AND, OR, etc.
UPMEM has designed and tested PIM chips that place a processor, built in DRAM technology, alongside the DRAM memory banks. The processor has a complete instruction set for general-purpose computation, but no floating-point or SIMD instructions. Calculations are performed close to the DRAM memory banks.
Samsung's Aquabolt-XL circuit stacks DRAM chips using TSV technology and inserts chips with computing units between the memory banks into the stack. The calculation...
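The basic operations listed for the Ambit project above can be illustrated with a short software model. This is a minimal sketch, assuming the majority-based formulation described in the Ambit literature, where activating three DRAM rows together yields their bitwise majority and a control row of all zeros or all ones turns that majority into AND or OR. The names `majority`, `bulk_and`, `bulk_or`, `bulk_not` and the 16-bit `ROW_BITS` are hypothetical and chosen for illustration only. The code models the logic on Python integers, not the DRAM circuitry or its timing.

```python
# Illustrative model only: DRAM rows are represented as Python integers used
# as bit vectors. Nothing here corresponds to a real device or driver API.

ROW_BITS = 16                      # assumed demo width; real DRAM rows are far wider
ROW_MASK = (1 << ROW_BITS) - 1     # a row whose bits are all 1


def majority(a: int, b: int, c: int) -> int:
    """Bitwise majority: an output bit is 1 when at least two input bits are 1."""
    return (a & b) | (b & c) | (a & c)


def bulk_and(a: int, b: int) -> int:
    """AND expressed as the majority of a, b and an all-zeros control row."""
    return majority(a, b, 0)


def bulk_or(a: int, b: int) -> int:
    """OR expressed as the majority of a, b and an all-ones control row."""
    return majority(a, b, ROW_MASK)


def bulk_not(a: int) -> int:
    """NOT of a whole row, truncated to the row width."""
    return ~a & ROW_MASK


if __name__ == "__main__":
    row_a = 0b1100110011110000
    row_b = 0b1010101000001111
    # Each result is checked against Python's own bitwise operators.
    assert bulk_and(row_a, row_b) == row_a & row_b
    assert bulk_or(row_a, row_b) == row_a | row_b
    assert bulk_not(row_a) == row_a ^ ROW_MASK
    print(format(bulk_and(row_a, row_b), "016b"))
    print(format(bulk_or(row_a, row_b), "016b"))
```

The point of the formulation is that a single three-input primitive, applied to an entire row at once, covers both AND and OR; only the content of the control row changes.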
KEYWORDS
memory wall | processing in memory | processing near memory
This article is included in
Electronics
Calculation in or near memory
Bibliography
- (1) - SINGH (G.) et al - Near-Memory Computing: Past, Present, and Future - arXiv, 2019 https://arxiv.org/pdf/1908.02640.pdf
- (2) - PATTERSON (D.) et al - A case for Intelligent...