Article | REF: H1210 V1

Coding numbers in computers

Author: Daniel ETIEMBLE

Publication date: November 10, 2023




ABSTRACT

Since processors operate only on binary digits (bits), coding is necessary to represent the different types of numbers. The integer and floating-point formats handled by general-purpose processors are presented, along with the basic arithmetic operations and their implementation in processor instruction sets. More specific formats (fixed-point, decimal, reduced floats, Posit numbers) are also presented and discussed.


AUTHOR

  • Daniel ETIEMBLE: Engineer from INSA Lyon - Professor Emeritus, Université Paris Saclay

INTRODUCTION

Since processors and digital electronic systems in general operate solely on binary digits called bits, number processing requires coding.

The different representations involve several aspects:

  • presentation of the format;

  • basic arithmetic operations (addition, subtraction, multiplication, division), with the possible overflow problems;

  • the instructions available for the various operations, which vary from one instruction set to another.

The n-bit integer formats represent unsigned or signed integers. While positive numbers always have the same representation, several representations of negative numbers have been defined: sign and magnitude, one's complement, and two's complement, the latter being the only one used for decades. For integer operations, the number of output bits differs from the number of input bits: an n-bit + n-bit addition produces an (n+1)-bit result, and an n-bit × n-bit multiplication produces a 2n-bit result. Handling the carry (addition) or the n most significant bits of the product (multiplication) poses problems for both scalar and SIMD instructions in the various instruction sets.

Single-precision (32-bit) and double-precision (64-bit) floating-point formats have been standardized since the mid-1980s (IEEE 754). They are presented here, along with the more recent reduced 16-bit and 8-bit formats used in deep neural networks. Block floating-point and decimal floating-point formats are also presented. The Posit format, proposed by J.L. Gustafson as an alternative to the IEEE 754 floating-point formats, is presented and discussed.

While integer and float formats have been used in general-purpose processors for decades, computational models such as neural networks and energy consumption issues have led to the emergence of reduced formats that can be added to certain general-purpose instruction sets, or implemented in specialized processors, IPs, FPGAs and so on.


KEYWORDS

integer format   |   floating-point format   |   fixed-point format   |   arithmetic operations   |   BCD   |   16- and 8-bit floats   |   Posit numbers


This article is included in

Software technologies and System architectures
