Floating point decimal representation

Floating point decimal representation (for numbers):

-314.159 = -314.159 × 10⁰ ≡ (-314.159, 0)
         = -31.4159 × 10¹ ≡ (-31.4159, 1)
         = -3.14159 × 10² ≡ (-3.14159, 2)
                             ^^^^^^^^  ^
                             Mantissa  Exponent

A floating point decimal representation consists of 2 decimal numbers:

A fixed point decimal number representing the mantissa
An integer (whole) decimal number representing the exponent
The base for the exponent is 10₍₁₀₎

The canonical form of the floating point decimal representation

The canonical form:

The floating point representation (mantissa, exponent) where the absolute value of the mantissa is in the range [1₍₁₀₎, 10₍₁₀₎)

Example:

-314.159 = -314.159 × 10⁰ ≡ (-314.159, 0)
         = -31.4159 × 10¹ ≡ (-31.4159, 1)
         = -3.14159 × 10² ≡ (-3.14159, 2)

Canonical form: (-3.14159, 2)
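The normalization above can be sketched in a few lines of Python. This is only an illustration, not part of the course material: the function name canonical_decimal is hypothetical, and the standard decimal module is used so that shifting the decimal point does not introduce rounding noise.

```python
from decimal import Decimal

def canonical_decimal(text):
    """Return (mantissa, exponent) with 1 <= |mantissa| < 10.

    `text` is a decimal number written as a string, e.g. "-314.159".
    Zero has no canonical form, so it is rejected.
    """
    value = Decimal(text)
    if value == 0:
        raise ValueError("zero has no canonical form")
    exponent = value.adjusted()          # power of 10 of the leading digit
    mantissa = value.scaleb(-exponent)   # move the decimal point next to it
    return mantissa, exponent

print(canonical_decimal("-314.159"))  # (Decimal('-3.14159'), 2)
```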

Floating point binary representation

Floating point binary representation (for numbers):

-101.01011 = -101.01011 × 10⁰  ≡ (-101.01011, 0)
           = -10.101011 × 10¹  ≡ (-10.101011, 1)
           = -1.0101011 × 10¹⁰ ≡ (-1.0101011, 10)
                                  ^^^^^^^^^^  ^^
                                  Mantissa    Exponent

A floating point binary representation consists of 2 binary numbers:

A fixed point binary number representing the mantissa
An integer (whole) binary number representing the exponent
The base for the exponent is 10₍₂₎ (= 2₍₁₀₎ !!!)

The canonical form of the floating point binary representation

The canonical form:

The floating point representation (mantissa, exponent) where the absolute value of the mantissa is in the range [1₍₂₎, 10₍₂₎)

Example:

-101.01011 = -101.01011 × 10⁰  ≡ (-101.01011, 0)
           = -10.101011 × 10¹  ≡ (-10.101011, 1)
           = -1.0101011 × 10¹⁰ ≡ (-1.0101011, 10)

Canonical form: (-1.0101011, 10)
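As a rough sketch of the same normalization in Python, the hypothetical helper below works directly on the binary digit string: it finds the first 1 bit and counts how far the binary point has to move. The exponent is returned as an ordinary integer; bin() shows it in binary.

```python
def canonical_binary(text):
    """Normalize a binary fixed point string such as "-101.01011".

    Returns (mantissa, exponent): the mantissa is a string that starts
    with "1." and the exponent is a Python int (a power of two).
    """
    sign = "-" if text.startswith("-") else ""
    digits = text.lstrip("+-")
    whole, _, frac = digits.partition(".")
    bits = whole + frac
    first_one = bits.find("1")
    if first_one == -1:
        raise ValueError("zero has no canonical form")
    # Number of places the binary point moves left (negative = right).
    exponent = len(whole) - 1 - first_one
    rest = bits[first_one + 1:].rstrip("0") or "0"
    return f"{sign}1.{rest}", exponent

mantissa, exponent = canonical_binary("-101.01011")
print(mantissa, bin(exponent))  # -1.0101011 0b10
```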

A note on the canonical form of the floating point binary representation

Notice that:

The mantissa of the canonical form of a floating point binary representation always starts with "1.":

Examples:

 1.000101
-1.101111

(The only exception is the value 0.0, which will be represented by a special code)

The IEEE 754 standard for floating point binary representation

What is the IEEE 754 standard

The IEEE 754 standard is an international standard that specifies how the mantissa and the exponent of a floating point binary representation must be stored
Today's computers store floating point binary representations according to this standard
Further reading: the Wikipedia page on IEEE 754

The IEEE 754 format of the single-precision floating point binary representation

Codes used for the mantissa and exponent:

Mantissa: sign-magnitude binary code
Exponent: excess 127 binary code

Storage format (uses 32 bits or 4 bytes):

     SEEEEEEEEMMMMMMMMMMMMMMMMMMMMMMM
Bit: 01      89                    31

S = sign of the mantissa (0 = pos, 1 = neg)
M = mantissa without the leading 1 (23 bits)
E = exponent (8 bits)

Note: the leading 1. in the mantissa is assumed (and omitted) !!!
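To see the three fields on a real value, one possible approach in Python is to let the standard struct module produce the 32-bit pattern and then slice it. The helper name float_fields is made up for this sketch; struct.pack with the ">f" format stores a Python float in single precision, big-endian.

```python
import struct

def float_fields(x):
    """Split the single precision pattern of x into (S, E, M) bit strings."""
    # Reinterpret the 4 bytes of the float as one unsigned 32-bit integer.
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    bits = f"{word:032b}"
    return bits[0], bits[1:9], bits[9:]   # 1 + 8 + 23 bits

print(float_fields(1.0))  # ('0', '01111111', '00000000000000000000000')
```

For 1.0 the mantissa field is all zeros because the leading "1." is not stored, and the exponent field is 01111111 (= 127, the excess 127 code for exponent 0).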

Example of an IEEE 754 representation (and how to decode it)

Suppose you are given the following IEEE 754 (single precision) representation:

01000000101000000000000000000000

We can find the decimal representation for this IEEE 754 code as follows:

01000000101000000000000000000000 = 0 10000001 01000000000000000000000
                                   ^ ^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^
                                   | exponent mantissa (with leading "1." omitted)
                                   sign (= positive mantissa)

Mantissa = +1.01000000000000000000000
Exponent = 10000001 (= 129 = 127 + 2) = +2₍₁₀₎

=> Floating point binary representation = +1.01 × 10¹⁰ (binary) = +101.0 (binary)
=> Decimal representation = 5.0₍₁₀₎
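The decoding steps can be written out as a short Python sketch. The function decode_single is a hypothetical helper that only handles ordinary (normalized) values; the special codes for 0.0, infinity and NaN are deliberately left out.

```python
def decode_single(bits):
    """Decode a 32-character IEEE 754 single precision bit string.

    Sketch for normalized values only (no zero, infinity or NaN).
    """
    sign = -1 if bits[0] == "1" else 1
    exponent = int(bits[1:9], 2) - 127        # undo the excess 127 code
    fraction = int(bits[9:], 2) / 2**23       # the 23 stored mantissa bits
    mantissa = 1 + fraction                   # put back the omitted "1."
    return sign * mantissa * 2.0**exponent

print(decode_single("01000000101000000000000000000000"))  # 5.0
```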

Demo IEEE 754 (single precision) representation

Demo:

/home/cs255001/demo/asm/1-directives/float.s
(Use EGTAPI)

Notice: you write a float number in decimal notation in the source program file.
The assembler translates it into the IEEE 754 representation and stores the binary representation in memory

How does a compiler encode a decimal representation into IEEE 754?

Suppose you are given the decimal number −5.25

How to find the IEEE 754 representation:

1. Encode -5.25 into fixed point binary representation:
      5    --> 101
      0.25 --> 0.01
      -5.25₍₁₀₎ = -101.01₍₂₎
2. Find the canonical form: -1.0101 with exponent 10₍₂₎
3. Code the exponent into excess 127: 01111111 + 10 = 10000001
4. Put the different parts in their places:
      SEEEEEEEEMMMMMMMMMMMMMMMMMMMMMMM
      11000000101010000000000000000000
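The four steps can be followed in Python as a rough sketch. The name encode_single is hypothetical, and the code makes simplifying assumptions: it only handles nonzero values in the normal range, and it truncates the mantissa to 23 bits instead of rounding as a real assembler or compiler would. The result is compared against the standard struct module.

```python
import struct

def encode_single(x):
    """Encode a nonzero float as a 32-character single precision bit string.

    Sketch only: truncates extra mantissa bits, no special values.
    """
    if x == 0:
        raise ValueError("0.0 uses a special code, not handled here")
    sign = "1" if x < 0 else "0"
    magnitude = abs(x)
    exponent = 0
    # Step 2: canonical form, shift until 1 <= magnitude < 2.
    while magnitude >= 2:
        magnitude /= 2
        exponent += 1
    while magnitude < 1:
        magnitude *= 2
        exponent -= 1
    # Mantissa bits after the (omitted) leading "1."
    fraction = magnitude - 1
    mantissa_bits = ""
    for _ in range(23):
        fraction *= 2
        bit = int(fraction)
        mantissa_bits += str(bit)
        fraction -= bit
    # Step 3: excess 127 code for the exponent; step 4: assemble S, E, M.
    return sign + f"{exponent + 127:08b}" + mantissa_bits

bits = encode_single(-5.25)
print(bits)  # 11000000101010000000000000000000

# Cross-check against the pattern produced by struct.
(word,) = struct.unpack(">I", struct.pack(">f", -5.25))
print(bits == f"{word:032b}")  # True
```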