Floating-point as fractions

Like fixed-point, floating-point is a format which stores rational numbers as fractions.

Sep 11, 2023

This is part 6 of the onboarding floating point series. This series is intended to be used for onboarding of programmers new to the team to review a basic understanding of fixed-point and floating-point number formats, or for programmers who would like to remove some of the mystery from formats they may use everyday.

We introduced the s10e5 floating-point format in floating-point as compression. Now let’s look at the same format from a different perspective.

TERMS
-----------------------------------------------------------------------------
| sign  | x15                 | m                                           |
|-------|---------------------|---------------------------------------------|
| 1 bit | 5 bits              | 10 bits                                     |
-----------------------------------------------------------------------------
| sign        | 1 bit value exactly as stored (1=negative, 0=positive)      |
| x15         | 5 bit value exactly as stored as offset-15                  |
| mantissa    | 10 bit value exactly as stored as unsigned integer          |
| m           | mantissa                                                    |                                                
| x           | exponent as signed integer                                  |
| s           | sign as signed integer (-1=negative, 1=positive)            |
| magnitude   | unsigned value as f(x15,m)                                  |

Power-of-two Exponent

The exponent signifies the log2 or power-of-two of the value. Some fractional value (F) is then added to that.

\(value= 2^{x} + F\)

Power-of-Two Intervals

Since we’re working in powers of two, the distance between some exponent x and the next (x+1), is x. The fractional value (F) is some part (f) of that distance.

\(value = 2^{x} + f \cdot 2^{x}\)

Mantissa as Fraction of Interval

The mantissa, occupying 10 bits, is interpreted as with fixed-point.

\(f = \frac {m} {1024}\)

\(value = 2^{x} + \frac {m} {1024} \cdot 2^{x}\)

Special Case Zero

If the exponent in offset-15 is zero, result is a fraction of 1/2^-14

Apply Sign

Apply the sign to create the final result.

Therefore, we can define s10e5 floating point as the fraction:

\(s \cdot \left( 2^{x} + \frac{2^{x}m}{1024} \right)\)

Except where exp-15 is zero:

\(s \cdot \left( \frac{2^{-14}m}{1024} \right)\)

EXERCISE 6-1: Create a table of fractions for the exponents of a s23e8 floating point format. Assume sign is top bit, exponent is stored as offset-127, and mantissa is unsigned.

Log-Linear Fractions

As described above, in a floating point format, the range is divided into power-of-two (logarithmic) segments. However, each segment is linearly divided by the resolution of the mantissa.

EXERCISE 6-2: The s10e5 floating-point format described here is in sign-magnitude form. Create a version of the format that uses two's-complement instead of sign-magnitude and functions which convert from one to the other. Note any differences in values which can be represented.

REFERENCE: Check out this implementation of s23e8 floating-point [6502.org] for the 6502 by Roy Rankin and Steve Wozniak.

Next: Part 7

Floating-point common names - Some floating-point formats have standard names (float, double, half)

Get 10% off a group subscription

AltDevArts

Discussion about this post