AltDevArts

Two's-complement Fixed-point Basic Math

Fixed-point values are fractions. Basic math with fixed-point values is basic math with fractions.

Mike Acton
Sep 11, 2023

This is part 8 of the onboarding floating-point series. The series is intended for onboarding programmers new to the team, as a review of the basics of fixed-point and floating-point number formats, and for programmers who would like to remove some of the mystery from formats they may use every day.

For this example we’ll use a signed 16-bit integer with a 10-bit fractional part. As introduced in fixed-point as fractions, the value stored is the numerator of the fraction:

$$\frac{n}{1024}$$

Some example values:
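
A few stored numerators and the values they represent can be worked out with a minimal sketch like the one below. The helper names `fx_to_double` and `fx_from_double` are hypothetical and are not part of the original series; they only illustrate that the stored integer is the numerator over 1024.

```c
#include <stdint.h>

/* Hypothetical helper: interpret a stored numerator n as the value n / 1024. */
static double fx_to_double(int16_t n)
{
  return (double)n / 1024.0;
}

/* Hypothetical helper: store x as a numerator over 1024.
   The cast truncates toward zero; x is assumed to lie in [-32.0, 32.0). */
static int16_t fx_from_double(double x)
{
  return (int16_t)(x * 1024.0);
}
```

With these helpers, a stored 1024 is 1.0, 512 is 0.5, -1536 is -1.5, and the extremes 32767 and -32768 are 31.9990234375 and -32.0.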

Addition and Subtraction

Addition of fixed-point values stored as two’s-complement is no different from any other two’s-complement integer addition.

$$\frac{c}{1024} = \frac{a}{1024} + \frac{b}{1024}$$
$$\frac{c}{1024} = \frac{a+b}{1024}$$
$$c = a + b$$
int16_t 
add( int16_t a, int16_t b)
{
  return a+b;
}
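
As a worked check of the idea that adding the values is just adding the numerators, here is a small sketch. The constant names are illustrative assumptions; the explicit cast only makes the narrowing from `int` visible.

```c
#include <stdint.h>

/* Add two numerators over 1024. The sum is assumed to fit in 16 bits. */
int16_t add(int16_t a, int16_t b)
{
  return (int16_t)(a + b);
}

/* Illustrative operands, stored as numerators over 1024. */
static const int16_t ONE_AND_A_HALF    = 1536;  /* 1.5  * 1024 */
static const int16_t TWO_AND_A_QUARTER = 2304;  /* 2.25 * 1024 */
```

Here `add(ONE_AND_A_HALF, TWO_AND_A_QUARTER)` returns 3840, and 3840/1024 is 3.75, which matches 1.5 + 2.25.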
EXERCISE 8-1: Create a version of add with saturate. 

  * Overflow: If the sum would be greater than the largest representable value, the largest representable value is returned.
  * Underflow: If the sum would be smaller than the smallest representable value, the smallest representable value is returned.

Subtraction can be implemented as integer subtract:

$$\frac{c}{1024} = \frac{a}{1024} - \frac{b}{1024}$$
$$\frac{c}{1024} = \frac{a-b}{1024}$$
$$c = a - b$$
int16_t 
sub( int16_t a, int16_t b)
{
  return a-b;
}

Alternatively, subtraction can be implemented in terms of addition.

int16_t sub( int16_t a, int16_t b)
{
  return add(a,-b);
}

Multiplication

$$\frac{c}{1024} = \frac{a}{1024} \cdot \frac{b}{1024}$$
$$\frac{c}{1024} = \frac{a \cdot b}{1024 \cdot 1024}$$
$$c = \frac{a \cdot b}{1024}$$

The sub-expression (a*b) requires a 32-bit result, since the product of two 16-bit values can need up to 32 bits. Dividing by 1024 removes 10 of those bits, but that still leaves up to 22 bits, so the result must still be returned as a 32-bit value.

int32_t 
mul( int16_t a, int16_t b)
{
  return (a*b)/1024;
}
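
A worked check of the multiply, as a sketch. The explicit widening to `int32_t` is an added assumption so that the product is 32 bits wide even where `int` is narrower.

```c
#include <stdint.h>

int32_t mul(int16_t a, int16_t b)
{
  int32_t wide = (int32_t)a * (int32_t)b;  /* numerator over 1024*1024 */
  return wide / 1024;                      /* back to a numerator over 1024 */
}
```

For 1.5 × 2.5, `mul(1536, 2560)` returns 3840, which is 3.75. For the largest operands, `mul(32767, 32767)` returns 1048512 (about 1023.94), which shows why the result cannot be returned as a 16-bit value.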

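If a right shift is used instead of a divide, negative results are rounded toward negative infinity (away from zero) rather than truncated. This assumes signed right shift is an arithmetic shift, which is implementation-defined in C but true of most compilers. To truncate instead (round toward zero):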

int32_t 
mul( int16_t a, int16_t b )
{
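  int32_t c_pos   = a*b;          // product, used as-is when non-negative
  int32_t c_neg   = c_pos+1023;   // biased so the shift rounds negatives toward zero
  int32_t c_sign  = c_pos >> 31;  // 0 when non-negative, all ones when negative
  int32_t c       = (c_sign & c_neg) | ((~c_sign) & c_pos);  // select by sign
  return c >> 10;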
}
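
The difference between dividing, shifting, and shifting with a bias shows up only for negative products that are not exact multiples of 1024. The sketch below puts the three side by side; the function names are hypothetical, and the shift versions assume an arithmetic right shift for negative values.

```c
#include <stdint.h>

/* Divide: C integer division truncates toward zero. */
int32_t mul_divide(int16_t a, int16_t b)
{
  return ((int32_t)a * b) / 1024;
}

/* Plain shift: rounds toward negative infinity (assumes arithmetic shift). */
int32_t mul_shift(int16_t a, int16_t b)
{
  return ((int32_t)a * b) >> 10;
}

/* Biased shift: adds 1023 to negative products so the shift truncates. */
int32_t mul_shift_trunc(int16_t a, int16_t b)
{
  int32_t c_pos  = (int32_t)a * b;
  int32_t c_neg  = c_pos + 1023;
  int32_t c_sign = c_pos >> 31;   /* 0 or all ones */
  int32_t c      = (c_sign & c_neg) | ((~c_sign) & c_pos);
  return c >> 10;
}
```

With a = -3 and b = 512 the product is -1536, or -1.5 in units of 1/1024: `mul_divide` and `mul_shift_trunc` both return -1, while `mul_shift` returns -2.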

Another common approach is to round half away from zero:

int32_t 
mul( int16_t a, int16_t b )
{
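  int32_t c_mul   = a*b;
  int32_t c_pos   = c_mul + 512;   // non-negative: add 1/2, then shift (floor)
  int32_t c_neg   = c_mul + 511;   // negative: subtract 1/2, then round up (-512 + 1023)
  int32_t c_sign  = c_mul >> 31;   // 0 when non-negative, all ones when negative
  int32_t c       = (c_sign & c_neg) | ((~c_sign) & c_pos);
  return c >> 10;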
}
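
When working on a branch-free rounding routine it helps to have a plain reference to compare against. The following is a sketch of round half away from zero written with a branch; the name `mul_round_ref` is hypothetical. It relies only on C integer division truncating toward zero.

```c
#include <stdint.h>

/* Hypothetical reference version: round half away from zero with a branch. */
int32_t mul_round_ref(int16_t a, int16_t b)
{
  int32_t c = (int32_t)a * (int32_t)b;
  if (c >= 0)
    return (c + 512) / 1024;   /* add 1/2, then truncate */
  return (c - 512) / 1024;     /* subtract 1/2, then truncate toward zero */
}
```

In units of 1/1024, `mul_round_ref(3, 512)` returns 2 and `mul_round_ref(-3, 512)` returns -2. Values just under a half, such as `mul_round_ref(-1, 511)`, round to 0.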
EXERCISE 8-2: Implement mul() with round half to even, which is just like round half away from zero except that on exactly 1/2 boundaries, the nearest even value is chosen. e.g.
 3 * 1/2 =  1 1/2 =  2
 5 * 1/2 =  2 1/2 =  2
 7 * 1/2 =  3 1/2 =  4
 9 * 1/2 =  4 1/2 =  4
-3 * 1/2 = -1 1/2 = -2
-5 * 1/2 = -2 1/2 = -2
-7 * 1/2 = -3 1/2 = -4
-9 * 1/2 = -4 1/2 = -4

Absolute value

The absolute value of fixed-point values stored as signed integers is the same as with any other signed integer:

int16_t
int16_abs( int16_t a )
{
    return (a<0)?(-a):a;
}

Or alternatively, because it’s in two’s complement:

int16_t
int16_abs( int16_t a )
{
  int16_t a0 = a >> 15;
  int16_t a1 = a ^ a0;
  int16_t a2 = a1 - a0;
  return a2;
}
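
To see why the two's-complement version works, trace a = -3: the shift gives a0 = -1 (all ones), the XOR gives a1 = 2 (the bitwise complement of -3), and subtracting a0 adds 1, giving 3. The sketch below checks the two versions against each other. The function names are hypothetical, and the check skips -32768, whose magnitude does not fit in 16 bits.

```c
#include <stdint.h>

int16_t int16_abs_branch(int16_t a)
{
  return (int16_t)((a < 0) ? -a : a);
}

int16_t int16_abs_mask(int16_t a)
{
  int16_t a0 = (int16_t)(a >> 15);  /* 0 if a >= 0, -1 if a < 0 (arithmetic shift assumed) */
  int16_t a1 = (int16_t)(a ^ a0);   /* a if a >= 0, ~a if a < 0 */
  int16_t a2 = (int16_t)(a1 - a0);  /* ~a + 1 == -a for negative a */
  return a2;
}

/* Returns 1 if both versions agree for every value from -32767 to 32767. */
int abs_versions_agree(void)
{
  for (int32_t v = -32767; v <= 32767; ++v)
    if (int16_abs_branch((int16_t)v) != int16_abs_mask((int16_t)v))
      return 0;
  return 1;
}
```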

Division

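$$\frac{c}{1024} = \frac{a/1024}{b/1024}$$
$$\frac{c}{1024} = \frac{a}{1024} \cdot \frac{1024}{b}$$
$$\frac{c}{1024} = \frac{a}{b}$$
$$c = \frac{1024 \cdot a}{b}$$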

The sub-expression (a*1024) requires a 32-bit result, and if the magnitude of b is less than 1024 (that is, the divisor is smaller than 1.0), the final result may still not fit in 16 bits.

int32_t 
div( int16_t a, int16_t b)
{
  return ((int32_t)a*1024)/b;  // multiply, since left-shifting a negative value is undefined in C; b must be non-zero
}
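
A worked check of the divide, as a sketch. The function is named `fx_div` here, a hypothetical name chosen to avoid clashing with the C standard library's `div`.

```c
#include <stdint.h>

/* Truncating fixed-point divide on numerators over 1024.
   Assumes b is non-zero. */
int32_t fx_div(int16_t a, int16_t b)
{
  int32_t scaled = (int32_t)a * 1024;  /* 1024 * a needs up to 26 bits */
  return scaled / b;                   /* C division truncates toward zero */
}
```

For 3.75 / 1.5, `fx_div(3840, 1536)` returns 2560, which is 2.5. Dividing 31.0 by the smallest positive value, `fx_div(31744, 1)`, returns 32505856, far outside the 16-bit range.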

When using round half away from zero, since negative and positive values are mirrored, unsigned division can be used:

int32_t
div( int16_t a, int16_t b )
{
  int32_t  sign = (a ^ b) >> 15;              // sign
  uint32_t ua   = int16_abs(a);               // magnitude(a)
  uint32_t ub   = int16_abs(b);               // magnitude(b)
  uint32_t uc   = ((ua << 10)+(ub>>1)) / ub;  // div round half from zero
  int32_t  c    = (uc + sign)^sign;           // to two's complement
  return c;
}
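
The same idea can be sketched with branches instead of sign masks, which makes the steps easier to follow: take magnitudes, do a rounded unsigned divide, then restore the sign. The name `fx_div_round` is hypothetical. This variant takes the magnitudes in 32 bits, an added choice so that -32768 is handled, and it assumes b is non-zero.

```c
#include <stdint.h>

int32_t fx_div_round(int16_t a, int16_t b)
{
  int      negative = (a < 0) != (b < 0);                     /* sign of the quotient */
  uint32_t ua = (uint32_t)(a < 0 ? -(int32_t)a : (int32_t)a); /* magnitude(a) */
  uint32_t ub = (uint32_t)(b < 0 ? -(int32_t)b : (int32_t)b); /* magnitude(b) */
  uint32_t uc = ((ua << 10) + (ub >> 1)) / ub;                /* add half the divisor, then divide */
  return negative ? -(int32_t)uc : (int32_t)uc;
}
```

For 2.0 / 3.0, `fx_div_round(2048, 3072)` returns 683 where a truncating divide gives 682, and `fx_div_round(-2048, 3072)` returns -683. An exact half, such as `fx_div_round(1, 2048)`, rounds away from zero to 1.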
EXERCISE 8-3: Compare and contrast the approach to calculating round half from zero in mul() and div().
REFERENCE: See libfixmath for an implementation of Q16.16 format fixed-point operations in C.

Next: Part 9

Floating-point Addition and Subtraction - Floating-point values are fractions. Addition and subtracting floating-point numbers is adding and subtracting fractions.

© 2025 Matt Yaney