Onboarding floating-point
Floating-point is often misunderstood, creating bugs and issues that are difficult to diagnose without some understanding of the underlying format.
This series is a resource for programmers who are new to the team and want to strengthen their foundational knowledge of floating-point number formats. It may also be useful to experienced programmers who want to demystify formats they encounter regularly in their work.
In games, a solid grasp of these common data formats is essential for making informed decisions about complex structures, performance optimization, and data compression. It is also vital for understanding past decisions and assessing whether they are still relevant.
Fixed-point and floating-point formats share fundamental principles, so the series also covers fixed-point as groundwork for understanding floating-point.
This series is not a comprehensive reference; many details and interesting cases are left to further reading. It should, however, offer enough of an introduction to help you understand trade-offs, troubleshoot issues, and tackle more intricate transformations.
Several exercises are included. Nothing helps understanding as well as hands-on practice, so it’s recommended you work the exercises if you want to get the most out of this series.
Table of Contents
Common Integer formats - The details of fixed-point and floating-point formats require an understanding of sign-magnitude, two’s complement, and offset-binary signed number representations.
Fixed-point as fractions - Fixed-point formats describe fractions where both numerator and denominator are integers.
Estimating fixed-point range and resolution - An example of selecting a fixed-point format based on the context.
Fixed-point as compression - Any given fixed-point format can be considered a method of compressing another, higher resolution fixed-point format.
Floating-point as compression - A floating-point format can be considered a (generally lossy) approach to compressing a high-resolution fixed-point format.
Floating-point as fractions - Like fixed-point, floating-point is a format which stores rational numbers as fractions.
Floating-point common names - Some floating-point formats have standard names (float, double, half).
Two's-complement fixed-point Basic Math - Fixed-point values are fractions. Basic math with fixed-point values is basic math with fractions.
Floating-point Addition and Subtraction - Floating-point values are fractions. Adding and subtracting floating-point numbers is adding and subtracting fractions.
Floating-point Multiplication - Floating-point values are fractions. Multiplying floating-point numbers is multiplying fractions.
Floating-point Division - Floating-point values are fractions. Dividing floating-point numbers is dividing fractions.
Floating-point further reading - Recommended reading.
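A recurring idea in the entries above is that both fixed-point and floating-point values are fractions with an integer numerator and an integer denominator. As a rough preview, the sketch below shows that idea in Python. The helper names fixed_point_as_fraction and float_as_fraction and the 8.8 fixed-point layout are illustrative assumptions, not taken from the series, and the sketch assumes Python's float is an IEEE 754 double, as it is on common platforms.

```python
from fractions import Fraction


def fixed_point_as_fraction(raw: int, fractional_bits: int) -> Fraction:
    """Interpret a raw fixed-point integer as raw / 2**fractional_bits."""
    return Fraction(raw, 2 ** fractional_bits)


def float_as_fraction(value: float) -> Fraction:
    """Return the exact fraction that a Python float actually stores."""
    numerator, denominator = value.as_integer_ratio()
    return Fraction(numerator, denominator)


# Hypothetical 8.8 fixed-point encoding of 3.14159 (8 fractional bits).
raw = round(3.14159 * 2 ** 8)
print(raw, fixed_point_as_fraction(raw, 8))  # 804 201/64

# The float written as 0.1 is stored as a fraction whose denominator
# is a power of two, so it is close to 1/10 but not equal to it.
tenth = float_as_fraction(0.1)
print(tenth.denominator == 2 ** 55)  # True
print(tenth == Fraction(1, 10))      # False
```

Fraction is used here only because it does exact rational arithmetic, which makes the stored numerator and denominator visible without adding any rounding of its own.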