We have already learned to store both positive and negative integers. Now let’s look at fractional numbers.
Fractional numbers, or "decimal numbers" as we commonly call them, are numbers that are not whole (for example, 0.5, 2.31, and 7.353 are all fractional numbers).
Representing fractional numbers is more complicated than representing positive or negative integers, so several forms of representation exist, the most common being floating point.
Fixed Point Representation
One of the simplest ways to represent decimal numbers in binary is the fixed point method. In this approach, a fixed number of bits is assigned to the integer part and another fixed number to the fractional part of the number.
For example, in an 8-bit system with 4 bits for the integer part and 4 for the decimal part, the number 5.75 would be represented as 0101.1100.
This technique is straightforward and easy to implement, but it has many limitations. The precision is limited by the number of bits dedicated to the fractional part.
Additionally, the format is rigid: it cannot handle numbers that exceed the range defined by the assigned bits.
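To make the idea concrete, here is a minimal Python sketch of this 4.4 fixed point format (the helper names `to_fixed` and `from_fixed` are just for illustration, not any standard API):

```python
def to_fixed(value, frac_bits=4):
    """Encode a number by scaling it by 2**frac_bits and rounding."""
    return round(value * (1 << frac_bits))

def from_fixed(raw, frac_bits=4):
    """Decode by undoing the scaling."""
    return raw / (1 << frac_bits)

raw = to_fixed(5.75)              # 5.75 * 16 = 92
print(f"{raw:08b}")               # 01011100  ->  0101.1100
print(from_fixed(raw))            # 5.75
print(from_fixed(to_fixed(0.1)))  # 0.125 -- the closest value expressible with 4 fractional bits
```

Notice how 0.1 already loses precision: with only 4 fractional bits, the nearest representable value is 0.125.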
Floating Point Representation
The floating point method is the de facto standard for representing decimal numbers in most modern computers. This method allows greater flexibility and precision when handling numbers of different magnitudes.
The IEEE 754 standard is the most widely used for the representation of floating point numbers.
In floating point representation, a number is divided into three parts: the sign, the exponent, and the mantissa.
- The sign indicates whether the number is positive or negative.
- The exponent determines where the point is placed, that is, by which power of two the number is scaled.
- The mantissa contains the significant digits of the number.
For example, the number 5.75 in 32-bit floating point format would be:
0 10000001 01110000000000000000000
- Sign: 0 (positive)
- Exponent: 10000001 (129 in decimal; subtracting the bias of 127 gives an actual exponent of 2)
- Mantissa: 01110000000000000000000 (the bits after the implicit leading 1, since 5.75 = 1.0111₂ × 2²)
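If you want to verify this breakdown yourself, a small sketch using Python's standard `struct` module can extract the three fields from the 32-bit pattern:

```python
import struct

# Pack 5.75 as a 32-bit IEEE 754 float and split the resulting bits into fields
bits = struct.unpack(">I", struct.pack(">f", 5.75))[0]

sign     = (bits >> 31) & 0x1
exponent = (bits >> 23) & 0xFF
mantissa = bits & 0x7FFFFF

print(f"{bits:032b}")              # 01000000101110000000000000000000
print(sign)                        # 0
print(f"{exponent:08b}", exponent) # 10000001 129
print(f"{mantissa:023b}")          # 01110000000000000000000
```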
This method allows for handling numbers of vastly different magnitudes by adjusting the exponent. However, it also has its limitations, especially in terms of precision for very small or very large numbers.
The representation is more complex than fixed point and carries a higher computational cost but, in return, it lets us cover a huge range of numbers.
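As a rough illustration of that range, these are the extremes of a 32-bit float (assuming IEEE 754 single precision, normal numbers only):

```python
import struct

# Largest finite and smallest positive normal values of a 32-bit IEEE 754 float
largest  = struct.unpack(">f", struct.pack(">I", 0x7F7FFFFF))[0]
smallest = struct.unpack(">f", struct.pack(">I", 0x00800000))[0]

print(largest)    # about 3.4e38
print(smallest)   # about 1.18e-38
```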
Precision Issues
Despite its versatility, representing decimal numbers in floating point can lead to precision issues. This is because we are often not encoding the exact number, but rather a very close one.
For example, the decimal number 0.1 cannot be represented exactly: its binary expansion is periodic (0.000110011001100110011…₂, repeating forever).
In 32-bit floating point, that number is:
- Sign: 0
- Exponent: 01111011 (123 in decimal, i.e. an actual exponent of −4)
- Mantissa: 10011001100110011001101 (5033165 in decimal)
That is, the number you are actually representing is not 0.1, but 0.100000001490116119384765625.
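You can reproduce these figures with a short Python sketch using the standard `struct` and `decimal` modules (again for the 32-bit format):

```python
import struct
from decimal import Decimal

# Round 0.1 to a 32-bit IEEE 754 float and inspect its fields
bits = struct.unpack(">I", struct.pack(">f", 0.1))[0]

print((bits >> 31) & 0x1)   # sign:     0
print((bits >> 23) & 0xFF)  # exponent: 123
print(bits & 0x7FFFFF)      # mantissa: 5033165

# The value actually stored, written out exactly
stored = struct.unpack(">f", struct.pack(">f", 0.1))[0]
print(Decimal(stored))      # 0.100000001490116119384765625
```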
It is a small difference, but it generates many problems and apparent contradictions when programming.
Real examples of “weird things” that can happen to you:
Adding 0.1 ten times and then subtracting 1.0 does not give exactly zero:
0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 - 1.0 = −2.77556×10^−17
(the exact tiny value depends on the precision and the order of evaluation, but it is not zero).
Subtracting these two numbers does not yield exactly 0.00001, which would be the "normal" result:
1.00001 - 1.0 = 0.00000999999
The result can even change depending on the order in which you perform the operations:
(0.1 + 0.2) + 0.3 = 0.6000000000000001
0.1 + (0.2 + 0.3) = 0.6
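These surprises are easy to reproduce. Here is a quick Python sketch (run with 64-bit doubles, so the exact digits may differ slightly from the figures above):

```python
import math

total = 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 - 1.0
print(total)                  # a tiny non-zero value instead of 0.0

print(1.00001 - 1.0)          # close to, but not exactly, 0.00001

print((0.1 + 0.2) + 0.3)      # 0.6000000000000001
print(0.1 + (0.2 + 0.3))      # 0.6

# The usual workaround: compare with a tolerance instead of ==
print(math.isclose((0.1 + 0.2) + 0.3, 0.6))   # True
```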
That is, floating point numbers should be treated with caution. It’s not that they are bad, it’s that you need to understand how they work to know how to work with them.
Other Representations
There are other much less common techniques, but equally interesting for representing decimal numbers in binary. Some of these include:
- Midpoint Notation: a number is represented as the sum of two fixed point numbers. This can be useful in situations where high precision is required and range is not a primary concern.
- Variable Point: similar to fixed point, but with a variable number of bits for the fractional part. This can allow greater precision for certain numbers, at the cost of flexibility in range.
- Normalized Floating Point: a variant of floating point that ensures the most significant bit of the mantissa is always 1, which makes better use of the available bits and improves precision.
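In fact, IEEE 754 already relies on this normalization: the leading 1 is left implicit in the stored mantissa and added back when decoding. As a sketch (normal numbers only, same 32-bit format as above), 5.75 can be rebuilt from its three fields:

```python
import struct

# Split 5.75 into its IEEE 754 fields and rebuild the value,
# adding back the implicit leading 1 of the normalized mantissa
bits = struct.unpack(">I", struct.pack(">f", 5.75))[0]
sign     = (bits >> 31) & 0x1
exponent = (bits >> 23) & 0xFF
mantissa = bits & 0x7FFFFF

value = (-1) ** sign * (1 + mantissa / 2**23) * 2 ** (exponent - 127)
print(value)   # 5.75
```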