Floating Point Problems
Posted by Bruno Christensen on June 30th, 2004


Hi

I have problems with floating point numbers. After the following
statements

double x = 0.75 * 0.1;
double z = 7.5 * 0.01;

x and z have the following values:

0.075000000000000011
0.074999999999999997

My compiler: Microsoft Visual C++ .Net. If I use the old Visual Studio
6.0 everything is ok and x and z have correct values:

0.075000000000000
0.075000000000000

Can anybody help?

Regards
Bruno

Posted by Peter Ammon on June 30th, 2004


Bruno Christensen wrote:

How do you know those are the values they have, rather than the closest
decimal representations of those values?

Well, let's work out the first one. This is my first time doing
floating point math by hand, so bear with me (and correct me if
necessary). First, let's see how the compiler represents .75:

.75 = 1.5 * 2^-1

1.5 is 2^0 + 2^-1, *and all the other terms are zero,* so .75 can be
exactly represented by a double. Enjoy it while you can, since next we
have:

.1 = 1.6 * 2^-4

1.6 in binary is an infinite series: 2^0 + 2^-1 + 2^-4 + 2^-5 + 2^-8 +
2^-9 + 2^-12 + 2^-13 ... + 2^-52 + 2^-53 + ..., except, oops, we only have 52
bits to store the fraction in, so 2^-53 and all following terms
have got to go. Together, though, those discarded bits are worth more than
half of the last kept bit (2^-52), so we round up to the nearest
representable number, which happens to be

2^0 + 2^-1 + 2^-4 + 2^-5 + 2^-8 + 2^-9 ... + 2^-51, all times 2^-4.

Dying to add all those up? I thought so, but I spared you the trouble:
it's 0.1000000000000000055511151231257827021181583404541015625. So that
means that .1 is really represented as that big honking number, which is
slightly larger than .1. Give your computer a break, it's doing the
best it can.

Have you guessed the punch line yet? Multiply that big number by .75
and round to the nearest double. You'll get
.075000000000000011102230246251565404236316680908203125. Notice that it
starts with .075000000000000011, which is the number that Visual C++
.NET is outputting. So that's where it comes from.

I don't know enough about floating point math to say whether the .NET
compiler's behavior (doing floating point math among constants as your
program would) or the older compiler's behavior (doing floating point
math among constants exactly) is more desirable. Hopefully someone
else will comment?

For the skinny on floating point representations, check out
<http://research.microsoft.com/~hollasch/cgindex/coding/ieeefloat.html>

-Peter

--
Pull out a splinter to reply.

Posted by osmium on June 30th, 2004


Bruno Christensen writes:

The answer you don't like is almost certainly the more accurate one,
considering the vagaries of floating point numbers. For several years
compilers have commonly done something I would call "integerizing": if
a value to be output is close to an integer, they contemplate their navel
and decide whether or not to produce an integer in the output.

I would browse around in the IDE, manuals, and help screens for an
integerize mode someplace in the .NET product. Note that this process is
*not* rounding; it is producing a wrong result on purpose to satisfy a
normal human desire. There may be a formal word for this process,
integerize is just something I invented for this discussion.





Posted by Randy on June 30th, 2004



Bruno Christensen wrote:
Neither the GNU compilers (2.96 and 3.3.2) nor the Intel compiler (8.0) has
this problem on my Linux box, on either Pentium or Itanium2.

Four suggestions:

1) Define the constants to be long doubles:

double x = 0.75L * 0.1L;
double z = 7.5L * 0.01L;

If you explicitly define each constant to be at least a double, then there's no
possibility that the compiler could do something stupid like this:

x = (double) ((float)0.75 * (float)0.1);

Of course, ANSI C says that unsuffixed floating point constants are doubles, but
the size of long double varies: some compilers use only 64 bits for it, while
others support all 80 bits of the extended format in the IEEE compliant FPU.

Maybe M$ C has a 'strict ANSI' compiler flag that enforces this?

2) There may also be compiler switches that preserve double precision accuracy
in the face of high levels of optimization. You didn't say whether you were
using any optimization.

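3) I assume you also printed with a double precision format and enough digits,
as in the line below. (Strictly, printf's %f and %lf both take a double, since
float arguments are promoted; the difference matters only for scanf.)

printf( "x = %20.15lf, z = %20.15lf\n", x, z);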

4) Buy a better compiler.

Randy

--
Randy Crawford http://www.ruf.rice.edu/~rand rand AT rice DOT edu

Posted by Thomas G. Marshall on July 1st, 2004


Bruno Christensen <bc@neplan.ch> coughed up the following:

I remember once in the early /early/ days of Java I came across the
following odd behavior:

1.040 > 1.04

I had to code circles around it using integerization (not the same as what
osmium posted). The only solution to that particular mess that I bumped
into was to scale everything by 1000000 or so, work in integers, and then use
integer / and % to divide the results back down by 1000000. Pain in the ass.



Posted by Bruno Christensen on July 2nd, 2004


Thanks a lot for your comments. My hope was that some compiler option
could solve the problem. But the problem seems to be a basic
floating point phenomenon.

Thanks
Bruno

Posted by K. Henriksen on July 5th, 2004


Have a look at footnote #18 in "Thinking in Java 3rd edition" (GYIF).

