Calculating a Sum in C: Finding the Error in Floating Point Precision?

mathmari · Oct 27, 2013

Hi!
I want to write a program in c, that calculates the sum [1+sum{1/(i(i+1)) from 1 to n}].
I declared the variables as float.
When I run the program with n=9, the output is 1.899999976, but when I calculate this with a calculator the result is 1.9.
Where is the error?

I like Serena · Oct 27, 2013

mathmari said:

Hi!
I want to write a program in c, that calculates the sum [1+sum{1/(i(i+1)) from 1 to n}].
I declared the variables as float.
When I run the program with n=9, the output is 1.899999976, but when I calculate this with a calculator the result is 1.9.
Where is the error?

Hey mathmari!

Your calculator is more accurate than C when working with floats.
A "float" has only 7 significant digits, meaning that starting from the 8th digit the result will not be accurate anymore.

Tip: don't use "float". It is often not accurate enough.
Use "double" instead. It has 16 significant digits.

mathmari · Oct 27, 2013

I have to do it once with "double" and once with "float". I found the error.. I had a mistake at printf..
Could you tell me the result for "double" and "float" for n=99999, so that I can check my results?

I like Serena · Oct 27, 2013

mathmari said:

I have to do it once with "double" and once with "float". I found the error.. I had a mistake at printf..
Could you tell me the result for "double" and "float" for n=99999, so that I can check my results?

EDIT: The following approximations are incorrect. See later posts.

In floats I get 1.999787 or with more digits 1.999786615.
In doubles I get 1.999965 or with more digits 1.999965439.

Mathematically, it is
\begin{aligned}
1+\sum_{i=1}^{n} \frac 1 {i(i+1)}
&= 1 + \sum_{i=1}^{n} \left( \frac 1 i - \frac 1 {i+1} \right) \\
&= 1 + \left(1 - \frac 1{n+1}\right) \\
&= 1 + \left(1 - \frac 1{99999 + 1}\right) \\
&= 1.99999
\end{aligned}

I am surprised myself, since I wouldn't have expected the double calculations to be so bad.
It is an example where rounding errors are accumulating significantly though.

mathmari · Oct 27, 2013

I got the same answers! Thank you very much! :)

I like Serena · Oct 27, 2013

mathmari said:

I got the same answers! Thank you very much!

Hold on!

Since it bugged me that the answer was just a little bit too far off for doubles, I rechecked the calculations.
The resulting rounding error for doubles should be less than $10^{-14}$ and it wasn't.

Now I get:

Code:

float:  1.999791980
double: 1.999990000

The problem is that $i(i+1)$ has an overflow for high $i$.
An "int" can typically only hold a maximum of about $2 \times 10^9$ and we're going over.
To do it properly, the multiplication $i \times (i+1)$ has to be executed in floating point types.

mathmari · Oct 27, 2013

Could you explain it further to me, why I have to declare all the variables as "float"??

I like Serena · Oct 27, 2013

mathmari said:

Could you explain it further to me, why I have to declare all the variables as "float"??

You don't necessarily have to declare them as float, but you do have to make sure the multiplication is a float multiplication.

In your for-loop you probably have something like this:

Code:

float result = 1;
int i;
for (i = 1; i <= n; ++i)
{
    result += 1 / (float)(i * (i + 1));
}

Like this, i is multiplied by (i+1) as integers.
When i = 99999, the result is 9999900000.
This does not fit into an integer, and it becomes 1409965408 instead.

What you need is this:

Code:

float result = 1;
int i;
for (i = 1; i <= n; ++i)
{
    result += 1 / ((float)i * (i + 1));
}

This way i is converted to a float before the multiplication.
And the result 9999900000 does fit into a float, although it will already get a rounding error.

mathmari · Oct 27, 2013

Is the code

Code:

float result = 1;
int i;
for (i = 1; i <= n; ++i)
{
    result += 1 / ([COLOR="#FF0000"](float)[/COLOR]i * (i + 1));
}

equal to:

Code:

float result = 1;
int i;
for (i = 1; i <= n; ++i)
{
    result += [COLOR="#FF0000"]1.0[/COLOR] / (i * (i + 1));
}

?Also, when you wrote the output of the program, the numbers had a precision of 9 digits. Then I tried to print my results also with 9 digits by using %.9f, but for n=9 the result for float is 1.899999976 instead of 1.9. Why does this happen??

I like Serena · Oct 27, 2013

mathmari said:

Is the code

Code:

float result = 1;
int i;
for (i = 1; i <= n; ++i)
{
    result += 1 / ([COLOR="#FF0000"](float)[/COLOR]i * (i + 1));
}

equal to:

Code:

float result = 1;
int i;
for (i = 1; i <= n; ++i)
{
    result += [COLOR="#FF0000"]1.0[/COLOR] / (i * (i + 1));
}

?

No.
If k is an integer, then "1 / (float)k" is equivalent to "1.0 / k".

But the essential difference is in "i * (i+i)".
The value of "(float)(i *(i+1))" is different from "((float)i * (float)(i+1))".

Also, when you wrote the output of the program, the numbers had a precision of 9 digits. Then I tried to print my results also with 9 digits by using %.9f, but for n=9 the result for float is 1.899999976 instead of 1.9. Why does this happen??

"%.9f" is fine. It will print 9 digits after the decimal point.
However, with a float you have rounding errors that become visible.
If you use a double, there won't be rounding errors (at least not in the first 9 digits).

mathmari · Oct 27, 2013

Nice, I got it! Thank you very much!

I have also an other question

When I calculate the sum backwards, [tex] \frac{1}{n(n+1)}+\frac{1}{n(n-1)}+\frac{1}{(n-2)(n-1)}+...+1 [/tex], why do I get a better approximation?

I like Serena · Oct 27, 2013

mathmari said:

Nice, I got it! Thank you very much!

I have also an other question

When I calculate the sum backwards, [tex] \frac{1}{n(n+1)}+\frac{1}{n(n-1)}+\frac{1}{(n-2)(n-1)}+...+1 [/tex], why do I get a better approximation?

Suppose you add 0.00012345 to 1.
Then you get 1.00012345.
But due to the limited precision of a float it gets truncated to something like 1.000123.
That is because it has about 7 significant digits.
Repeatedly adding for instance 0.0000001 to it won't have any effect.

On the other hand, if you add a small value to 0.00012345, it won't get "lost", since we're still working with the limited precision of the float. Repeatedly adding 0.0000001 will have the expected effect.

mathmari · Oct 31, 2013

Could you generalize it please,so that I understand what happens in general case??

I like Serena · Oct 31, 2013

mathmari said:

Could you generalize it please,so that I understand what happens in general case??

Not sure what you mean.
Perhaps you can indicate what you're looking for?

I can say for instance the following.

Suppose we pick a number x that is so small that 1 + x comes out the same as 1.
In the case of a float $x = 10^{-8}$ will do.
That is, we have $1 + x = 1$.

And suppose we add x to 1 repeatedly, say n times.
Then we have that $(((1+x)+x)+...+x)+x = 1$.
But $(((x+x)+x)+...+x)+1=nx+1$.
The latter is different from 1 if n is large enough.

mathmari · Nov 1, 2013

If I write the command printf("%.16lf",x) does the number x=10^(-8) will do,can we pick this number at this case or should I pick a number 10^(- a number greater than 16) ??

I like Serena · Nov 1, 2013

mathmari said:

If I write the command printf("%.16lf",x) does the number x=10^(-8) will do,can we pick this number at this case or should I pick a number 10^(- a number greater than 16) ??

That depends on how x has been declared.
If you have "float x;" before this, then "x=1e-8;" will do.
If you have "double x;" before this, then you will need something like "x=1e-17;".

And actually, you can also do:

Code:

#include <float.h>
...
float x_float = FLT_EPSILON;
double x_double = DBL_EPSILON;
...

This EPSILON is defined as: "Difference between 1 and the least value greater than 1 that is representable."

mathmari · Nov 1, 2013

A ok..I got it :) Now I am looking at some other exercises with forward/backward error analysis and I noticed that calculating the sum [tex] Σ_{k=1}^{n}(\frac{1}{k^2}) [/tex],using floats,for n<10000 the forward method is better,and for n>=10000,the backward method is better..Why does this happen?

I like Serena · Nov 1, 2013

mathmari said:

A ok..I got it :) Now I am looking at some other exercises with forward/backward error analysis and I noticed that calculating the sum [tex] Σ_{k=1}^{n}(\frac{1}{k^2}) [/tex],using floats,for n<10000 the forward method is better,and for n>=10000,the backward method is better..Why does this happen?

How so?
Backward method should always be better.

I quickly checked and my results confirm that.

mathmari · Nov 1, 2013

For the forward method for floats I used the following algorithm:

Code:

      int i;
      float m=0;
      for (i=1; i<=n; i++)
           m=m+1.0/(pow(i,2));

For the backward method:

Code:

      int i;
      float l=1.0/(pow(n,2));
      for (i=1; i<n; i++) 
           l=l+1.0/(pow(n-i,2));

What have I done wrong?

I like Serena · Nov 1, 2013

mathmari said:
For the forward method for floats I used the following algorithm:
Code:
      int i;
      float m=0;
      for (i=1; i<=n; i++)
           m=m+1.0/(pow(i,2));
For the backward method:
Code:
      int i;
      float l=1.0/(pow(n,2));
      for (i=1; i<n; i++) 
           l=l+1.0/(pow(n-i,2));
What have I done wrong?

No mistake.
And your results check out.

Note that the backward results are slightly higher than the forward results as they should be.
The backward results are more accurate.

mathmari · Nov 1, 2013

For n=1000,the forward method error is 0.0009543306455124,and the backward method error 0.0009546722587621
Have you found something else for this n?

I like Serena · Nov 1, 2013

mathmari said:

For n=1000,the forward method error is 0.0009543306455124,and the backward method error 0.0009546722587621
Have you found something else for this n?

For n=1000 with your code, I get:

Code:

Forward  result =  1.6439348459243774
Error           = -0.0000002792428178
Backward result =  1.6439344882965088
Error           =  0.0000000783850509

Did you perhaps calculate the difference with $\displaystyle \frac{\pi^2} 6$ or something like that?
What did you calculate?

Either way, as you can see from my results, the error with the backward method is way less than the forward method.

mathmari · Nov 3, 2013

Which algorithm did you use to find it?shouldn't the result be near to the real value of pi?

I like Serena · Nov 3, 2013

mathmari said:

Which algorithm did you use to find it?shouldn't the result be near to the real value of pi?

I did the same calculation with "double" instead of "float".
With "double" it is so much more accurate that we can use the difference to find the error.

The series up to n=1000 is not $\frac{\pi^2}{6}$ yet.
It only approaches it.
With higher n, we'll get closer.
The forward algorithm will never get there, but the backward algorithm does.

mathmari · Nov 3, 2013

Isn't your code calculating the approximation of [tex] \frac{\pi^2}{6}[/tex].My program should give the approximation of [tex] \pi [/tex] ,so I multiplied my result with 6 and squared it.Or am I wrong?

I like Serena · Nov 3, 2013

mathmari said:

Isn't your code calculating the approximation of [tex] \frac{\pi^2}{6}[/tex].My program should give the approximation of [tex] \pi [/tex] ,so I multiplied my result with 6 and squared it.Or am I wrong?

Yes. We get an approximation of [tex] \frac{\pi^2}{6}[/tex].
With n=1000, we get an approximation that is below this number.

So you would indeed multiply the result by 6.
But then you should not square it, but take the square root.

mathmari · Nov 3, 2013

Even if I calculate the approximation of [tex] \frac{\pi^2}{6}[/tex],my results for n=1000 are
for float:
forward: 1.6439348459243774
backward:1.6439344882965088

for double:
forward: 1.6439345666815612
backward:1.6439345666815597

Why are my results different as yours?? :(

I like Serena · Nov 3, 2013

mathmari said:

Even if I calculate the approximation of [tex] \frac{\pi^2}{6}[/tex],my results for n=1000 are
for float:
forward: 1.6439348459243774
backward:1.6439344882965088

for double:
forward: 1.6439345666815612
backward:1.6439345666815597

Why are my results different as yours?? :(

Let me put your results for n=1000 and mine below each other:

Code:

for float:

forward:  1.6439348459243774      
ILSe:     1.6439348459243774

backward: 1.6439344882965088       
ILSe:     1.6439344882965088

Where do you see a different result?

mathmari · Nov 3, 2013

Oh,sorry!

Now,I found the error of each method for n=1000.float:
forward method: 1.6439348459243774 with error 0.0009992209238490
backward method: 1.6439344882965088 with error 0.0009995785517176

The error of the backward method is bigger than the forward method.Why??
Or am I wrong??

I like Serena · Nov 3, 2013

mathmari said:

Oh,sorry!

Now,I found the error of each method for n=1000.float:
forward method: 1.6439348459243774 with error 0.0009992209238490
backward method: 1.6439344882965088 with error 0.0009995785517176

The error of the backward method is bigger than the forward method.Why??
Or am I wrong??

How did you calculate the error?

mathmari · Nov 3, 2013

I found the abolute value of (pi^2)/6-the result(for forward method:1.6439348459243774,for backward method:1.6439344882965088).

I like Serena · Nov 3, 2013

mathmari said:

I found the abolute value of (pi^2)/6-the result(for forward method:1.6439348459243774,for backward method:1.6439344882965088).

Right.
So apparently the forward method gives a result that is higher in this case than the backward result.
However, the true result that we should have for n=1000 is best approximated by the double backward algorithm, which you already gave.

Code:

float forward   = 1.6439348459243774
float backward  = 1.6439344882965088
double backward = 1.6439345666815597

As you can see the "float backward" result is closer to the double result than the "float forward" result.
However, we do see that the "float forward" result is higher than the "float backward" result.
This would happen due to rounding errors that work out differently.

mathmari · Nov 4, 2013

So,isn't the error of the backward method bigger for n<=1000??

I like Serena · Nov 4, 2013

mathmari said:

So,isn't the error of the backward method bigger for n<=1000??

The difference with $\frac{\pi^2}6$ of the backward method is bigger in this case (presumably due to "bad" rounding in the forward method that turns out to be "lucky").

However, the error in the backward result is smaller, since the result is not supposed to be $\frac{\pi^2}6$. It needs many more iterations before it gets there.

mathmari · Nov 4, 2013

If you multiply your result with 6 and find the square root and then the error,how could we explain why the forward method gives better results for smaller n??

Calculating a Sum in C: Finding the Error in Floating Point Precision?

Similar threads

Hot Threads

Recent Insights