Why Are There Floating Point Errors When Parsing Data in Pandas?

wle · Feb 12, 2021

Couple of things:

1) While Python (or rather 64-bit IEEE floats) can represent neither the numbers 25.15 nor 25.149999999999977 perfectly precisely, it can represent them both closely enough to distinguish them:

Python:

>>> x = 25.15
>>> y = 25.149999999999977
>>> x == y
False
>>> x - y
2.1316282072803006e-14

The closest floating-point approximations of these numbers are not the same.

2) Whatever that :n format directive is doing, it is clearly not trying to represent floating-point numbers to anywhere near machine precision:

Python:

>>> '{:n}'.format(1.23456789)
'1.23457'

Checking the documentation seems to confirm this:

'n' Number. This is the same as 'g', except that it uses the current locale setting to insert the appropriate number separator characters.

and, just above:

'g' [...] With no precision given, uses a precision of 6 significant digits for float. [...]

Zap · Feb 12, 2021

I used the :n formatting on the data set, and it appeared to totally fix the problem I was having. I don't know precisely what it is doing, but since it gave me the desired result, I went ahead and used it.

Mark44 said:

The value that is displayed is a string of numerical digit characters together with a decimal point.
The number 25.149999999999977 does not exist in memory. Instead, there is the binary representation, most likely in 64 bits. This 64-bit value is rounded or truncated to the desired precision and converted to a string of characters by print() or format(), but the underlying number is not changed.

The Python code I showed a while back shows a sequence of displayed values for the variable x, with varying numbers of decimal digits. At no time is the underlying value of x changed due to rounding or truncation. The only things that change are the (string) representations of this number according to how it's being formatted.

I thought that's what I was saying. I think the term for this is round tripping.

rbelli1 · Feb 12, 2021

Zap said:

You can't calculate 25.15 in binary.

This is untrue.

It is true that 25.15 cannot be represented exactly in a IEEE754 value.

If it was absolutely forbidden to calculate 25.15 in any way in binary then the Python examples in this thread would not have displayed 25.15 in them. However, there is an algorithm in python that knows how to quack¹ the value from 25.14999999999999857891452847979962825775146484375 in a double precision type to 25.15 in textual format when you ask for two decimal places.

Zap said:

However, as stated earlier, I don't want to use the round() function, because it's impossible to tell the computer in a simple way at which decimal place to round

https://docs.python.org/3/library/functions.html#round
round(number[, ndigits])

This rounds towards even rather then up for the halfway values so may not give the desired results. EDIT: This is the behavior of formatting and printing of text.

Zap said:

or which numbers have floating point errors and which do not, and I don't actually want to round numbers.

You can leave the values in the database with full precision and only alter the displayed information.

BoB

¹ If it walks like a rounding...

Zap · Feb 12, 2021

rbelli1 said:

This is untrue.

It is true that 25.15 cannot be represented exactly in a IEEE754 value.

If it was absolutely forbidden to calculate 25.15 in any way in binary then the Python examples in this thread would not have displayed 25.15 in them. However, there is an algorithm in python that knows how to quack¹ the value from 25.14999999999999857891452847979962825775146484375 in a double precision type to 25.15 in textual format when you ask for two decimal places.

I'm so confused lol. Is this not what I have been saying the entire time?

Mark44 · Feb 12, 2021

Zap said:

I'm so confused lol. Is this not what I have been saying the entire time?

Not from my perspective. It seems to me you have consistently misinterpreted or misunderstood the difference between what is displayed versus the underlying number itself. At least that's how I see it. If that's not it, why have we spent 40 posts on this question?

Zap · Feb 12, 2021

I repeatedly said that I wanted the exact representation of the floating point number in Python, and repeatedly said that by exact representation I mean that the floating point number will be displayed as exact even though the actual number was a binary approximation. I think I've said that about 6 times by now, but everyone continued to argue with me. I have just been repeating this over and over again. I cannot comprehend what the argument is. I seems as though you guys are messing with me, at this point.

I solved my problem in like post #5. Even though it's not a perfect solution, it's better than using round() or simply leaving it as is.

And by not a perfect solution, I mean that I don't totally understand how the solution is working, but it seems to be doing what I want it to do 100% of the time so far. The issue is parsing Excel data into Python, and the floating point precision and also the display of the floating point number is not being transferred correctly between programs, which I consider an error. Using the aforementioned method, this error is being fixed.

What is the argument here?

Zap · Feb 12, 2021

rbelli1 said:

If it was absolutely forbidden to calculate 25.15 in any way in binary then the Python examples in this thread would not have displayed 25.15 in them. However, there is an algorithm in python that knows how to quack¹ the value from 25.14999999999999857891452847979962825775146484375 in a double precision type to 25.15 in textual format when you ask for two decimal places.

Zap said:

Interestingly, if you do float( '{:n}'.format( x ) ), the result is 25.15, even though x = 25.149999999999977. Python has some built-in way of outputting floats exactly, even though behind the scenes the displayed value 25.15 may be stored as something like 25.1499999999999 ...

Maybe, you can point out in these two quotes where I was wrong. And by 25.1499999999999 ... , I don't mean an infinitely repeating number ... I mean some kind of approximation to some number of digits ... Almost very post I've made in this thread has just been repeating what I said above over and over.

jedishrfu · Feb 12, 2021

I would say its time to close this thread with the understanding that:

@Zap has found a workable solution to his problem,
that there is a clear rift of understanding the meaning of mathematical real numbers and how they are represented in IEEE floating point format in memory and how computers can coax them into more human readable formats.

Thanks to all who have contributed here.

I leave these videos on floating point number for future generations of readers of this thread:

and here's a discussion of using Pandas for currency arithmetic:

https://beepscore.com/website/2018/10/12/using-pandas-with-python-decimal.html

so those pennies won't be vacuumed by by Richard Prior (Superman III)

Why Are There Floating Point Errors When Parsing Data in Pandas?

Similar threads

Hot Threads

Recent Insights