Python: Strange behaviour with floats and string conversion

I typed this into the Python shell:

>>> 0.1*0.1
0.010000000000000002

I expected 0.1*0.1 not to be exactly 0.01, because I know that 0.1 in base 10 is periodic in base 2.

>>> len(str(0.1*0.1))
4

I expected to get 20 characters, since that is how many I saw above. Why do I get 4?

>>> str(0.1*0.1)
'0.01'

OK, that explains why len gives me 4, but why does str return '0.01'?

>>> repr(0.1*0.1)
'0.010000000000000002'

Why does str round while repr does not? (I have read this answer, but I would like to know how they decided when str rounds a float and when it does not.)

>>> str(0.01) == str(0.0100000000001)
False
>>> str(0.01) == str(0.01000000000001)
True

So this seems to be a matter of float precision. I thought Python would use IEEE 754 single-precision floats, so I checked it like this:

#include <stdint.h>
#include <stdio.h> // printf

union myUnion {
    uint32_t i; // unsigned integer 32-bit type (on every machine)
    float f; // a type you want to play with
};

int main() {
    union myUnion testVar;
    testVar.f = 0.01000000000001f;
    printf("%f\n", testVar.f);

    testVar.f = 0.01000000000000002f;
    printf("%f\n", testVar.f);

    testVar.f = 0.01f*0.01f;
    printf("%f\n", testVar.f);
}

I get:

0.010000
0.010000
0.000100

Python gave me:

>>> 0.01000000000001
0.010000000000009999
>>> 0.01000000000000002
0.010000000000000019
>>> 0.01*0.01
0.0001

Why does Python give me these results?

(I am using Python 2.6.5. If you know of differences between Python versions, I would be interested in those as well.)

Related discussion

I can confirm the behaviour you describe:

ActivePython 2.6.4.10 (ActiveState Software Inc.) based on
Python 2.6.4 (r264:75706, Jan 22 2010, 17:24:21) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> repr(0.1)
'0.10000000000000001'
>>> repr(0.01)
'0.01'

Now, the docs claim that in Python < 2.7

the value of repr(1.1) was computed as format(1.1, '.17g')

which is a slight simplification.
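To make the quoted rule concrete, here is a minimal sketch that imitates the old formatting with format(). It assumes that Python 2.6 used 17 significant digits for repr and 12 for str; the names old_str, old_repr, STR_PRECISION and REPR_PRECISION are hypothetical helpers for illustration, not part of Python. The sketch also leaves out the step that appends ".0" to integer-looking results.

```python
# Assumption: Python 2.6 formatted str() with 12 significant digits
# and repr() with 17. These names are illustrative only.
STR_PRECISION = 12
REPR_PRECISION = 17


def old_str(x):
    """Imitate the old str() rounding via the %g-style format."""
    return format(x, '.%dg' % STR_PRECISION)


def old_repr(x):
    """Imitate the old repr() rounding via the %g-style format."""
    return format(x, '.%dg' % REPR_PRECISION)


product = 0.1 * 0.1
print(old_str(product))   # 12 digits hide the error in the last bits
print(old_repr(product))  # 17 digits expose it

# The two comparisons from the question, reproduced with 12 digits.
print(old_str(0.01) == old_str(0.0100000000001))
print(old_str(0.01) == old_str(0.01000000000001))
```

Under these assumptions, 0.01000000000001 needs 13 significant digits, so a 12-digit rendering collapses it to the same string as 0.01, while 0.0100000000001 fits within 12 digits and stays distinct.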

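Note that this only concerns the string formatting code. In memory, all Python floats are stored as C doubles, so there is never any difference between the values themselves.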
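The difference between double and single precision can be checked from Python itself. The sketch below uses the standard struct module to round a value to single precision and back; to_single is a hypothetical helper name. It suggests why the C test in the question could not tell the values apart: a float cannot hold a difference of 1e-14 near 0.01, and %f prints only six decimal places by default.

```python
import struct
import sys


def to_single(x):
    """Round a Python float to IEEE 754 single precision and back."""
    return struct.unpack('f', struct.pack('f', x))[0]


a = 0.01
b = 0.01000000000001

print(sys.float_info.mant_dig)       # mantissa bits of a Python float
print(a == b)                        # compared as doubles
print(to_single(a) == to_single(b))  # compared after rounding to single
print('%f' % b)                      # %f shows six decimal places by default
```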

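Also, it is rather unpleasant to work with the full-length string for a float when you know a shorter one exists. Indeed, in modern Pythons a new algorithm is used for float formatting, which picks the shortest representation in a smart way.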
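A short sketch of what "shortest representation" means in practice, assuming a Python version with the newer algorithm (2.7 or 3.1 and later): both the short repr and the 17-digit string convert back to exactly the same float, but the short one is easier to read. The function name both_forms is hypothetical.

```python
def both_forms(x):
    """Return the modern repr and the 17-significant-digit form of x."""
    short_form = repr(x)
    long_form = format(x, '.17g')
    # Both strings must round-trip to exactly the same double.
    assert float(short_form) == x
    assert float(long_form) == x
    return short_form, long_form


for value in (0.1, 0.01, 0.1 * 0.1, 0.01000000000001):
    print(both_forms(value))
```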

I spent a while looking this up in the source code, so I will include the details here in case you are interested. You can skip this section.

In floatobject.c, we see

static PyObject *
float_repr(PyFloatObject *v)
{
    char buf[100];
    format_float(buf, sizeof(buf), v, PREC_REPR);

    return PyString_FromString(buf);
}

This leads us to format_float. Omitting the NaN/inf special cases, it is:

format_float(char *buf, size_t buflen, PyFloatObject *v, int precision)
{
    register char *cp;
    char format[32];
    int i;

    /* Subroutine for float_repr and float_print.
       We want float numbers to be recognizable as such,
       i.e., they should contain a decimal point or an exponent.
       However, %g may print the number as an integer;
       in such cases, we append ".0" to the string. */

    assert(PyFloat_Check(v));
    PyOS_snprintf(format, 32, "%%.%ig", precision);
    PyOS_ascii_formatd(buf, buflen, format, v->ob_fval);
    cp = buf;
    if (*cp == '-')
        cp++;
    for (; *cp != '\0'; cp++) {
        /* Any non-digit means it's not an integer;
           this takes care of NAN and INF as well. */
        if (!isdigit(Py_CHARMASK(*cp)))
            break;
    }
    if (*cp == '\0') {
        *cp++ = '.';
        *cp++ = '0';
        *cp++ = '\0';
        return;
    }

    <some NaN/inf stuff>
}

We can see that this first initialises some variables and checks that v is a well-formed float. It then prepares a format string:

PyOS_snprintf(format, 32, "%%.%ig", precision);

Now, PREC_REPR is defined elsewhere in floatobject.c as 17, so this evaluates to "%.17g". Next we call

PyOS_ascii_formatd(buf, buflen, format, v->ob_fval);

With the end of the tunnel in sight, we look up PyOS_ascii_formatd and discover that it uses snprintf internally.
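The walk through format_float can be mirrored in a few lines of Python. This is only a rough sketch of the logic shown in the C excerpt, not the actual implementation: format_float_sketch is a hypothetical name, Python's % operator stands in for PyOS_ascii_formatd, and the NaN/inf handling that the excerpt elides is not reproduced.

```python
def format_float_sketch(value, precision):
    """Imitate the visible part of format_float from floatobject.c."""
    # Build the format string, e.g. precision 17 gives '%.17g'.
    fmt = '%%.%ig' % precision
    text = fmt % value

    # Skip a leading minus sign, then check for digits only.
    digits = text[1:] if text.startswith('-') else text

    # %g may print the number like an integer; append '.0' so the
    # result is still recognisable as a float.
    if digits.isdigit():
        text += '.0'
    return text


print(format_float_sketch(0.1, 17))
print(format_float_sketch(0.1 * 0.1, 12))
print(format_float_sketch(1.0, 17))
```

Passing 17 corresponds to the PREC_REPR value mentioned above; the 12 used in the second call is an assumption about the precision that str used in the same Python version.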