Why is Ruby's Float#round behavior different than Python's?

The question "Behavior of the round function in Python" observes that Python rounds floats like this:


>>> round(0.45, 1)
0.5
>>> round(1.45, 1)
1.4
>>> round(2.45, 1)
2.5
>>> round(3.45, 1)
3.5
>>> round(4.45, 1)
4.5
>>> round(5.45, 1)
5.5
>>> round(6.45, 1)
6.5
>>> round(7.45, 1)
7.5
>>> round(8.45, 1)
8.4
>>> round(9.45, 1)
9.4

The accepted answer confirms that this is caused by the inexact binary representation of floats, which is all perfectly logical.
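As a rough illustration of that explanation, the Python 3 sketch below (standard library only; the helper name exact_vs_typed is made up for this example) compares the exact value stored for each literal in the question with the decimal that was typed.

```python
from decimal import Decimal


def exact_vs_typed(n):
    """Compare the float stored for the literal "<n>.45" with the decimal
    value that was actually typed."""
    typed = f"{n}.45"
    stored = Decimal(float(typed))  # Decimal(float) converts exactly
    if stored > Decimal(typed):
        side = "above"
    elif stored < Decimal(typed):
        side = "below"
    else:
        side = "exact"
    return typed, side, stored


for n in range(10):
    typed, side, stored = exact_vs_typed(n)
    print(f"{typed}: stored value is {side} ({stored})")
```

On an IEEE 754 platform, 1.45, 8.45 and 9.45 come out "below", which lines up with the three values Python rounded down above.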

Assuming Ruby's floats are just as inexact as Python's, how come Ruby rounds its floats the way a human would? Does Ruby cheat?


1.9.3p194 :009 > 0.upto(9) do |n|
1.9.3p194 :010 > puts (n+0.45).round(1)
1.9.3p194 :011?> end
0.5
1.5
2.5
3.5
4.5
5.5
6.5
7.5
8.5
9.5


Summary

Both implementations confront the same issues surrounding binary floating-point numbers.

Ruby operates directly on the floating-point number with simple operations (multiply by a power of ten, adjust, and truncate).
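A minimal Python model of that multiply-round-divide strategy might look like this. ruby_style_round is a hypothetical name, the sketch assumes ndigits >= 0, and it deliberately leaves out whatever extra adjustment Ruby's C code performs, so treat it as an approximation rather than a port.

```python
from decimal import Decimal, ROUND_HALF_UP


def ruby_style_round(x, ndigits):
    """Hypothetical model: scale in binary floating point, round the scaled
    value half away from zero, then scale back (ndigits >= 0 assumed)."""
    scale = 10.0 ** ndigits
    scaled = x * scale  # binary multiplication: this result is itself rounded
    # Round the scaled double exactly, ties away from zero (like C's round()).
    whole = Decimal(scaled).to_integral_value(rounding=ROUND_HALF_UP)
    return float(whole) / scale


print([ruby_style_round(n + 0.45, 1) for n in range(10)])
```

Under these assumptions it reproduces the 0.5 through 9.5 sequence from the irb session: for 8.45 and 9.45 the binary product x * 10 already lands on exactly 84.5 and 94.5, so the later rounding step never sees the small deficit in the stored input.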

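Python converts the binary floating-point number to a string using David Gay's sophisticated algorithm, which yields the shortest decimal string that converts back to exactly the same binary floating-point number. This step does no additional rounding: the string maps back to the identical float.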

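Starting from that shortest string representation, Python rounds to the requested number of decimal places using exact string operations. The goal of the float-to-string conversion is to "undo" some of the binary floating-point representation error (for example, if you enter 6.6, Python rounds starting from 6.6 rather than from 6.5999999999999996).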
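The gap between the shortest round-tripping string and the exact stored value is easy to inspect from Python 3 (a small illustrative check, not part of CPython's round() code):

```python
from decimal import Decimal

x = 6.6
print(repr(x))        # 6.6  (shortest string that round-trips)
print("%.17g" % x)    # 6.5999999999999996  (17 significant digits)
print(Decimal(x))     # the exact stored value, slightly below 6.6
# Both strings convert back to the very same double.
print(float(repr(x)) == x == float("%.17g" % x))   # True
```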

In addition, Ruby differs from some versions of Python in rounding mode: round-half-away-from-zero versus round-half-to-even.
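The two tie-breaking rules can be compared without any binary noise by using the decimal module on exact decimal strings. In that module ROUND_HALF_UP breaks ties away from zero and ROUND_HALF_EVEN breaks them toward the even neighbour; round_both is an illustrative helper name.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP


def round_both(value, quantum):
    """Round an exact decimal string both ways: ties away from zero
    (ROUND_HALF_UP) and ties to even (ROUND_HALF_EVEN)."""
    d, q = Decimal(value), Decimal(quantum)
    away = d.quantize(q, rounding=ROUND_HALF_UP)
    even = d.quantize(q, rounding=ROUND_HALF_EVEN)
    return away, even


for value, quantum in [("0.125", "0.01"), ("0.375", "0.01"),
                       ("-0.125", "0.01"), ("2.5", "1"), ("3.5", "1")]:
    away, even = round_both(value, quantum)
    print(f"{value}: half-away-from-zero -> {away}, half-to-even -> {even}")
```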

Detail

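Ruby doesn't cheat. It starts with plain old binary floating-point numbers, just as Python does. Accordingly, it faces some of the same challenges (for example, 3.35 is represented as slightly more than 3.35, and 4.35 as slightly less than 4.35):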


>>> from decimal import Decimal
>>> Decimal.from_float(3.35)
Decimal('3.350000000000000088817841970012523233890533447265625')
>>> Decimal.from_float(4.35)
Decimal('4.3499999999999996447286321199499070644378662109375')

The best way to see the implementation differences is to look at the underlying source code:

Here is a link to the Ruby source code: https://github.com/ruby/ruby/blob/trunk/numeric.c#L1587

The Python source starts here: http://hg.python.org/cpython/file/37352a3ccd54/Python/bltinmodule.c
and finishes here: http://hg.python.org/cpython/file/37352a3ccd54/Objects/floatobject.c#l1080

The latter has an extensive comment that reveals the differences between the two implementations:

The basic idea is very simple: convert and round the double to a
decimal string using _Py_dg_dtoa, then convert that decimal string
back to a double with _Py_dg_strtod. There's one minor difficulty:
Python 2.x expects round to do round-half-away-from-zero, while
_Py_dg_dtoa does round-half-to-even. So we need some way to detect and correct the halfway cases.

Detection: a halfway value has the form k * 0.5 * 10**-ndigits for
some odd integer k. Or in other words, a rational number x is exactly
halfway between two multiples of 10**-ndigits if its 2-valuation is
exactly -ndigits-1 and its 5-valuation is at least
-ndigits. For ndigits >= 0 the latter condition is automatically satisfied for a binary float x, since any such float has nonnegative
5-valuation. For 0 > ndigits >= -22, x needs to be an integral
multiple of 5**-ndigits; we can check this using fmod. For -22 >
ndigits, there are no halfway cases: 5**23 takes 54 bits to represent
exactly, so any odd multiple of 0.5 * 10**n for n >= 23 takes at least
54 bits of precision to represent exactly.

Correction: a simple strategy for dealing with halfway cases is to
(for the halfway cases only) call _Py_dg_dtoa with an argument of
ndigits+1 instead of ndigits (thus doing an exact conversion to
decimal), round the resulting string manually, and then convert back
using _Py_dg_strtod.
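The detection rule in that comment can be restated with exact rational arithmetic. The sketch below is a simplified stand-in, not CPython's code: is_halfway is a hypothetical helper, it only handles ndigits >= 0, and it uses fractions.Fraction instead of the valuation and fmod checks the comment describes.

```python
from fractions import Fraction


def is_halfway(x, ndigits):
    """Hypothetical helper: True if the float x lies exactly halfway between
    two multiples of 10**-ndigits (ndigits >= 0 only).

    x is such a halfway case exactly when 2 * x * 10**ndigits is an odd
    integer, i.e. x == k * 0.5 * 10**-ndigits for some odd integer k.
    """
    if ndigits < 0:
        raise ValueError("this sketch only handles ndigits >= 0")
    twice_scaled = 2 * Fraction(x) * 10**ndigits  # exact, no rounding
    return twice_scaled.denominator == 1 and twice_scaled.numerator % 2 == 1


print(is_halfway(0.125, 2))  # exact binary value, a true tie
print(is_halfway(0.45, 1))   # stored value is not exactly 0.45
```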

In short, Python 2.7 goes to great lengths to follow the round-half-away-from-zero rule exactly.
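Here is a sketch of what "exactly round-half-away-from-zero" means for a binary float, assuming ndigits >= 0. CPython gets there through _Py_dg_dtoa strings; this hypothetical round_half_away helper swaps in exact Fraction arithmetic to round the stored value directly.

```python
import math
from fractions import Fraction


def round_half_away(x, ndigits):
    """Round the exact binary value of x to ndigits decimal places,
    ties away from zero (ndigits >= 0 assumed)."""
    scaled = Fraction(x) * 10**ndigits            # exact, no binary rounding
    whole = math.floor(abs(scaled) + Fraction(1, 2))
    if scaled < 0:
        whole = -whole
    # A single, correctly rounded conversion back to float.
    return float(Fraction(whole, 10**ndigits))


print([round_half_away(float(f"{n}.45"), 1) for n in range(10)])
```

Applied to the literals from the question, it reproduces the ten Python results listed at the top (0.5, 1.4, 2.5, ..., 8.4, 9.4).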

In Python 3.3, it goes to equally great lengths to follow the round-half-to-even rule exactly.
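In Python 3 the built-in round() breaks exact ties toward the even neighbour. The inputs below are exactly representable in binary, so they are genuine ties; 2.675 is included as a contrast, since its stored value is slightly below 2.675 and there is no tie at all.

```python
# Python 3 semantics; these inputs are exact in binary, so they are true ties.
print(round(0.125, 2))   # 0.12  (tie, 2 is even)
print(round(0.375, 2))   # 0.38  (tie, 8 is even)
print(round(2.5), round(3.5), round(-2.5))   # 2 4 -2
# Not a tie: the stored value is slightly below 2.675.
print(round(2.675, 2))   # 2.67
```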

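Here is a little more detail on the _Py_dg_dtoa function. Python uses this float-to-string function because it implements an algorithm that gives the shortest possible string representation among the alternatives that map to the same float. In Python 2.6, for example, the number 1.1 shows up as 1.1000000000000001, but in Python 2.7 and later it is just 1.1. David Gay's sophisticated dtoa.c algorithm gives "the result that people expect" without giving up accuracy.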
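The "shortest string that maps back to the same float" idea can be sketched by brute force. This shortest_repr helper is hypothetical and is not David Gay's algorithm (which is far cleverer and faster); it simply tries 1 to 17 significant digits and keeps the first string that round-trips. It assumes a finite float, and its "%g" formatting can differ from repr() (for example, 1.0 comes back as "1").

```python
def shortest_repr(x):
    """Return the shortest "%g"-formatted string that converts back to
    exactly the same finite float x."""
    for digits in range(1, 18):
        candidate = "%.*g" % (digits, x)
        if float(candidate) == x:
            return candidate
    # 17 significant digits always round-trip for a finite double.
    raise ValueError("x must be a finite float")


print(shortest_repr(1.1))        # 1.1
print("%.17g" % 1.1)             # 1.1000000000000001 (the Python 2.6 style)
print(shortest_repr(0.1 + 0.2))  # 0.30000000000000004
```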

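That string conversion algorithm tends to patch up some of the issues that plague any implementation of round() on binary floating-point numbers (that is, it lets the rounding of 4.35 start from 4.35 instead of from 4.3499999999999996447286321199499070644378662109375).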

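That, together with the rounding mode (round-half-to-even vs. round-half-away-from-zero), is the essential difference between the Python and Ruby round() functions.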


The fundamental difference is:

Python: convert to decimal, then round

Ruby: round, then convert to decimal

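Ruby rounds from the original floating-point bit string, but only after operating on it with 10^n. You can't see the original binary value without looking very closely. The values are inexact because they are binary while we are used to writing in decimal, and, as it happens, almost none of the decimal fraction strings we are likely to write has an exact equivalent as a base-2 fraction string.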

In particular, 0.45 looks like this:

0 01111111101 1100110011001100110011001100110011001100110011001101

In hex, that is 3fdccccccccccccd.
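The bit pattern and hex form of 0.45 can be reproduced with the struct module by reinterpreting the 8 bytes of a double as an unsigned 64-bit integer; float_bits is an illustrative helper name.

```python
import struct


def float_bits(x):
    """Split an IEEE 754 double into sign, exponent and significand bit
    strings, plus its 16-digit hex form."""
    raw = struct.unpack(">Q", struct.pack(">d", x))[0]
    bits = format(raw, "064b")
    return bits[0], bits[1:12], bits[12:], format(raw, "016x")


sign, exponent, significand, hex_form = float_bits(0.45)
print(sign, exponent, significand)
print(hex_form)        # 3fdccccccccccccd
print((0.45).hex())    # 0x1.ccccccccccccdp-2
```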

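It repeats in binary, the first unrepresented digit is 0xc, and the clever decimal input conversion has accurately rounded this very last fraction digit to 0xd.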

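This means that inside the machine, the value is greater than 0.45 by about 1.1 × 10^-17 (a little less than 2^-56). That is obviously a very, very small number, but it is enough to make the default round-to-nearest algorithm round up instead of applying the round-half-to-even tie-breaker.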
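The size of that excess, and its effect on round-half-to-even, can be checked exactly with fractions.Fraction and the decimal module (an illustrative sketch):

```python
from decimal import Decimal, ROUND_HALF_EVEN
from fractions import Fraction

excess = Fraction(0.45) - Fraction(45, 100)   # exact difference, no rounding
print(excess, float(excess))

# The typed decimal 0.45 is a true tie and goes to the even digit (0.4),
# but the stored binary value is just above the tie, so it goes to 0.5.
tenth = Decimal("0.1")
print(Decimal("0.45").quantize(tenth, rounding=ROUND_HALF_EVEN))   # 0.4
print(Decimal(0.45).quantize(tenth, rounding=ROUND_HALF_EVEN))     # 0.5
```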

Both Python and Ruby may round more than once, since every operation effectively rounds into the least significant bit.
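The extra rounding step is visible in the multiply-by-ten approach. Using 8.45 as an example, and assuming ordinary IEEE 754 double arithmetic, this sketch compares the exact product of the stored value with what the floating-point multiplication actually returns:

```python
from fractions import Fraction

x = 8.45                           # stored slightly below 8.45
exact_product = Fraction(x) * 10   # the product with no rounding at all
float_product = x * 10             # what binary multiplication returns
print(exact_product < Fraction(169, 2))   # True: exact product is below 84.5
print(float_product == 84.5)              # True: the multiply rounded to 84.5
```

The exact product sits exactly halfway between two adjacent doubles, and round-half-to-even picks 84.5, so a later half-away-from-zero rounding of the product yields 8.5 even though the stored 8.45 is below 8.45.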

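I'm not sure I agree that Ruby does what a human would do. I think Python is approximating what decimal arithmetic would do. Python (depending on version) is applying round-to-nearest to a decimal string, while Ruby is applying the round-to-nearest algorithm to a computed binary value.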

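Note that this shows quite clearly why people say that FP is inexact. That is a reasonably true statement, but it is more accurate to say that we simply can't convert exactly between binary and most decimal fractions (some do convert exactly: 0.25, 0.5, 0.75, ...). Most simple decimal numbers are repeating numbers in binary, so we can never store their exact equivalent. But every value we can store is known exactly, and the arithmetic performed on it is exactly specified: each result is the exact result rounded to the nearest representable value. If we wrote our fractions in binary in the first place, our FP arithmetic would be considered exact.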
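Whether a decimal fraction survives conversion can be tested by comparing the float's exact rational value with the decimal's; exactly_representable is a hypothetical helper for this sketch:

```python
from fractions import Fraction


def exactly_representable(decimal_text):
    """True if float() stores the decimal string without any error."""
    return Fraction(float(decimal_text)) == Fraction(decimal_text)


for text in ["0.25", "0.5", "0.75", "0.1", "0.45", "4.35"]:
    print(text, exactly_representable(text))

print(0.25 + 0.5 == 0.75)  # True: all three values are exact in binary
print(0.1 + 0.2 == 0.3)    # False: none of the three is exact in binary
```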