How lazy evaluation forced Haskell to be pure

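I remember that in a talk, SPJ said that lazy evaluation forced them to keep Haskell pure (or something along those lines). I often see many Haskellers say the same thing.

So I would like to know: how did the lazy evaluation strategy, as opposed to a strict evaluation strategy, force them to keep Haskell pure?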


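It is not so much that lazy evaluation led to purity; Haskell was pure to begin with. Rather, lazy evaluation forced the designers of the language to keep the language pure.

Here is a relevant passage from the paper A History of Haskell: Being Lazy With Class: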

Once we were committed to a lazy language, a pure one was inescapable. The converse is not true, but it is notable that in practice most pure programming languages are also lazy. Why? Because in a call-by-value language, whether functional or not, the temptation to allow unrestricted side effects inside a "function" is almost irresistible.

Purity is a big bet, with pervasive consequences. Unrestricted side effects are undoubtedly very convenient. Lacking side effects, Haskell’s input/output was initially painfully clumsy, which was a source of considerable embarrassment. Necessity being the mother of invention, this embarrassment ultimately led to the invention of monadic I/O, which we now regard as one of Haskell’s main contributions to the world, as we discuss in more detail in Section 7.

Whether a pure language (with monadic effects) is ultimately the best way to write programs is still an open question, but it certainly is a radical and elegant attack on the challenge of programming, and it was that combination of power and beauty that motivated the designers. In retrospect, therefore, perhaps the biggest single benefit of laziness is not laziness per se, but rather that laziness kept us pure, and thereby motivated a great deal of productive work on monads and encapsulated state.

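(emphasis mine)

I also invite you to listen to 18'30'' of Software Engineering Radio podcast #108 for an explanation by the man himself. And here is a longer but relevant passage from SPJ's interview in Peter Seibel's Coders at Work: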

I now think the important thing about laziness is that it kept us pure. [...]

[...] if you have a lazy evaluator, it’s harder to predict exactly when an expression is going to be evaluated. So that means if you want to print something on the screen, every call-by-value language, where the order of evaluation is completely explicit, does that by having an impure "function"—I’m putting quotes around it because it now isn’t a function at all—with type something like string to unit. You call this function and as a side effect it puts something on the screen. That’s what happens in Lisp; it also happens in ML. It happens in essentially every call-by-value language.

Now in a pure language, if you have a function from string to unit you would never need to call it because you know that it just gives the answer unit. That’s all a function can do, is give you the answer. And you know what the answer is. But of course if it has side effects, it’s very important that you do call it. In a lazy language the trouble is if you say, "f applied to print "hello"," then whether f evaluates its first argument is not apparent to the caller of the function. It’s something to do with the innards of the function. And if you pass it two arguments, f of print "hello" and print "goodbye", then you might print either or both in either order or neither. So somehow, with lazy evaluation, doing input/output by side effect just isn’t feasible. You can’t write sensible, reliable, predictable programs that way. So, we had to put up with that. It was a bit embarrassing really because you couldn’t really do any input/output to speak of. So for a long time we essentially had programs which could just take a string to a string. That was what the whole program did. The input string was the input and result string was the output and that’s all the program could really ever do.

You could get a bit clever by making the output string encode some output commands that were interpreted by some outer interpreter. So the output string might say, "Print this on the screen; put that on the disk." An interpreter could actually do that. So you imagine the functional program is all nice and pure and there’s sort of this evil interpreter that interprets a string of commands. But then, of course, if you read a file, how do you get the input back into the program? Well, that’s not a problem, because you can output a string of commands that are interpreted by the evil interpreter and using lazy evaluation, it can dump the results back into the input of the program. So the program now takes a stream of responses to a stream of requests. The stream of requests go to the evil interpreter that does the things to the world. Each request generates a response that’s then fed back to the input. And because evaluation is lazy, the program has emitted a response just in time for it to come round the loop and be consumed as an input. But it was a bit fragile because if you consumed your response a bit too eagerly, then you get some kind of deadlock. Because you’d be asking for the answer to a question you hadn’t yet spat out of your back end yet.

The point of this is laziness drove us into a corner in which we had to think of ways around this I/O problem. I think that that was extremely important. The single most important thing about laziness was it drove us there.

(emphasis mine)
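The request/response loop described in the quote can be sketched in a few lines. This is a toy model, not the historical Haskell interface: the names Request, Response, Dialogue, interpret, runDialogue and greeter are invented for illustration, and the "evil interpreter" is replaced by a pure function that is fed canned input lines.

```haskell
-- Toy model of stream-based I/O; all names here are illustrative assumptions.
data Request  = PutLine String | GetLine
data Response = Ok | Line String

-- A program maps the (lazy) stream of responses to a stream of requests.
type Dialogue = [Response] -> [Request]

-- Pure stand-in for the "evil interpreter": it serves GetLine from a list of
-- canned input lines and collects everything passed to PutLine.
interpret :: [Request] -> [String] -> ([Response], [String])
interpret [] _ = ([], [])
interpret (PutLine s : rs) ins =
  let (resps, outs) = interpret rs ins in (Ok : resps, s : outs)
interpret (GetLine : rs) [] =
  let (resps, outs) = interpret rs [] in (Line "" : resps, outs)
interpret (GetLine : rs) (i : ins) =
  let (resps, outs) = interpret rs ins in (Line i : resps, outs)

-- Tie the knot: the responses fed to the program are computed from the
-- program's own requests. This only works because both streams are lazy.
runDialogue :: Dialogue -> [String] -> [String]
runDialogue prog input = outputs
  where
    requests             = prog responses
    (responses, outputs) = interpret requests input

-- Emits two requests before inspecting any response.
greeter :: Dialogue
greeter resps = PutLine "Name?" : GetLine : reply
  where
    reply = case resps of
      (_ : Line name : _) -> [PutLine ("Hello, " ++ name)]
      _                   -> []
```

Under these assumptions, runDialogue greeter ["Ann"] evaluates to ["Name?", "Hello, Ann"]. A program that pattern-matched on its responses before emitting its first request would never produce anything, which is the fragility SPJ mentions.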


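I think the answer from Jubobs already sums it up nicely (with good references). But, in my own words, what I think SPJ and friends are referring to is this: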

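Having to go through this "monad" business can be really inconvenient at times. The sheer volume of questions on Stack Overflow asking "how do I just remove this IO thing?" is testament to the fact that sometimes you really, really want to just print this one value right here, usually to figure out what is actually going on.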

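In an eager language, it would be sorely tempting to just start adding magical impure functions that let you do impure things directly, as in other languages. No doubt you would start with small things at first, but slowly you slide down this slippery slope, and before you know it, effects are all over the place.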

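In a lazy language like Haskell, this temptation still exists. There are many times when it would be really helpful to be able to quickly sneak one little effect in here or there. Except that, because of laziness, adding effects turns out to be almost utterly useless. You cannot control when anything happens. Even just Debug.Trace.trace tends to produce utterly incomprehensible results.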
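A small sketch of why even tracing is hard to reason about: whether a traced argument is ever evaluated depends on the callee, not on the call site. The function pick and the trace messages are made up for illustration; trace itself is the real function from Debug.Trace.

```haskell
import Debug.Trace (trace)

-- Hypothetical function: only one of the two Int arguments is ever demanded.
pick :: Bool -> Int -> Int -> Int
pick useFirst a b = if useFirst then a else b

-- Both arguments carry a trace message, but only an argument that pick
-- actually evaluates can emit its message.
example :: Int
example = pick True (trace "evaluating a" 1) (trace "evaluating b" 2)
```

Evaluating example yields 1, and the message for b is not expected to appear, because that argument is never demanded. Nothing at the call site reveals this; you have to know the innards of pick.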

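In short, if you are designing a lazy language, you really are forced to come up with a coherent story for how you handle effects. You cannot just go "meh, we'll pretend this function is just magic"; without the ability to control effects more precisely, you will instantly end up in a horrible mess!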

TL;DR: In an eager language, you can get away with cheating. In a lazy language, you really have to do things properly, or it just plain does not work.

And that is why we hired Alex... wait, wrong window...


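It depends on what you mean by "pure" in this context.

  • If by pure you mean purely functional, then what @MathematicalOrchid says is right: with lazy evaluation you would not know in which sequence impure actions are executed, so you would not be able to write meaningful programs at all, and you are forced to be more pure (using the IO monad).

    However, I find this not really satisfying in this case. A truly functional language already separates pure and impure code, so even a strict one should have some kind of IO.

  • However, it is possible that the statement is broader and refers to pure in the sense that you can more easily express code in a more declarative, compositional, and yet efficient way.

Looking at this answer, which makes exactly the statement you are citing, it links to the article Why Functional Programming Matters by Hughes, which is probably the one you are talking about.

The paper shows how higher-order functions and lazy evaluation allow you to write more modular code. Note that it does not say anything about being purely functional. The point it makes is about being more modular and more declarative without losing efficiency.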

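The article provides some examples. Take the Newton-Raphson algorithm: in a strict language, you have to combine the code that computes the next approximation with the code that checks whether you have obtained a good enough solution.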

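With lazy evaluation, you can generate the infinite list of approximations in one function and call it from another function that returns the first good enough approximation.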
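As a rough sketch of that separation, in the spirit of the paper's square-root example: the names next, approximations, within and newtonSqrt are chosen here for illustration, and this is not a verbatim transcription of the paper's code.

```haskell
-- One Newton-Raphson step towards the square root of n.
next :: Double -> Double -> Double
next n x = (x + n / x) / 2

-- Producer: the infinite list of successive approximations, starting at x0.
-- It knows nothing about when to stop.
approximations :: Double -> Double -> [Double]
approximations n x0 = iterate (next n) x0

-- Consumer: walk the list until two neighbours differ by at most eps.
-- It knows nothing about how the approximations are computed.
within :: Double -> [Double] -> Double
within eps (a : b : rest)
  | abs (a - b) <= eps = b
  | otherwise          = within eps (b : rest)
within _ _ = error "within: expected an infinite list"

-- Glue the two together; laziness ensures only the needed prefix is computed.
newtonSqrt :: Double -> Double -> Double -> Double
newtonSqrt x0 eps n = within eps (approximations n x0)
```

For instance, newtonSqrt 1 1e-9 2 gives an approximation of the square root of 2. The stopping rule can be swapped (say, for a relative tolerance) without touching approximations.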

In Thinking Functionally with Haskell, Richard Bird makes precisely this point. If we look at Chapter 2, Exercise D:

Beaver is an eager evaluator, while Susan is a lazy one.

[...]

What alternative might Beaver prefer to head . filter p . map f?

And the answer is:

[...] Instead of defining first p f = head . filter p . map f,
Beaver might define

first :: (b -> Bool) -> (a -> b) -> [a] -> b
first p f xs | null xs = error "Empty list"
             | p x = x
             | otherwise = first p f (tail xs)
             where x = f (head xs)

The point is that with eager evaluation most functions have to be defined using explicit recursion, not in terms of useful component
functions like map and filter.

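So pure here means allowing declarative, compositional, and yet efficient definitions, whereas with eager evaluation, declarative and compositional definitions can lead to unnecessarily inefficient code.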
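For contrast, here is a minimal sketch of the compositional definition under lazy evaluation. The names firstLazy and firstBigSquare are invented here to avoid clashing with first above.

```haskell
-- Compositional definition: under lazy evaluation, map and filter produce
-- elements only on demand, so head stops the pipeline at the first match.
firstLazy :: (b -> Bool) -> (a -> b) -> [a] -> b
firstLazy p f = head . filter p . map f

-- Works even on an infinite list: only the prefix up to the first match
-- is ever mapped and tested.
firstBigSquare :: Integer
firstBigSquare = firstLazy (> 100) (^ 2) [1 ..]
```

Here firstBigSquare evaluates to 121. An eager evaluator would try to build the whole of map (^ 2) [1 ..] first and never return, which is why Beaver needs the explicitly recursive version.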


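Strictly speaking, this claim is not true, because Haskell has unsafePerformIO, which is a big hole in the language's functional purity. (It exploits a hole in GHC Haskell's functional purity that ultimately goes back to the decision to implement unboxed arithmetic by adding a strict fragment to the language.) unsafePerformIO exists because the temptation to say "well, I'll implement just this one function using side effects internally" is irresistible to most programmers. But if you look at the drawbacks of unsafePerformIO [1], you see exactly what people are saying:

  • unsafePerformIO a is not guaranteed to ever execute a.
  • Nor is it guaranteed to execute a only once, if it executes at all.
  • Nor is there any guarantee about the relative ordering of the I/O performed by a and the I/O performed by other parts of the program.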

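These drawbacks keep unsafePerformIO mostly limited to the safest and most careful uses, and they are why people use IO directly until it becomes too inconvenient.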
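The first drawback can be observed with a small experiment. This is a sketch, assuming GHC; countedId and demo are hypothetical names, and the hidden counter is exactly the kind of effect the list above warns about.

```haskell
import Control.Exception (evaluate)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import System.IO.Unsafe (unsafePerformIO)

-- Hypothetical helper: bumps a counter as a hidden side effect, then returns x.
{-# NOINLINE countedId #-}
countedId :: IORef Int -> Int -> Int
countedId ref x = unsafePerformIO (modifyIORef' ref (+ 1) >> pure x)

-- Returns the demanded value and how many times the hidden effect ran.
demo :: IO (Int, Int)
demo = do
  ref <- newIORef 0
  let _unused = countedId ref 10 -- bound, but never demanded
      used    = countedId ref 20
  value <- evaluate used         -- force the value at this point in the IO sequence
  count <- readIORef ref
  pure (value, count)
```

With GHC, demo is expected to return (20, 1): the effect attached to _unused never runs because nothing demands its value. Had the counter been updated through ordinary IO actions, both updates would have run, in the order written.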

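[1] Apart from being type-unsafe: let r :: IORef a; r = unsafePerformIO $ newIORef undefined gives you a polymorphic r :: IORef a, which can be used to implement unsafeCast :: a -> b. ML has a solution for reference allocation that avoids this, and Haskell could have solved it in a similar way if purity had not been considered desirable anyway (the monomorphism restriction is nearly a solution in any case; you would just have to ban people from working around it with a type signature, as I did above).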