Why does Apache recommend against using sendfile() with NFS on Linux

The Apache documentation contains the following statement about EnableSendfile:

With a network-mounted DocumentRoot (e.g., NFS, SMB, CIFS, FUSE), the kernel may be unable to serve the network file through its own cache.[1]

The default configurations of both Apache 2.4 and Nginx disable sendfile().

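I have been trying to find something concrete that describes the exact problem with using sendfile() on an NFS filesystem on Linux. Running a minimal test program on kernel 3.10.0-327.36.3 (CentOS 7) verifies that sendfile() does work when the source is on NFS, and that it reads from the page cache (the first run is slow, subsequent runs are fast, and drop_caches makes it slow again, i.e. the data is re-read from the source). I tried files of up to 1G and everything seemed to work. I assume there must be some circumstances that reveal buggy behavior, but I would like to know exactly what they are.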
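The test program itself is not shown in the question. As a rough illustration only, a minimal timing harness along these lines could reproduce the experiment; the function name `timed_sendfile` and the choice of a regular file as the destination are assumptions, not the asker's actual code.

```python
import os
import time


def timed_sendfile(src_path, dst_path):
    """Copy src_path to dst_path with os.sendfile(); return (bytes, seconds).

    Illustrative helper only: the original test program is not available.
    """
    start = time.perf_counter()
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        remaining = os.fstat(src.fileno()).st_size
        offset = 0
        while remaining > 0:
            # sendfile() may transfer fewer bytes than requested, so loop.
            sent = os.sendfile(dst.fileno(), src.fileno(), offset, remaining)
            if sent == 0:
                # Unexpected end of file (the source shrank underneath us).
                break
            offset += sent
            remaining -= sent
    return offset, time.perf_counter() - start
```

Running it twice against a file on the NFS mount, then once more after dropping the page cache (as root, `echo 3 > /proc/sys/vm/drop_caches`), should show the slow/fast/slow pattern described above if the data really is served from the page cache.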

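For comparison, there is some documentation out there on the problems with using sendfile() on VirtualBox volumes [2], but I can't find anything similar covering Apache, or how to reproduce a problematic configuration.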

  • [1] https://httpd.apache.org/docs/2.4/mod/core.html#enablesendfile
  • [2] https://www.virtualbox.org/ticket/12597

Nginx's default configuration turns sendfile on (https://github.com/nginx/nginx/blob/release-1.13.8/conf/nginx.conf#L27), so I'm confused by the statement there.

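Going back to the early 2000s, you can see the Apache developers introducing the option to disable sendfile (here is the mailing list post with the patch). There are also some old bugs in the Apache bug tracker that may be sendfile related. From Apache bug #12893 we learn that one of the failures occurred because the Linux kernel NTFS implementation simply did not support the sendfile syscall: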

[...] apparently there is some characteristic of your NTFS filesystem that
prevents sendfile() from working.

sendfile(8, 9, [0], 9804)               = -1 EINVAL (Invalid argument)
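An EINVAL like the one in that trace is something a program can detect at runtime. The sketch below shows one common way to cope with it: try sendfile() first and fall back to an ordinary read/write copy if the filesystem rejects the call. The helper name `copy_with_fallback` and the choice of which errno values trigger the fallback are assumptions made for illustration; this is not Apache's actual code.

```python
import errno
import os


def copy_with_fallback(src_fd, dst_fd, size, bufsize=64 * 1024):
    """Copy `size` bytes from src_fd to dst_fd; return (bytes, method).

    Hypothetical helper: tries sendfile(), and falls back to pread()/write()
    when the kernel reports EINVAL or ENOSYS for this file.
    """
    offset = 0
    try:
        while offset < size:
            sent = os.sendfile(dst_fd, src_fd, offset, size - offset)
            if sent == 0:
                break
            offset += sent
        return offset, "sendfile"
    except OSError as exc:
        # Only treat "not supported here" style errors as a cue to fall back.
        if exc.errno not in (errno.EINVAL, errno.ENOSYS):
            raise

    # Fallback path: continue from wherever sendfile() stopped.
    while offset < size:
        chunk = os.pread(src_fd, min(bufsize, size - offset), offset)
        if not chunk:
            break
        view = memoryview(chunk)
        while view:
            # os.write() may write only part of the buffer.
            written = os.write(dst_fd, view)
            view = view[written:]
        offset += len(chunk)
    return offset, "read/write"
```

Note that a fallback like this only helps with outright failures. It does nothing for the quieter problem discussed further down, where sendfile() succeeds but serves stale data.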

A blog post titled "The Mysterious Case of Sendfile and Apache", which references the Stack Overflow question you are reading, proposes the following theory:

sendfile() will transfer at most 0x7ffff000 (2,147,479,552) bytes, returning the number of bytes actually transferred. (This is true on both 32-bit and 64-bit systems.)

There is a 2GB limit. Now for the hypothesis. The Apache documentation says:

With a network-mounted DocumentRoot (e.g., NFS, SMB, CIFS, FUSE), the kernel may be unable to serve the network file through its own cache[2]

So when it says "the kernel may be unable to serve the network file", I think it may be referring to the inherent file size limit that sendfile has.

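An interesting theory, but I doubt it is the answer, because you can simply choose not to use the sendfile code path for files that are too large. Update: while searching around, I found that the author of that post wrote a follow-up titled "That time I was wrong about Sendfile() and Apache", which mentions the answer you are reading!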
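It is also worth noting that the quoted limit applies per call, and sendfile() returns the number of bytes actually transferred, so a caller can cover a larger file with repeated calls. The arithmetic can be sketched as follows; `plan_chunks` is a hypothetical helper, and it assumes every call transfers the full cap, which real calls are not guaranteed to do.

```python
SENDFILE_MAX = 0x7FFFF000  # per-call cap quoted from the sendfile(2) man page


def plan_chunks(size, cap=SENDFILE_MAX):
    """Return the (offset, count) pairs needed to cover `size` bytes.

    Idealized: assumes each call moves exactly min(cap, remaining) bytes.
    """
    chunks = []
    offset = 0
    while offset < size:
        count = min(cap, size - offset)
        chunks.append((offset, count))
        offset += count
    return chunks
```

For a 5 GiB file, `plan_chunks(5 * 2**30)` yields three chunks, which supports the point that the 2GB cap is an inconvenience rather than a reason for sendfile to fail on network filesystems.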

The ProFTPD documentation also carries warnings about sendfile problems:

There have been cases where it was the filesystems, rather than the kernels, which appeared to have been the culprits in sendfile(2) problems:

  • Network filesystems (e.g NFS, SMBFS/Samba, CIFS)
  • Virtualized filesystems (OpenVZ, VMware, and even Veritas)
  • Other filesystems (e.g. NTFS and tmpfs on Linux)

Again, if you encounter issues with downloading files from ProFTPD when those files reside on a networked or virtualized filesystem, try using "UseSendfile off" in your proftpd.conf.

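Lots of "here be dragons" warnings. Some of these exist because the filesystem simply did not support sendfile (for example, Linux's tmpfs did not support sendfile until 2.4.22-pre3). FUSE-based filesystems (such as NTFS-3g) also had problems in the past because of FUSE and sendfile bugs (since ironed out). The list of virtualized filesystems is an interesting addition, though...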

However, the OrangeFS FAQ seems to have the most plausible explanation:

5.16 Can we run the Apache webserver to serve files off a orangefs volume?

Sure you can! However, we recommend that you turn off the EnableSendfile option in httpd.conf before starting the web server. Alternatively, you could configure orangefs with the option -enable-kernel-sendfile. Passing this option to configure results in a orangefs kernel module that supports the sendfile callback. But we recommend that unless the files that are being served are large enough this may not be a good idea in terms of performance. Apache 2.x+ uses the sendfile system call that normally stages the file-data through the page-cache. On recent 2.6 kernels, this can be averted by providing a sendfile callback routine at the file-system. Consequently, this ensures that we don't end up with stale or inconsistent cached data on such kernels. However, on older 2.4 kernels the sendfile system call streams the data through the page-cache and thus there is a real possibility of the data being served stale. Therefore users of the sendfile system call are warned to be wary of this detail.

A similar explanation can be found in the VirtualBox bug "Linux guest readv system call returns stale (cached) shared folder file data":

I have discovered that programs that read files using the read system call return the correct data, but those using the readv system call (such as my version of gas) read stale cached data.

[...]

the use of kernel function generic_file_read_iter as the .read_iter member of the file_operations structure (.read_iter is used when doing a readv system call). This function WILL write to and read from the file cache. However, vbox function sf_reg_read, as used for the generic .read member and read system call, appears to always bypass Linux's FS cache.

[...]

Further I believe that a similar long-lived issue is reported as ticket #819, only for the sendfile system call. It seems that all of these generic_file_* functions have the expectation that the host controls all access to the drive.

The above may also explain the list of problematic virtualized filesystems in the ProFTPD documentation.

Summary (best guess)

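Apache recommends against using sendfile() with Linux NFS because their software is popular and it triggered many painful-to-debug sendfile-related bugs with older Linux NFS clients. The warning is old, and it is probably easier to leave it as is than to update it with all the caveats.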

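If you have a Linux filesystem whose underlying data can change without the Linux page cache being invalidated, it is unwise to use sendfile with it on old Linux kernels (this explains the problems with old Linux NFS clients). With newer kernels, if that filesystem still does not implement its own sendfile hook, using sendfile is again unwise (the VirtualBox shared folder issue demonstrates this).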

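Recent (2.6.31 and later) Linux kernels give filesystems that may face this invalidation problem the facility to use their own sendfile implementation. Assuming the filesystem does so, it should be OK to use sendfile with it, barring bugs, but caveat emptor!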
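If you would rather be conservative, one option is to decide per path whether to enable sendfile, based on the filesystem type. The sketch below parses text in the format of /proc/self/mounts and flags the filesystem families mentioned in this thread. Everything in it is an assumption made for illustration: the `RISKY_FSTYPES` set, the helper names, and the policy itself. It also ignores the octal escapes (such as `\040` for a space) that the kernel uses in mount paths.

```python
import os

# Hypothetical policy list, drawn from the filesystems named in this thread.
RISKY_FSTYPES = {"nfs", "nfs4", "cifs", "smbfs", "vboxsf", "fuse", "fuseblk"}


def fstype_for(path, mounts_text):
    """Return the fs type of the longest mount point that contains `path`."""
    path = os.path.normpath(path)
    best_len, best_type = -1, None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fstype = fields[1], fields[2]
        # Compare on path boundaries so /srv/www does not match /srv/wwwroot.
        prefix = mount_point.rstrip("/") + "/"
        if (path + "/").startswith(prefix):
            # ">=" lets a later mount over the same point win.
            if len(mount_point) >= best_len:
                best_len, best_type = len(mount_point), fstype
    return best_type


def sendfile_looks_safe(path, mounts_text):
    """Conservative guess: avoid sendfile on network/virtualized/FUSE mounts."""
    fstype = fstype_for(path, mounts_text)
    if fstype is None:
        return False
    # Types such as "fuse.sshfs" are reduced to their "fuse" family.
    return fstype.split(".")[0] not in RISKY_FSTYPES
```

On a live system this could be called as `sendfile_looks_safe(docroot, open("/proc/self/mounts").read())`. Given the summary above, a check like this errs on the side of caution: on recent kernels many of these filesystems may well work fine with sendfile.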


I think I have found what may be causing some of the confusion around Nginx.

The Nginx documentation at https://docs.nginx.com/nginx/admin-guide/web-server/serving-static-content/ states the following:

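"By default, NGINX handles file transmission itself and copies the file into the buffer before sending it. Enabling the sendfile directive eliminates the step of copying the data into the buffer and enables direct copying data from one file descriptor to another."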

This makes it sound as though Nginx does not use sendfile by default.

However, as noted in this answer, Nginx's default configuration file explicitly turns on sendfile support for the out-of-the-box HTTP server, as shown at https://github.com/nginx/nginx/blob/master/conf/nginx.conf.