Should a TCP client be able to pause the server when the TCP server reads a non-blocking socket?

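Overview

I have a simple question about the code below. Hopefully I have not made a mistake in the code.

I'm a network engineer, and I need to test certain linux behaviors with our business application keepalives during network outages (I will insert some iptables rules later to interfere with the connection; first I want to make sure I have the client and server right).

As part of a network failure test I'm conducting, I wrote a non-blocking Python TCP client and server that are supposed to blindly send messages to each other in a loop. To understand what is happening, I am using loop counters.

The server's loop should be relatively simple. I loop through every fd that select says is ready. I never even import sleep anywhere in the server code. From this perspective, I don't expect the server code to pause while it loops over the client's socket, but for some reason the server code pauses intermittently (more details below).

Initially, I did not put a sleep in the client's loop. Without a sleep on the client side, the server and client seem to be as efficient as I want. However, when I put a time.sleep(1) statement after the client does an fd.send() to the server, the TCP server code intermittently pauses while the client is sleeping.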

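My questions:

  • Should I be able to write a single-threaded Python TCP server that doesn't pause when the client hits time.sleep() in the client's fd.send() loop? If so, what am I doing wrong? <- answered
  • If I wrote this test code correctly and the server shouldn't pause, why is the TCP server intermittently pausing while it polls the client's connection for data?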

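Reproducing the scenario

I'm running this on two RHEL6 linux machines. To reproduce the problem...

  • Open two different terminals.
  • Save the client and server scripts in different files.
  • Change the shebang path to your local python (I'm using python 2.7.15).
  • Change SERVER_HOSTNAME and SERVER_DOMAIN in the client code to the hostname and domain of the machine that the server runs on.
  • Start the server first, then start the client.

After the client connects, you'll see messages like those shown in Exhibit 1 scrolling quickly in the server's terminal. After a few seconds, when the client hits time.sleep(), the scrolling intermittently pauses. I don't expect to see those pauses, but maybe I've misunderstood something.

Exhibit 1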

---
LOOP_COUNT 0
---
LOOP_COUNT 1
---
LOOP_COUNT 2
---
LOOP_COUNT 3
CLIENTMSG: 'client->server 0'
---
LOOP_COUNT 4
---
LOOP_COUNT 5
---
LOOP_COUNT 6
---
LOOP_COUNT 7
---
LOOP_COUNT 8
---
LOOP_COUNT 9
---
LOOP_COUNT 10
---
LOOP_COUNT 11
---

Final non-blocking code (incorporating the suggestions from the answers):

tcp_server.py

#!/usr/bin/python -u
from socket import AF_INET, SOCK_STREAM, SO_REUSEADDR, SOL_SOCKET
from socket import MSG_DONTWAIT
#from socket import MSG_OOB  <--- for send()
from socket import socket
import socket as socket_module
import select
import errno
import fcntl
import time
import sys
import os

def get_errno_info(e, op='', debugmsg=False):
    """Return verbose information from errno errors, such as errors returned by python socket()"""
    VALID_OP = set(['accept', 'connect', 'send', 'recv', 'read', 'write'])
    assert op.lower() in VALID_OP, "op must be: {0}".format(
        ','.join(sorted(VALID_OP)))

    ## ref: man 3 errno (in linux)... other systems may be man 2 intro
    ##   also see https://docs.python.org/2/library/errno.html
    try:
        retval_int = int(e.args[0])         # Example: 32
        retval_str = os.strerror(e.args[0]) # Example: 'Broken pipe'
        retval_code = errno.errorcode.get(retval_int, 'MODULEFAIL') # Ex: EPIPE
    except:
        ## I don't expect to get here unless something broke in python errno...
        retval_int  = -1
        retval_str  = '__somethingswrong__'
        retval_code = 'BADFAIL'

    if debugmsg:
        print "DEBUG: Can't {0}() on socket (errno:{1}, code:{2} / {3})".format(
            op, retval_int, retval_code, retval_str)
    return retval_int, retval_str, retval_code


host = ''
port = 6667     # IRC service
DEBUG = True

serv_sock = socket(AF_INET, SOCK_STREAM)
serv_sock.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)  # allow quick restarts on the same port
serv_sock.bind((host, port))
serv_sock.listen(5)

#fcntl.fcntl(serv_sock, fcntl.F_SETFL, os.O_NONBLOCK)  # Make the socket non-blocking
serv_sock.setblocking(False)

sock_list = [serv_sock]

from_client_str = '__DEFAULT__'

to_client_idx = 0
loop_count = 0
need_send_select = False
while True:
    if need_send_select:
        # Only do this after send() EAGAIN or EWOULDBLOCK...
        send_sock_list = sock_list
    else:
        send_sock_list = []

    #print"---"
    #print"LOOP_COUNT",  loop_count

    recv_ready_list, send_ready_list, exception_ready = select.select(
        sock_list, send_sock_list, [], 0.0)  # Last float is the select() timeout...


    ## Read all sockets which are input-ready... might be client or server...
    for sock_fd in recv_ready_list:

        # accept() if we're reading on the server socket...
        if sock_fd is serv_sock:
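            try:
                clientsock, clientaddr = sock_fd.accept()
            except socket_module.error, e:
                errint, errstr, errcode = get_errno_info(e, op='accept',
                    debugmsg=DEBUG)
                continue  # accept() failed, so there is no client socket to add

            # On Linux the accepted socket does not inherit non-blocking mode
            clientsock.setblocking(False)
            assert clientsock.gettimeout()==0.0, "client socket should be in non-blocking mode"
            sock_list.append(clientsock)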

        # read input from the client socket...
        else:
            try:
                from_client_str = sock_fd.recv(1024, MSG_DONTWAIT)
                if from_client_str=='':
                    # Client closed the socket...
                    print "CLIENT CLOSED SOCKET"
                    sock_list.remove(sock_fd)
            except socket_module.error, e:
                errint, errstr, errcode = get_errno_info(e, op='recv',
                    debugmsg=DEBUG)
                if errcode=='EAGAIN' or errcode=='EWOULDBLOCK':
                    # socket unavailable to read()
                    continue
                elif errcode=='ECONNRESET' or errcode=='EPIPE':
                    # Client closed the socket...
                    sock_list.remove(sock_fd)
                else:
                    print "UNHANDLED SOCKET ERROR", errcode, errint, errstr
                    sys.exit(1)


            print "from_client_str: '{0}'".format(from_client_str)

    ## Adding dynamic_list, per input from EJP, below...
    if need_send_select is False:
        dynamic_list = list(sock_list)  # copy: sock_list may shrink inside the loop
    else:
        dynamic_list = send_ready_list
    ## NOTE:  socket code shouldn't walk this list unless a write is pending...
    ##      broadcast the same message to all clients...
    for sock_fd in dynamic_list:

        ## Ignore server's listening socket...
        if sock_fd is serv_sock:
            ## Only send() to accept()ed sockets...
            continue

        try:

            to_client_str = "server->client: {0}\n".format(to_client_idx)
            send_retval = sock_fd.send(to_client_str, MSG_DONTWAIT)
            ## send() returns the number of bytes written, on success
            ##     disabling assert check on sent bytes while using MSG_DONTWAIT
            #assert send_retval==len(to_client_str)

            to_client_idx += 1
            need_send_select = False
        except socket_module.error, e:
            errint, errstr, errcode = get_errno_info(e, op='send',
                debugmsg=DEBUG)
            if errcode=='EAGAIN' or errcode=='EWOULDBLOCK':
                need_send_select = True
                continue
            elif errcode=='ECONNRESET' or errcode=='EPIPE':
                # Client closed the socket...
                sock_list.remove(sock_fd)
            else:
                print "FATAL UNHANDLED SOCKET ERROR", errcode, errint, errstr
                sys.exit(1)

    loop_count += 1

tcp_client.py

#!/usr/bin/python -u
from socket import AF_INET, SOCK_STREAM
from socket import MSG_DONTWAIT    # non-blocking send/recv; see man 2 recv
from socket import gethostname, socket
import socket as socket_module
import select
import fcntl
import errno
import time
import sys
import os

## NOTE: Using this script to simulate a scheduler
SERVER_HOSTNAME = 'myServerHostname'
SERVER_DOMAIN = 'mydomain.local'
PORT = 6667
DEBUG = True

def get_errno_info(e, op='', debugmsg=False):
    """Return verbose information from errno errors, such as errors returned by python socket()"""
    VALID_OP = set(['accept', 'connect', 'send', 'recv', 'read', 'write'])
    assert op.lower() in VALID_OP, "op must be: {0}".format(
        ','.join(sorted(VALID_OP)))

    ## ref: man 3 errno (in linux)... other systems may be man 2 intro
    ##   also see https://docs.python.org/2/library/errno.html
    try:
        retval_int = int(e.args[0])         # Example: 32
        retval_str = os.strerror(e.args[0]) # Example: 'Broken pipe'
        retval_code = errno.errorcode.get(retval_int, 'MODULEFAIL') # Ex: EPIPE
    except:
        ## I don't expect to get here unless something broke in python errno...
        retval_int  = -1
        retval_str  = '__somethingswrong__'
        retval_code = 'BADFAIL'

    if debugmsg:
        print "DEBUG: Can't {0}() on socket (errno:{1}, code:{2} / {3})".format(
            op, retval_int, retval_code, retval_str)
    return retval_int, retval_str, retval_code


connect_finished = False
while not connect_finished:
    try:
        c2s = socket(AF_INET, SOCK_STREAM) # Client to server socket...
        # Set socket non-blocking
        #fcntl.fcntl(c2s, fcntl.F_SETFL, os.O_NONBLOCK)
        c2s.connect(('.'.join((SERVER_HOSTNAME, SERVER_DOMAIN,)), PORT))
        c2s.setblocking(False)
        assert c2s.gettimeout()==0.0, "c2s socket should be in non-blocking mode"
        connect_finished = True
    except socket_module.error, e:
        errint, errstr, errcode = get_errno_info(e, op='connect',
            debugmsg=DEBUG)
        if errcode=='EINPROGRESS':
            pass

to_srv_idx = 0
need_send_select = False
while True:
    socket_list = [c2s]

    # Get the list of sockets which can: take input, output, etc...
    if need_send_select:
        # Only do this after send() EAGAIN or EWOULDBLOCK...
        send_sock_list = socket_list
    else:
        send_sock_list = []
    recv_ready_list, send_ready_list, exception_ready = select.select(
        socket_list, send_sock_list, [])

    for sock_fd in recv_ready_list:
        assert sock_fd is c2s, "Strange socket failure here"

        #incoming message from remote server
        try:
            from_srv_str = sock_fd.recv(1024, MSG_DONTWAIT)
        except socket_module.error, e:
            ## https://stackoverflow.com/a/16745561/667301
            errint, errstr, errcode = get_errno_info(e, op='recv',
                debugmsg=DEBUG)
            if errcode=='EAGAIN' or errcode=='EWOULDBLOCK':
                # Busy, try again later...
                print "recv() BLOCKED"
                continue
            elif errcode=='ECONNRESET' or errcode=='EPIPE':
                # Server ended normally...
                sys.exit(0)
        else:
            if from_srv_str=='':
                # recv() returning '' means the server closed the connection
                print "SERVER CLOSED NORMALLY"
                sys.exit(0)

        ## NOTE: if we get this far, we successfully received from_srv_str.
        ##    Anything caught above, is some kind of fail...
        print "from_srv_str: {0}".format(from_srv_str)

    ## Adding dynamic_list, per input from EJP, below...
    if need_send_select is False:
        dynamic_list = socket_list
    else:
        dynamic_list = send_ready_list
    for sock_fd in dynamic_list:
        # outgoing message to remote server
        if sock_fd is c2s:
            try:
                to_srv_str = 'client->server {0}'.format(to_srv_idx)
                sock_fd.send(to_srv_str, MSG_DONTWAIT)

                               ##
                time.sleep(1)  ## Client blocks the server here... Why????
                               ##

                to_srv_idx += 1
                need_send_select = False
            except socket_module.error, e:
                errint, errstr, errcode = get_errno_info(e, op='send',
                    debugmsg=DEBUG)
                if errcode=='EAGAIN' or errcode=='EWOULDBLOCK':
                    ## Try to send() later...
                    print "send() BLOCKED"
                    need_send_select = True
                    continue
                elif errcode=='ECONNRESET' or errcode=='EPIPE':
                    # Server ended normally...
                    sys.exit(0)

Original question code:

tcp_server.py

#!/usr/bin/python -u
from socket import AF_INET, SOCK_STREAM, SO_REUSEADDR, SOL_SOCKET
#from socket import MSG_OOB  <--- for send()
from socket import socket
import socket as socket_module
import select
import fcntl
import os

host = ''
port = 9997

serv_sock = socket(AF_INET, SOCK_STREAM)
serv_sock.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)  # allow quick restarts on the same port
serv_sock.bind((host, port))
serv_sock.listen(5)

fcntl.fcntl(serv_sock, fcntl.F_SETFL, os.O_NONBLOCK)  # Make the socket non-blocking

sock_list = [serv_sock]

from_client_str = '__DEFAULT__'

to_client_idx = 0
loop_count = 0
while True:
    recv_ready_list, send_ready_list, exception_ready = select.select(sock_list, sock_list,
        [], 5)

    print "---"
    print "LOOP_COUNT",  loop_count

    ## Read all sockets which are input-ready... might be client or server...
    for sock_fd in recv_ready_list:

        # accept() if we're reading on the server socket...
        if sock_fd is serv_sock:
            clientsock, clientaddr = sock_fd.accept()
            sock_list.append(clientsock)

        # read input from the client socket...
        else:
            try:
                from_client_str = sock_fd.recv(4096)
                if from_client_str=='':
                    # Client closed the socket...
                    print "CLIENT CLOSED SOCKET"
                    sock_list.remove(sock_fd)
            except socket_module.error, e:
                print "WARNING RECV FAIL"


            print "from_client_str: '{0}'".format(from_client_str)

    for sock_fd in send_ready_list:
        if sock_fd is not serv_sock:
            try:
                to_client_str = "server->client: {0}\n".format(to_client_idx)
                sock_fd.send(to_client_str)
                to_client_idx += 1
            except socket_module.error, e:
                print "TO CLIENT SEND ERROR", e

    loop_count += 1

tcp_client.py

#!/usr/bin/python -u

from socket import AF_INET, SOCK_STREAM
from socket import gethostname, socket
import socket as socket_module
import select
import fcntl
import errno
import time
import sys
import os

## NOTE: Using this script to simulate a scheduler
SERVER_HOSTNAME = 'myHostname'
SERVER_DOMAIN = 'mydomain.local'
PORT = 9997

def handle_socket_error_continue(e):
    ## non-blocking socket info from:
    ## https://stackoverflow.com/a/16745561/667301
    print "HANDLE_SOCKET_ERROR_CONTINUE"
    err = e.args[0]
    if (err==errno.EAGAIN) or (err==errno.EWOULDBLOCK):
        print 'CLIENT DEBUG: No data input from server'
        return True
    else:
        print 'FROM SERVER RECV ERROR: {0}'.format(e)
        sys.exit(1)

c2s = socket(AF_INET, SOCK_STREAM) # Client to server socket...
c2s.connect(('.'.join((SERVER_HOSTNAME, SERVER_DOMAIN,)), PORT))
# Set socket non-blocking...
fcntl.fcntl(c2s, fcntl.F_SETFL, os.O_NONBLOCK)

to_srv_idx = 0
while True:
    socket_list = [c2s]

    # Get the list sockets which can: take input, output, etc...
    recv_ready_list, send_ready_list, exception_ready = select.select(
        socket_list, socket_list, [])

    for sock_fd in recv_ready_list:
        assert sock_fd is c2s, "Strange socket failure here"

        #incoming message from remote server
        try:
            from_srv_str = sock_fd.recv(4096)
        except socket_module.error, e:
            ## https://stackoverflow.com/a/16745561/667301
            err_continue = handle_socket_error_continue(e)
            if err_continue is True:
                continue
        else:
            if len(from_srv_str)==0:
                print "SERVER CLOSED NORMALLY"
                sys.exit(0)

        ## NOTE: if we get this far, we successfully received from_srv_str.
        ##    Anything caught above, is some kind of fail...
        print "from_srv_str: {0}".format(from_srv_str)

    for sock_fd in send_ready_list:
        #outgoing message to remote server
        if sock_fd is c2s:
            #to_srv_str = raw_input('Send to server: ')
            try:
                to_srv_str = 'client->server {0}'.format(to_srv_idx)
                sock_fd.send(to_srv_str)

                               ##
                time.sleep(1)  ## Client blocks the server here... Why????
                               ##

                to_srv_idx += 1
            except socket_module.error, e:
                print "TO SERVER SEND ERROR", e

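TCP sockets are almost always ready for writing, unless their socket send buffer is full.

It is therefore incorrect to always select for writability on a socket. Only do so after a send has failed with EAGAIN / EWOULDBLOCK. Otherwise your server will spin mindlessly processing writable sockets, which will usually be all of them.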
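
A minimal sketch of that rule, written for Python 3 (the question's scripts target Python 2.7) and using `socket.socketpair()` as a stand-in for a real client/server TCP connection. The helper name `try_send` and the 4096-byte chunk size are assumptions made for illustration only; buffer sizes differ between a local socket pair and TCP, but the readiness behaviour being shown is the same idea.

```python
import select
import socket


def try_send(sock, payload):
    """Attempt a non-blocking send; return True if the kernel accepted data.

    Returns False when the send buffer is full (EAGAIN / EWOULDBLOCK),
    which is the only time the caller should start selecting for writability.
    """
    try:
        sock.send(payload)
        return True
    except BlockingIOError:
        return False


server_side, client_side = socket.socketpair()
server_side.setblocking(False)

# A socket with an empty send buffer is reported writable immediately,
# so always selecting on it for writing would make the loop spin.
_, writable, _ = select.select([], [server_side], [], 0)
writable_when_idle = server_side in writable

# Keep sending while the peer never reads, until the send buffer fills up.
need_send_select = False
while not need_send_select:
    if not try_send(server_side, b"x" * 4096):
        need_send_select = True  # only now is a writability select justified

# With a full buffer the socket is no longer reported writable.
_, writable, _ = select.select([], [server_side], [], 0)
writable_when_full = server_side in writable

print(writable_when_idle, writable_when_full)

server_side.close()
client_side.close()
```

The `need_send_select` flag plays the same role here as in the final server and client scripts above: it stays False until a send actually fails.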


However, when I put a time.sleep(1) statement after the client does an
fd.send() to the server, the TCP server code intermittently pauses
while the client is sleeping.

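AFAICT, after running the provided code (nice self-contained example, by the way), the server is behaving as intended.

In particular, the semantics of the select() call are that select() shouldn't return until there is something for the thread to do. Having the thread block inside select() is a good thing when there is nothing the thread can do right now anyway, since it prevents the thread from spinning the CPU for no reason.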

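So in this case, your server program has told select() that it wants select() to return only when at least one of the following conditions is true:

  1. serv_sock is ready-for-read (that is, a new client wants to connect to the server now)
  2. serv_sock is ready-for-write (I don't believe this ever actually happens on a listening socket, so this criterion can probably be ignored)
  3. clientsock is ready-for-read (that is, the client has sent some bytes to the server and they are waiting in clientsock's buffer for the server thread to recv() them)
  4. clientsock is ready-for-write (that is, clientsock has some room in its outgoing-data buffer that the server could send() data into if it wanted to send data back to the client)
  5. Five seconds have passed since the call to select() started blocking.

I see (via print-debugging) that when your server program blocks, it is blocking inside select(), which indicates that none of the 5 conditions above is being met during the blocking period.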

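Why is that? Well, let's go down the list:

  1. Not met, because no other client is trying to connect.
  2. Not met, because this never happens.
  3. Not met, because the server has already read all of the data that the connected client sent (and since the connected client is itself sleeping, it is not sending any more data).
  4. Not met, because the server has filled up clientsock's outgoing-data buffer. The client program is sleeping, so it reads the data coming from the server only intermittently, and the TCP layer guarantees lossless, in-order transmission. Once clientsock's outgoing-data buffer is full, clientsock will not select as ready-for-write unless/until the client reads at least some data from its end of the connection.
  5. Not met, because 5 seconds have not yet elapsed since select() started blocking.

So, is this behavior actually a problem for the server? In fact it is not, because the server will still be responsive to any other clients that connect to it. In particular, select() will still return right away whenever serv_sock or any other client's socket selects as ready-for-read (or ready-for-write), so the server can handle other clients just fine while waiting for the hacked/slow client to wake up.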
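
The fourth condition in the list above (a full outgoing-data buffer while the peer is not reading) can be reproduced in isolation. The following is a rough Python 3 sketch, not the question's code: it uses `socket.socketpair()` instead of a real TCP connection, a 0.2 second timeout instead of 5 seconds, and illustrative names such as `sender` and `slow_reader`.

```python
import select
import socket
import time

sender, slow_reader = socket.socketpair()
sender.setblocking(False)

# Fill the sender's outgoing buffer while the peer "sleeps" (never reads).
sent_total = 0
while True:
    try:
        sent_total += sender.send(b"x" * 4096)
    except BlockingIOError:
        break

# Nothing to read and no room to write: select() blocks for the full timeout.
start = time.monotonic()
readable, writable, _ = select.select([sender], [sender], [], 0.2)
blocked_for = time.monotonic() - start
ready_while_peer_sleeps = bool(readable or writable)

# The peer wakes up and drains everything that was queued for it.
drained = 0
while drained < sent_total:
    drained += len(slow_reader.recv(65536))

# With buffer space available again, the socket is reported writable.
_, writable, _ = select.select([], [sender], [], 0.2)
writable_after_drain = sender in writable

print(ready_while_peer_sleeps, writable_after_drain)

sender.close()
slow_reader.close()
```

The pause measured in `blocked_for` corresponds to the pause seen in the server's terminal: select() has nothing to report until the sleeping peer reads again or the timeout expires.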

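The hacked/slow client might be a problem for its user, but there is not much the server can do about that (short of forcibly disconnecting the client's TCP connection, or perhaps printing a log message asking someone to debug the connected client program, I suppose :)).

I agree with EJP, by the way: selecting for ready-for-write should only be done on sockets that you actually want to write some data to. If you don't actually want to write to the socket ASAP, then it is pointless and counterproductive to instruct select() to return as soon as that socket is ready-for-write. The problem with doing so is that you are likely to spin the CPU a lot whenever any socket's outgoing-data buffer is less than full (which in most applications is most of the time!). The user-visible symptom of the problem is that your server program uses 100% of a CPU core even when it ought to be idle or mostly idle.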
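
One way to follow that advice is to keep a per-socket queue of outgoing data and pass to select() only the sockets whose queue is non-empty. The sketch below is a Python 3 illustration under that assumption; `pending`, `queue_message`, `write_interest`, and `flush` are hypothetical names that do not appear in the question's code.

```python
import select
import socket
from collections import defaultdict

pending = defaultdict(bytearray)  # socket -> bytes still waiting to be sent


def queue_message(sock, data):
    """Remember data that should be written once the socket has room."""
    pending[sock] += data


def write_interest(sockets):
    """Return only the sockets that actually have queued output."""
    return [s for s in sockets if pending[s]]


def flush(sock):
    """Send as much queued data as the kernel accepts right now."""
    try:
        sent = sock.send(pending[sock])
    except BlockingIOError:
        return  # buffer full; the data stays queued for a later attempt
    del pending[sock][:sent]


a, peer_a = socket.socketpair()
b, peer_b = socket.socketpair()
for s in (a, b):
    s.setblocking(False)

queue_message(a, b"server->client: 0\n")

# Only `a` has data queued, so only `a` is passed in the write list.
interest_before = write_interest([a, b])
_, writable, _ = select.select([], interest_before, [], 0)
for s in writable:
    flush(s)

# Nothing is pending any more, so no socket needs to be watched for writing.
interest_after = write_interest([a, b])
received = peer_a.recv(1024)
print(received)

for s in (a, peer_a, b, peer_b):
    s.close()
```

With this arrangement an idle server passes an empty write list to select() and can block there until a client sends something, instead of waking up for every writable socket.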


If I wrote this test code correctly and the server shouldn't pause, why is the TCP server intermittently pausing while it polls the client's connection for data?

Answering my own question: my blocking problem was caused by calling select() with a non-zero timeout.

When I changed select() to use a zero-second timeout, I got the expected results.
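
The difference between the two timeout settings can be seen with a small Python 3 sketch (the question's scripts are Python 2.7). It watches an idle `socket.socketpair()` end rather than a real TCP client, and `timed_select` is a hypothetical helper introduced only for this illustration.

```python
import select
import socket
import time

idle_sock, peer = socket.socketpair()  # nothing is ever sent on this pair


def timed_select(timeout):
    """Call select() on the idle socket's read side; return (ready, seconds)."""
    start = time.monotonic()
    readable, _, _ = select.select([idle_sock], [], [], timeout)
    return readable, time.monotonic() - start


# timeout=0: poll and return immediately, even though nothing is ready.
ready_poll, poll_seconds = timed_select(0)

# timeout=0.25: wait inside select() until data arrives or the timer expires.
ready_wait, wait_seconds = timed_select(0.25)

print("poll: ready=%r" % ready_poll)
print("wait: ready=%r" % ready_wait)

idle_sock.close()
peer.close()
```

A zero timeout keeps the loop counter moving because select() never waits, but it also turns the loop into a continuous poll, which is the CPU-spinning trade-off the other answers describe.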