Python 3.x: Why can't coroutines be used with run_in_executor?

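I want to run a service that requests URLs using both coroutines and multithreading. However, I cannot pass a coroutine to the workers in the executor. See the code below for a minimal example of the problem: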

import time
import asyncio
import concurrent.futures

EXECUTOR = concurrent.futures.ThreadPoolExecutor(max_workers=5)

async def async_request(loop):
    await asyncio.sleep(3)

def sync_request(_):
    time.sleep(3)

async def main(loop):
    futures = [loop.run_in_executor(EXECUTOR, async_request,loop)
               for x in range(10)]

    await asyncio.wait(futures)

loop = asyncio.get_event_loop()
loop.run_until_complete(main(loop))

This results in the following error:

Traceback (most recent call last):
  File "co_test.py", line 17, in <module>
    loop.run_until_complete(main(loop))
  File "/usr/lib/python3.5/asyncio/base_events.py", line 387, in run_until_complete
    return future.result()
  File "/usr/lib/python3.5/asyncio/futures.py", line 274, in result
    raise self._exception
  File "/usr/lib/python3.5/asyncio/tasks.py", line 239, in _step
    result = coro.send(None)
  File "co_test.py", line 10, in main
    futures = [loop.run_in_executor(EXECUTOR, req,loop) for x in range(10)]
  File "co_test.py", line 10, in <listcomp>
    futures = [loop.run_in_executor(EXECUTOR, req,loop) for x in range(10)]
  File "/usr/lib/python3.5/asyncio/base_events.py", line 541, in run_in_executor
    raise TypeError("coroutines cannot be used with run_in_executor()")
TypeError: coroutines cannot be used with run_in_executor()
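
For context, calling a coroutine function only creates a coroutine object; it does not run the body. A worker thread that called async_request would therefore return immediately with a coroutine that nobody awaits, which is presumably why run_in_executor rejects it up front. A minimal sketch of that behavior follows. The log parameter is a hypothetical marker added only for illustration, and asyncio.run assumes Python 3.7+:

```python
import asyncio
import inspect


async def async_request(log):
    await asyncio.sleep(0.1)
    log.append("body ran")  # reached only when an event loop drives the coroutine
    return "done"


def call_like_a_worker_thread(log):
    # All an executor worker does is a plain call; no event loop is involved.
    coro = async_request(log)
    is_coro = inspect.iscoroutine(coro)
    coro.close()  # discard it explicitly to avoid a "never awaited" warning
    return is_coro


if __name__ == "__main__":
    log = []
    print(call_like_a_worker_thread(log), log)  # the body has not run yet
    print(asyncio.run(async_request(log)), log)  # an event loop actually runs it
```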

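I know I could use the sync_request function instead of async_request; in that case I would get coroutine-like behavior by sending the blocking function to another thread.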
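
A minimal sketch of that first option, with the sleep shortened to keep it quick. blocking_request and fetch_all are hypothetical names standing in for a real blocking HTTP call and its driver, and asyncio.get_running_loop assumes Python 3.7+:

```python
import asyncio
import concurrent.futures
import time


def blocking_request(i):
    # Hypothetical stand-in for a blocking HTTP call.
    time.sleep(0.2)
    return i


async def fetch_all(n, max_workers=5):
    loop = asyncio.get_running_loop()
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Each call occupies one worker thread for the whole duration of the sleep.
        futures = [loop.run_in_executor(executor, blocking_request, i) for i in range(n)]
        # gather returns results in submission order.
        return await asyncio.gather(*futures)


if __name__ == "__main__":
    start = time.perf_counter()
    print(asyncio.run(fetch_all(10)))
    print(f"elapsed: {time.perf_counter() - start:.2f}s")
```

With ten calls and five workers, the calls run in roughly two rounds, so the pool size directly bounds the concurrency.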

I also know I could call async_request ten times in the event loop, as in the code below:

loop = asyncio.get_event_loop()
futures = [async_request(loop) for i in range(10)]
loop.run_until_complete(asyncio.wait(futures))

But in that case I would be using a single thread.
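
A single thread is not necessarily a limitation for I/O-bound work: while one coroutine is awaiting, the loop runs the others. The sketch below assumes Python 3.7+ for asyncio.run and uses a shortened sleep as a stand-in for a non-blocking request. It records the thread id only to show that everything stays on one thread:

```python
import asyncio
import threading
import time


async def async_request(i):
    await asyncio.sleep(0.2)  # stand-in for awaiting a non-blocking request
    return i, threading.get_ident()


async def main(n=10):
    # All coroutines are in flight at once on the same event loop.
    return await asyncio.gather(*(async_request(i) for i in range(n)))


if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(main())
    elapsed = time.perf_counter() - start
    thread_ids = {tid for _, tid in results}
    print(f"{len(results)} requests on {len(thread_ids)} thread(s) in {elapsed:.2f}s")
```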

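How can I combine both approaches, with coroutines working inside multiple threads? As you can see from the code, I am passing loop to async_request (without using it), in the hope that I can write something that tells the worker to create a future, send it to the pool, and wait for the result asynchronously (freeing the worker).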

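The reason I want to do this is to make the application scalable. Is this an unnecessary step? Should I simply have one thread per URL and leave it at that? Something like: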

LEN = len(list_of_urls)
EXECUTOR = concurrent.futures.ThreadPoolExecutor(max_workers=LEN)

Would that be good enough?


You have to create and set a new event loop in the thread's context in order to run coroutines there:

import asyncio
from concurrent.futures import ThreadPoolExecutor


def run(corofn, *args):
    # Runs in a worker thread: create a private event loop, drive the
    # coroutine to completion on it, then close the loop.
    loop = asyncio.new_event_loop()
    try:
        coro = corofn(*args)
        asyncio.set_event_loop(loop)
        return loop.run_until_complete(coro)
    finally:
        loop.close()


async def main():
    loop = asyncio.get_running_loop()  # Python 3.7+
    executor = ThreadPoolExecutor(max_workers=5)
    # Pass the coroutine *function* and its arguments, not a coroutine object.
    futures = [
        loop.run_in_executor(executor, run, asyncio.sleep, 1, x)
        for x in range(10)]
    print(await asyncio.gather(*futures))
    # Prints: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


if __name__ == '__main__':
    asyncio.run(main())  # Python 3.7+
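
On Python 3.7 and later, the run helper can probably be reduced to a single asyncio.run call, which creates a new loop in the calling thread, runs the coroutine, and closes the loop afterwards. A sketch under that assumption, with the sleep shortened:

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def run(corofn, *args):
    # asyncio.run creates a fresh event loop in the calling (worker) thread,
    # runs the coroutine to completion, and closes the loop afterwards.
    return asyncio.run(corofn(*args))


async def main():
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=5) as executor:
        futures = [
            loop.run_in_executor(executor, run, asyncio.sleep, 0.1, x)
            for x in range(10)]
        return await asyncio.gather(*futures)


if __name__ == "__main__":
    print(asyncio.run(main()))
    # Prints: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Note that each worker still runs exactly one coroutine at a time here, so the concurrency is bounded by max_workers, just as with plain blocking functions.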


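As I understand the question, you are trying to use each thread to:

  • trigger the execution of coroutines
  • stay free to receive more coroutines to trigger
  • wait for everything to finish asynchronously

However, as soon as you call a loop (whether the main loop or a new one) to wait for results, it blocks the thread while waiting.

And, by using run_in_executor with a bunch of synchronous functions, the thread has no way of knowing whether there are more coroutines to dispatch before it reaches the point where it waits on the loop.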
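
The blocking described above can be observed with a small timing sketch. It assumes Python 3.7+ for asyncio.run, and the function names are hypothetical. With a single worker, two loops run back to back, whereas two coroutines on one loop overlap:

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def run_blocking(delay):
    # The worker thread is stuck inside asyncio.run until the coroutine finishes,
    # so it cannot pick up another job in the meantime.
    return asyncio.run(asyncio.sleep(delay, "done"))


def one_worker_two_loops(delay=0.2):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=1) as executor:
        list(executor.map(run_blocking, [delay, delay]))
    return time.perf_counter() - start


async def _both(delay):
    await asyncio.gather(asyncio.sleep(delay), asyncio.sleep(delay))


def one_loop_two_coroutines(delay=0.2):
    start = time.perf_counter()
    asyncio.run(_both(delay))
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"one worker, two loops:    {one_worker_two_loops():.2f}s")
    print(f"one loop, two coroutines: {one_loop_two_coroutines():.2f}s")
```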

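I think that, if you want to dispatch a bunch of coroutines so that each thread manages its own group of coroutines in its own event loop, the following code achieves a total time of about 1 second, with multiple threads waiting on 10 async sleeps of 1 second each.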

import asyncio
import threading
from asyncio import AbstractEventLoop
from concurrent.futures import ThreadPoolExecutor
from time import perf_counter
from typing import Dict, Set

# One event loop per worker thread, keyed by thread id.
event_loops_for_each_thread: Dict[int, AbstractEventLoop] = {}


def run(corofn, *args):
    # Schedule the coroutine on the current thread's loop without running it yet.
    curr_thread_id = threading.current_thread().ident

    if curr_thread_id not in event_loops_for_each_thread:
        event_loops_for_each_thread[curr_thread_id] = asyncio.new_event_loop()

    thread_loop = event_loops_for_each_thread[curr_thread_id]
    coro = corofn(*args)
    return thread_loop.create_task(coro)


async def async_gather_tasks(all_tasks: Set[asyncio.Task]):
    return await asyncio.gather(*all_tasks)


def wait_loops():
    # Each thread blocks until all tasks scheduled on its own loop have finished.
    curr_thread_id = threading.current_thread().ident
    threads_event_loop = event_loops_for_each_thread.get(curr_thread_id)
    if threads_event_loop is None:
        # This worker never executed run(), so it has no loop and nothing to wait for.
        return []

    # Printed to show that each thread waits on its own loop.
    print(f'Thread {curr_thread_id} will wait for its tasks.')
    return threads_event_loop.run_until_complete(async_gather_tasks(asyncio.all_tasks(threads_event_loop)))


async def main():
    loop = asyncio.get_running_loop()
    max_workers = 5
    executor = ThreadPoolExecutor(max_workers=max_workers)

    # Dispatch the async tasks to the worker threads.
    futures = [
        loop.run_in_executor(executor, run, asyncio.sleep, 1, x)
        for x in range(10)]

    # Wait until the threads have finished scheduling the tasks on their own event loops.
    await asyncio.wait(futures)

    # At this point every task has been scheduled on some thread's event loop.

    # Now tell each worker thread to wait for all of its tasks to complete.
    futures = [
        loop.run_in_executor(executor, wait_loops)
        for _ in range(max_workers)
    ]

    print(await asyncio.gather(*futures))
    # It will print something like:
    # [[1, 8], [0], [6, 3, 9, 7], [4], [2, 5]]
    # Each sub-list holds the results of the tasks handled by one thread.
    # The distribution is non-deterministic, so each run returns a different list of lists.


if __name__ == '__main__':
    start = perf_counter()
    asyncio.run(main())
    end = perf_counter()
    duration_s = end - start
    # The duration printed below shows that all threads waited on their tasks concurrently.
    print(f'duration_s={duration_s:.3f}')
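
If the goal is a thread that keeps accepting coroutines while it is already running others, one alternative worth considering is a single long-lived loop in a dedicated thread, fed through asyncio.run_coroutine_threadsafe. This is a different technique from the per-thread loops above, offered only as a sketch. The helper names start_loop_thread, stop_loop_thread and run_batch are hypothetical:

```python
import asyncio
import threading
import time


def start_loop_thread():
    # A dedicated thread that runs one event loop until it is told to stop.
    loop = asyncio.new_event_loop()
    thread = threading.Thread(target=loop.run_forever, daemon=True)
    thread.start()
    return loop, thread


def stop_loop_thread(loop, thread):
    # loop.stop must be scheduled from outside in a thread-safe way.
    loop.call_soon_threadsafe(loop.stop)
    thread.join()
    loop.close()


def run_batch(n=10, delay=0.2):
    loop, thread = start_loop_thread()
    try:
        # Each call returns a concurrent.futures.Future immediately, so the
        # caller can keep submitting while earlier coroutines are still running.
        futures = [
            asyncio.run_coroutine_threadsafe(asyncio.sleep(delay, i), loop)
            for i in range(n)]
        return [f.result(timeout=5) for f in futures]
    finally:
        stop_loop_thread(loop, thread)


if __name__ == "__main__":
    start = time.perf_counter()
    print(run_batch())
    print(f"elapsed: {time.perf_counter() - start:.2f}s")
```

Several such loop threads could be started if the work needs to be spread out, though for purely I/O-bound coroutines a single loop is often sufficient.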