Why can't coroutines be used with run_in_executor?
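I want to run a service that requests URLs using coroutines and multithreading. However, I cannot pass coroutines to the workers in the executor. See the code below for a minimal example of the problem: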
```python
import time
import asyncio
import concurrent.futures

EXECUTOR = concurrent.futures.ThreadPoolExecutor(max_workers=5)

async def async_request(loop):
    await asyncio.sleep(3)

def sync_request(_):
    time.sleep(3)

async def main(loop):
    futures = [loop.run_in_executor(EXECUTOR, async_request,loop) for x in range(10)]

    await asyncio.wait(futures)

loop = asyncio.get_event_loop()
loop.run_until_complete(main(loop))
```
This results in the following error:
```
Traceback (most recent call last):
  File "co_test.py", line 17, in <module>
    loop.run_until_complete(main(loop))
  File "/usr/lib/python3.5/asyncio/base_events.py", line 387, in run_until_complete
    return future.result()
  File "/usr/lib/python3.5/asyncio/futures.py", line 274, in result
    raise self._exception
  File "/usr/lib/python3.5/asyncio/tasks.py", line 239, in _step
    result = coro.send(None)
  File "co_test.py", line 10, in main
    futures = [loop.run_in_executor(EXECUTOR, req,loop) for x in range(10)]
  File "co_test.py", line 10, in <listcomp>
    futures = [loop.run_in_executor(EXECUTOR, req,loop) for x in range(10)]
  File "/usr/lib/python3.5/asyncio/base_events.py", line 541, in run_in_executor
    raise TypeError("coroutines cannot be used with run_in_executor()")
TypeError: coroutines cannot be used with run_in_executor()
```
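I know that I can use `sync_request` instead of `async_request`.
I also know that I can call `async_request` ten times in the event loop, as in the code below: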
```python
loop = asyncio.get_event_loop()
futures = [async_request(loop) for i in range(10)]
loop.run_until_complete(asyncio.wait(futures))
```
But in this case I would be using a single thread.
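How can I combine both approaches, with coroutines working across multiple threads? As you can see from the code, I am passing (but not using) `loop` to `async_request`.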
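The reason I want to do this is to make the application scalable. Is this an unnecessary step? Should I simply have one thread per URL and leave it at that? Something like: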
```python
LEN = len(list_of_urls)
EXECUTOR = concurrent.futures.ThreadPoolExecutor(max_workers=LEN)
```
Is that good enough?
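As background to the error, an executor worker simply calls the function it is given, and calling a coroutine function only builds a coroutine object without running its body. The sketch below illustrates this with a plain `ThreadPoolExecutor.submit` instead of `loop.run_in_executor`, since the latter's checks differ between Python versions. The names `call_in_worker` and `ran` are hypothetical helpers introduced for the illustration.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

ran = []  # records whether the coroutine body ever executed


async def async_request():
    ran.append("body executed")
    await asyncio.sleep(0)


def call_in_worker():
    with ThreadPoolExecutor(max_workers=1) as pool:
        # The worker only *calls* async_request(); nothing awaits the result.
        return pool.submit(async_request).result()


coro = call_in_worker()
is_coroutine = asyncio.iscoroutine(coro)  # the worker returned a coroutine object
body_ran = bool(ran)                      # the body was never executed
coro.close()  # discard it cleanly instead of leaving it un-awaited
```

Something has to drive the coroutine on an event loop, which is why the worker thread needs a loop of its own.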
You have to create and set a new event loop in the thread context in order to run coroutines:
```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def run(corofn, *args):
    loop = asyncio.new_event_loop()
    try:
        coro = corofn(*args)
        asyncio.set_event_loop(loop)
        return loop.run_until_complete(coro)
    finally:
        loop.close()


async def main():
    loop = asyncio.get_event_loop()
    executor = ThreadPoolExecutor(max_workers=5)
    futures = [
        loop.run_in_executor(executor, run, asyncio.sleep, 1, x)
        for x in range(10)]
    print(await asyncio.gather(*futures))
    # Prints: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
```
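On Python 3.7 and later, the same idea can probably be written more compactly with `asyncio.run`, which creates a fresh event loop in the calling thread, runs the coroutine to completion, and closes the loop. This is only a sketch of that variant, not the code from the answer above; `run_in_own_loop` and `fetch` are hypothetical names, and `fetch` stands in for a real request.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def run_in_own_loop(corofn, *args):
    # Runs in a worker thread: asyncio.run creates a new event loop there,
    # drives the coroutine to completion, then closes the loop.
    return asyncio.run(corofn(*args))


async def fetch(x):
    # Hypothetical stand-in for a real URL request.
    await asyncio.sleep(0.1)
    return x


async def main():
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=5) as executor:
        futures = [
            loop.run_in_executor(executor, run_in_own_loop, fetch, x)
            for x in range(10)
        ]
        # gather preserves submission order, regardless of completion order
        return await asyncio.gather(*futures)


results = asyncio.run(main())
```

Note that each worker still runs one coroutine at a time here, so concurrency is bounded by `max_workers`, just as with blocking functions.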
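As I understand the question, you are trying to use each thread to:
- trigger a coroutine execution
- stay free to receive more coroutines to trigger
- wait for everything to finish asynchronously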
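However, as soon as you call the loop (whether the main loop or a new one) to wait for results, it blocks the thread while waiting.
And, by using run_in_executor with a bunch of sync functions, the thread has no way of knowing whether there are more coroutines to dispatch before it reaches the point where it waits on the loop.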
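One way to keep a thread free to receive more coroutines, sketched here as a different technique from the code in this answer, is to run a loop forever in a dedicated thread and hand coroutines to it with `asyncio.run_coroutine_threadsafe`. The names `start_background_loop` and `fetch` are hypothetical, and `fetch` only simulates a request.

```python
import asyncio
import threading


async def fetch(x):
    # Hypothetical stand-in for a real URL request.
    await asyncio.sleep(0.1)
    return x


def start_background_loop():
    # The loop runs forever in its own thread, so that thread is always
    # available to accept newly submitted coroutines.
    loop = asyncio.new_event_loop()
    thread = threading.Thread(target=loop.run_forever, daemon=True)
    thread.start()
    return loop, thread


loop, thread = start_background_loop()

# Each call returns a concurrent.futures.Future immediately; the coroutines
# then run concurrently on the background loop.
pending = [asyncio.run_coroutine_threadsafe(fetch(x), loop) for x in range(10)]
results = [future.result(timeout=5) for future in pending]

# Stop the loop from the outside and clean up.
loop.call_soon_threadsafe(loop.stop)
thread.join()
loop.close()
```

With this layout the submitting thread decides when to block on results, while the loop thread never blocks on anything except its own event loop.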
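I think that if you want to dispatch a bunch of coroutines so that each thread manages its own group of coroutines in its own event loop, the following code does it: it completes in about 1 second in total, with multiple threads waiting on 10 async sleeps of 1 second each.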
```python
import asyncio
import threading
from asyncio import AbstractEventLoop
from concurrent.futures import ThreadPoolExecutor
from time import perf_counter
from typing import Dict, Set

event_loops_for_each_thread: Dict[int, AbstractEventLoop] = {}


def run(corofn, *args):
    curr_thread_id = threading.current_thread().ident

    if curr_thread_id not in event_loops_for_each_thread:
        event_loops_for_each_thread[curr_thread_id] = asyncio.new_event_loop()

    thread_loop = event_loops_for_each_thread[curr_thread_id]
    coro = corofn(*args)
    # only schedules the coroutine on this thread's loop; it does not run it yet
    return thread_loop.create_task(coro)


async def async_gather_tasks(all_tasks: Set[asyncio.Task]):
    return await asyncio.gather(*all_tasks)


def wait_loops():
    # each thread will block waiting for all async calls of its own event loop
    curr_thread_id = threading.current_thread().ident
    threads_event_loop = event_loops_for_each_thread.get(curr_thread_id)
    if threads_event_loop is None:
        # this worker never received a coroutine, so it has nothing to wait for
        return []

    # I print the following to prove that each thread is waiting on its loop
    print(f'Thread {curr_thread_id} will wait its tasks.')
    return threads_event_loop.run_until_complete(
        async_gather_tasks(asyncio.all_tasks(threads_event_loop)))


async def main():
    loop = asyncio.get_event_loop()
    max_workers = 5
    executor = ThreadPoolExecutor(max_workers=max_workers)

    # dispatching async tasks to each thread
    futures = [
        loop.run_in_executor(executor, run, asyncio.sleep, 1, x)
        for x in range(10)]

    # waiting for the threads to finish dispatching the async executions
    # to their own event loops
    await asyncio.wait(futures)

    # at this point the async events were dispatched to each thread's event loop

    # in the lines below, you tell each worker thread to wait for the
    # completion of all its async tasks
    futures = [
        loop.run_in_executor(executor, wait_loops)
        for _ in range(max_workers)
    ]

    print(await asyncio.gather(*futures))
    # it will print something like:
    # [[1, 8], [0], [6, 3, 9, 7], [4], [2, 5]]
    # each sub-list is the result of the tasks of one thread
    # it is non-deterministic, so it will return a different list of lists
    # each time you run it


if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    start = perf_counter()
    loop.run_until_complete(main())
    end = perf_counter()
    duration_s = end - start
    # the print below proves that all threads are waiting on their tasks asynchronously
    print(f'duration_s={duration_s:.3f}')
```