Making 1 million requests with aiohttp/asyncio - literally
I followed this tutorial: https://pawelmhm.github.io/asyncio/python/aiohttp/2016/04/22/asyncio-aiohttp.html and everything works fine when I make 50,000 requests. But I need to make 1 million API calls, and then I have a problem with this code:
```python
url = "http://some_url.com/?id={}"
tasks = set()

sem = asyncio.Semaphore(MAX_SIM_CONNS)
for i in range(1, LAST_ID + 1):
    task = asyncio.ensure_future(bound_fetch(sem, url.format(i)))
    tasks.add(task)

responses = asyncio.gather(*tasks)
return await responses
```
Because Python needs to create 1 million tasks, the program basically just lags and then prints `Killed` in the terminal.
Schedule all one million tasks at once
This is the code you are talking about. It takes up to 3 GB of RAM, so if you are short on free memory it will very likely be killed by the operating system.
```python
import asyncio
from aiohttp import ClientSession

MAX_SIM_CONNS = 50
LAST_ID = 10**6

async def fetch(url, session):
    async with session.get(url) as response:
        return await response.read()

async def bound_fetch(sem, url, session):
    async with sem:
        await fetch(url, session)

async def fetch_all():
    url = "http://localhost:8080/?id={}"
    tasks = set()
    async with ClientSession() as session:
        sem = asyncio.Semaphore(MAX_SIM_CONNS)
        for i in range(1, LAST_ID + 1):
            task = asyncio.create_task(bound_fetch(sem, url.format(i), session))
            tasks.add(task)
        return await asyncio.gather(*tasks)

if __name__ == '__main__':
    asyncio.run(fetch_all())
```
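If you want to check the memory figure on your own machine, here is a minimal sketch (my own addition, not part of the original answer; Unix-only, and note that `ru_maxrss` is reported in KiB on Linux but in bytes on macOS) that prints the peak resident set size of the process:

```python
import resource

def print_peak_rss():
    # peak resident set size of this process: KiB on Linux, bytes on macOS
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    print(f"peak RSS: {peak / 1024:.0f} MiB")
```

Call `print_peak_rss()` right after `asyncio.run(fetch_all())` returns to see how much memory the run needed at its worst moment.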
Use a queue to simplify the work
Here is my suggestion: use an asyncio.Queue to pass URLs to worker tasks. The queue is filled on demand, so there is no pre-built list of URLs.
It takes only 30 MB of RAM :)
```python
import asyncio
from aiohttp import ClientSession

MAX_SIM_CONNS = 50
LAST_ID = 10**6

async def fetch(url, session):
    async with session.get(url) as response:
        return await response.read()

async def fetch_worker(url_queue):
    async with ClientSession() as session:
        while True:
            url = await url_queue.get()
            try:
                if url is None:
                    # all work is done
                    return
                response = await fetch(url, session)
                # ...do something with the response
            finally:
                url_queue.task_done()
                # calling task_done() is necessary for the url_queue.join() to work correctly

async def fetch_all():
    url = "http://localhost:8080/?id={}"
    url_queue = asyncio.Queue(maxsize=100)
    worker_tasks = []
    for i in range(MAX_SIM_CONNS):
        wt = asyncio.create_task(fetch_worker(url_queue))
        worker_tasks.append(wt)
    for i in range(1, LAST_ID + 1):
        await url_queue.put(url.format(i))
    for i in range(MAX_SIM_CONNS):
        # tell the workers that the work is done
        await url_queue.put(None)
    await url_queue.join()
    await asyncio.gather(*worker_tasks)

if __name__ == '__main__':
    asyncio.run(fetch_all())
```
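Both examples above point at `http://localhost:8080/?id={}`, so to try them end to end you need something listening there. The following is a minimal sketch of a throwaway local server using aiohttp's own web framework (the handler body and port are assumptions for testing, not part of the original answer):

```python
from aiohttp import web

async def handle(request):
    # echo the ?id= query parameter back, so every request gets a small response
    return web.Response(text=request.query.get("id", ""))

def main():
    app = web.Application()
    app.add_routes([web.get("/", handle)])
    web.run_app(app, port=8080)

if __name__ == "__main__":
    main()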
asyncio is memory-bound (like any other program). You cannot spawn more tasks than memory can hold. My guess is that you hit a memory limit. Check `dmesg` for more information; when the OOM killer terminates a process, the kernel log contains a line like `Out of memory: Killed process ...`.
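To get a feel for how much memory each pending task costs, and therefore how many tasks fit before you hit that limit, here is a rough sketch (my own addition; the dummy coroutine and the task count are arbitrary, and the KiB figure assumes Linux, where `ru_maxrss` is in KiB):

```python
import asyncio
import resource

async def dummy():
    # a pending task that just sits in the event loop
    await asyncio.sleep(3600)

async def main(n=100_000):
    before = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    tasks = [asyncio.create_task(dummy()) for _ in range(n)]
    await asyncio.sleep(0)  # yield once so every task gets its first step
    after = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    print(f"~{(after - before) / n:.2f} KiB per pending task")
    for t in tasks:
        t.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)

if __name__ == "__main__":
    asyncio.run(main())
```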
1 million RPS does not mean there are 1 million tasks. A task can make several requests within the same second.
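To illustrate that point, here is a minimal sketch (my own addition; it assumes the same local test endpoint at `http://localhost:8080/`) in which a single task issues many requests sequentially, showing that the task count and the request rate are independent quantities:

```python
import asyncio
import time
from aiohttp import ClientSession

async def one_task_many_requests(n=1000):
    async with ClientSession() as session:
        start = time.monotonic()
        for i in range(n):
            # the requests run one after another inside this single task
            async with session.get(f"http://localhost:8080/?id={i}") as response:
                await response.read()
        elapsed = time.monotonic() - start
        print(f"{n} requests in {elapsed:.2f} s = {n / elapsed:.0f} requests/s from one task")

if __name__ == "__main__":
    asyncio.run(one_task_many_requests())
```

Against a fast local endpoint, one task easily completes far more than one request per second, which is why a fixed pool of workers (as in the queue example above) can handle 1 million requests.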