Using click.progressbar with multiprocessing in Python
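I have a huge list that I need to process, which takes some time, so I split it into 4 pieces and run a function on each piece with multiprocessing. Even with 4 cores it still takes a while, so I thought I would add a progress bar to the function to show how far each processor has got through its part of the list.
My dream was to have something like this: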
```
erasing close atoms, cpu0  [######..............................]   13%
erasing close atoms, cpu1  [#######.............................]   15%
erasing close atoms, cpu2  [######..............................]   13%
erasing close atoms, cpu3  [######..............................]   14%
```
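with each bar advancing as the loop in the function progresses. Instead, I get a continuous stream of output that fills my terminal window.
Here is the main Python script that calls the function: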
```python
from eraseCloseAtoms import *
from readPDB import *
import multiprocessing as mp
from vectorCalc import *

prot, cell = readPDB('file')
atoms = vectorCalc(cell)

output = mp.Queue()

# setup mp to erase grid atoms that are too close to the protein (dmin = 2.5A)
cpuNum = 4
tasks = len(atoms)
rangeSet = [tasks / cpuNum for i in range(cpuNum)]
for i in range(tasks % cpuNum):
    rangeSet[i] += 1

rangeSet = np.array(rangeSet)

processes = []
for c in range(cpuNum):
    na, nb = (int(np.sum(rangeSet[:c] + 1)), int(np.sum(rangeSet[:c + 1])))
    processes.append(mp.Process(target=eraseCloseAtoms, args=(prot, atoms[na:nb], cell, 2.7, 2.5, output)))

for p in processes:
    p.start()

results = [output.get() for p in processes]

for p in processes:
    p.join()

atomsNew = results[0] + results[1] + results[2] + results[3]
```
Below is the function eraseCloseAtoms():
```python
import numpy as np
import click


def eraseCloseAtoms(protein, atoms, cell, spacing=2, dmin=1.4, output=None):
    print 'just need to erase close atoms'

    if dmin > spacing:
        print 'the spacing needs to be larger than dmin'
        return

    grid = [int(cell[0] / spacing), int(cell[1] / spacing), int(cell[2] / spacing)]

    selected = list(atoms)
    with click.progressbar(length=len(atoms), label='erasing close atoms') as bar:
        for i, atom in enumerate(atoms):
            bar.update(i)
            erased = False
            coord = np.array(atom[6])

            for ix in [-1, 0, 1]:
                if erased:
                    break
                for iy in [-1, 0, 1]:
                    if erased:
                        break
                    for iz in [-1, 0, 1]:
                        if erased:
                            break
                        for j in protein:
                            protCoord = np.array(protein[int(j)][6])
                            trueDist = getMinDist(protCoord, coord, cell, vectors)
                            if trueDist <= dmin:
                                selected.remove(atom)
                                erased = True
                                break
    if output is None:
        return selected
    else:
        output.put(selected)
```
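The accepted answer says this is not possible with click and that it would take a "non-trivial amount of code to make it work".
While that is true, there is another module that offers this functionality out of the box: tqdm
https://github.com/tqdm/tqdm does exactly what you need.
The documentation shows how to do nested progress bars, among other things: https://github.com/tqdm/tqdm#nested-progress-bars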
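As a rough sketch of what that could look like for four worker processes, the snippet below gives each worker its own bar through tqdm's `position` argument and shares one write lock across processes, following the pattern in the tqdm README. It assumes Python 3 and an installed tqdm; `work`, `run`, and the `sleep` call are hypothetical stand-ins for the real per-atom function.

```python
from multiprocessing import Pool, RLock
from time import sleep

from tqdm import tqdm


def work(worker_id, n_items=50):
    """Hypothetical worker: loops over n_items with its own progress bar."""
    label = "erasing close atoms, cpu%d" % worker_id
    processed = 0
    # position pins this worker's bar to its own row in the terminal
    for _ in tqdm(range(n_items), desc=label, position=worker_id):
        sleep(0.01)  # stand-in for the real per-atom work
        processed += 1
    return processed


def run(n_workers=4):
    """Start n_workers processes, each drawing one bar."""
    tqdm.set_lock(RLock())  # one shared lock so bars do not write over each other
    with Pool(n_workers, initializer=tqdm.set_lock,
              initargs=(tqdm.get_lock(),)) as pool:
        return pool.map(work, range(n_workers))
```

Call `run()` from under an `if __name__ == '__main__':` guard so that it also works on platforms where multiprocessing spawns fresh interpreters.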
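I see two issues in your code.
The first explains why your progress bars often show far more than their real progress. You are calling bar.update(i), which advances the bar by i steps on every iteration, when you presumably want to advance it by one step. A better approach is to pass the iterable to click.progressbar and let it do the updating automatically: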
```python
with click.progressbar(atoms, label='erasing close atoms') as bar:
    for atom in bar:
        erased = False
        coord = np.array(atom[6])

        # ...
```
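To make the first issue concrete, here is a simplified model of a bar whose update(n) means "advance by n more steps". It is not click's implementation, only an illustration of the arithmetic; `simulate_bar` is a hypothetical helper.

```python
def simulate_bar(n_items, step_for):
    """Return the bar position after each iteration.

    Models a bar whose update(n) advances by n more steps,
    with the displayed position capped at n_items.
    """
    position = 0
    history = []
    for i in range(n_items):
        position = min(position + step_for(i), n_items)
        history.append(position)
    return history


buggy = simulate_bar(100, lambda i: i)  # models bar.update(i)
fixed = simulate_bar(100, lambda i: 1)  # models bar.update(1)

# Index of the first iteration at which the bar reads 100%
print(buggy.index(100))  # -> 14
print(fixed.index(100))  # -> 99
```

With 100 items, the cumulative steps 0 + 1 + ... + 14 already exceed 100, so the bar looks finished after 15 of the 100 iterations. Passing the iterable to click.progressbar, as in the snippet above, avoids this because the bar then advances one step per item.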
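However, because of the second issue in your code, this still will not work with several processes iterating at once, each with its own progress bar. The click.progressbar documentation states the following limitation:
"No printing must happen or the progress bar will be unintentionally destroyed."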
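This means that whenever one of your progress bars updates itself, it breaks all of the other active progress bars.
I don't think there is an easy fix for this. Interactively updating multi-line console output is very hard (you basically need curses or a similar "console GUI" library with support from your OS). click can only redraw the current line, so your best hope is probably to extend the click.progressbar design to draw several bars side by side on one line, like: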
```
CPU1: [######      ] 52%   CPU2: [###         ] 30%    CPU3: [########  ] 84%
```
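This would take a non-trivial amount of code to get working (especially when the updates come from multiple processes), but it is not completely impractical.
If you are OK with a single progress bar, then something like this would work: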
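A minimal sketch of that single-line idea, using only the standard library rather than extending click: the workers report ticks through a queue and only the parent process draws, so no two writers compete for the terminal. It assumes Python 3; `render_bars`, `worker`, and `run_demo` are hypothetical names, and the `sleep` call stands in for real work.

```python
import multiprocessing as mp
import sys
import time


def render_bars(done, totals, width=12):
    """Render every bar on one line, e.g. 'CPU1: [######      ]  50%'."""
    parts = []
    for idx, (d, total) in enumerate(zip(done, totals), start=1):
        frac = d / total if total else 1.0
        filled = int(width * frac)
        bar = "#" * filled + " " * (width - filled)
        parts.append("CPU%d: [%s] %3d%%" % (idx, bar, int(100 * frac)))
    return "  ".join(parts)


def worker(worker_id, n_items, queue):
    """Hypothetical worker: sends its id once per processed item."""
    for _ in range(n_items):
        time.sleep(0.01)  # stand-in for the real per-item work
        queue.put(worker_id)


def run_demo(totals=(50, 80, 30)):
    """Run one process per entry in totals and redraw a single status line."""
    queue = mp.Queue()
    procs = [mp.Process(target=worker, args=(i, n, queue))
             for i, n in enumerate(totals)]
    for p in procs:
        p.start()
    done = [0] * len(totals)
    # Only the parent writes to the terminal; "\r" returns to the line start
    for _ in range(sum(totals)):
        done[queue.get()] += 1
        sys.stdout.write("\r" + render_bars(done, totals))
        sys.stdout.flush()
    sys.stdout.write("\n")
    for p in procs:
        p.join()
```

Calling `run_demo()` under an `if __name__ == '__main__':` guard redraws the line after every tick. Redrawing on every tick is the simplest choice; for long lists you would probably want to throttle it.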
```python
import click
import threading
import numpy as np

reallybiglist = []
numthreads = 4

def myfunc(listportion, bar):
    for item in listportion:
        # do a thing
        bar.update(1)  # advance the shared bar by one step

with click.progressbar(length=len(reallybiglist), show_pos=True) as bar:
    threads = []
    # array_split (unlike split) also accepts lengths not divisible by numthreads
    for listportion in np.array_split(reallybiglist, numthreads):
        thread = threading.Thread(target=myfunc, args=(listportion, bar))
        thread.start()
        threads.append(thread)

    for thread in threads:
        thread.join()
```
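It may not be quite what you dreamed of, but you can combine imap_unordered with click.progressbar to get a single bar that works with multiprocessing: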
```python
import multiprocessing as mp
import click
import time


def proc(arg):
    time.sleep(arg)
    return True

def main():
    p = mp.Pool(4)
    args = range(4)
    # results arrive in completion order, so the bar advances as tasks finish
    results = p.imap_unordered(proc, args)
    with click.progressbar(results, length=len(args)) as bar:
        for result in bar:
            pass

if __name__ == '__main__':
    main()
```
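Applied to a long list, the same idea might look like the sketch below: cut the list into many small chunks so the bar moves smoothly, and advance the bar by the number of items in each finished chunk. It assumes Python 3; `process_chunk` is a hypothetical stand-in for the real filtering function, and `split` and `run` are illustrative helpers.

```python
import multiprocessing as mp

import click


def process_chunk(chunk):
    """Hypothetical stand-in for the real work: keep the even numbers."""
    kept = [item for item in chunk if item % 2 == 0]
    return len(chunk), kept


def split(items, n_chunks):
    """Cut items into at most n_chunks consecutive slices."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def run(items, n_procs=4, chunks_per_proc=8):
    """Process items in a pool while one bar tracks the overall progress."""
    chunks = split(items, n_procs * chunks_per_proc)
    kept = []
    with mp.Pool(n_procs) as pool:
        with click.progressbar(length=len(items),
                               label='erasing close atoms') as bar:
            # Chunks come back in completion order, not list order
            for n_done, result in pool.imap_unordered(process_chunk, chunks):
                kept.extend(result)
                bar.update(n_done)  # advance by the size of this chunk
    return kept
```

Because imap_unordered yields chunks as they finish, the items in the returned list are not in their original order. Call `run(...)` from under an `if __name__ == '__main__':` guard, as in the example above.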