Python: Attaching to a running process with pdb

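I have a Python script that I suspect contains a deadlock. I tried debugging it with pdb, but when I step through it the deadlock never occurs, and from the output I can see that it does not hang on the same iteration each time. I would like to attach a debugger to the script only once it has locked up. Is that possible? I am open to using other debuggers if necessary.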


At this time, pdb cannot halt a program that is already running and begin debugging it. You have a few other options:

GDB

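You can use GDB to debug at the C level. This is a bit more abstract, because you are poking around in Python's C source code rather than your actual Python script, but it can be useful in some cases. The instructions are here: https://wiki.python.org/moin/debuggingwithgdb. They are too involved to summarize here.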

Third-party extensions and modules

Simply googling "pdb attach process" turns up a couple of projects that offer this ability:

Pyringe: https://github.com/google/pyringe
PyCharm: https://blog.jetbrains.com/pycharm/2015/02/feature-spotlight-python-debugger-and-attach-to-process/

This page of the Python wiki lists several alternatives: https://wiki.python.org/moin/pythondebuggingtools

For your specific use case, I have some ideas for workarounds:

Signals

If you are on Unix, you can use signals, as in this blog post, to try to halt a running script and attach to it.

This quote block is copied directly from the linked blog post:

Of course pdb has already got functions to start a debugger in the middle of your program, most notably pdb.set_trace(). This however requires you to know where you want to start debugging, it also means you can't leave it in for production code.

But I've always been envious of what I can do with GDB: just interrupt a running program and start to poke around with a debugger. This can be handy in some situations, e.g. you're stuck in a loop and want to investigate. And today it suddenly occurred to me: just register a signal handler that sets the trace function! Here the proof of concept code:

import os
import signal
import sys
import time    

def handle_pdb(sig, frame):
    import pdb
    pdb.Pdb().set_trace(frame)    

def loop():
    while True:
        x = 'foo'
        time.sleep(0.2)

if __name__ == '__main__':
    signal.signal(signal.SIGUSR1, handle_pdb)
    print(os.getpid())
    loop()

Now I can send SIGUSR1 to the running application and get a debugger. Lovely!

I imagine you could spice this up by using Winpdb to allow remote debugging in case your application is no longer attached to a terminal. And the other problem the above code has is that it can't seem to resume the program after pdb got invoked, after exiting pdb you just get a traceback and are done (but since this is only bdb raising the bdb.BdbQuit exception I guess this could be solved in a few ways). The last immediate issue is running this on Windows, I don't know much about Windows but I know they don't have signals so I'm not sure how you could do this there.
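As a variation on the signal handler quoted above, here is a minimal sketch that prints the stack of every live thread when the signal arrives instead of dropping into pdb. For a suspected deadlock this may be enough to see which thread is waiting where. The function names (`format_all_stacks`, `dump_stacks`, `install`) are made up for illustration, and the sketch assumes a Unix system where `signal.SIGUSR1` exists.

```python
import io
import signal
import sys
import threading
import traceback


def format_all_stacks():
    """Return a string with the current call stack of every live thread."""
    names = {t.ident: t.name for t in threading.enumerate()}
    out = io.StringIO()
    # sys._current_frames() maps each thread id to its topmost frame.
    for ident, frame in sys._current_frames().items():
        out.write("Thread %s (%s):\n" % (names.get(ident, "?"), ident))
        out.write("".join(traceback.format_stack(frame)))
    return out.getvalue()


def dump_stacks(sig, frame):
    """Signal handler: write all thread stacks to stderr."""
    sys.stderr.write(format_all_stacks())


def install(signum=None):
    """Register dump_stacks for signum (default SIGUSR1, Unix only)."""
    signal.signal(signal.SIGUSR1 if signum is None else signum, dump_stacks)
```

Call `install()` once from the main thread at startup, print `os.getpid()` as in the quoted code, and then run `kill -USR1 <pid>` from another terminal when the script appears stuck.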

Conditional breakpoints and loops

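If signals are not available to you, you can still use PDB: wrap your lock or semaphore acquisition in a loop that increments a counter, and break only when the count reaches a ridiculously large number. For example, say you have a lock that you suspect is part of your deadlock: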

lock.acquire() # some lock or semaphore from threading or multiprocessing

Rewrite it this way:

count = 0
while not lock.acquire(False):  # non-blocking acquire; loops forever if deadlocked
    count += 1

    continue  # set a conditional breakpoint on this line in PDB that only triggers
              # when count is a ridiculously large number:
              # (Pdb) break <filename>:<lineno>, count >= 9999999999

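The breakpoint should trigger when count is very large, (hopefully) indicating that a deadlock has occurred there. If you find that it triggers when the locking objects do not seem to indicate a deadlock, you may need to insert a short time delay in the loop so the counter does not grow quite so fast. You may also need to play around with the breakpoint's triggering threshold to get it to fire at the right time. The number in my example is arbitrary.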

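Another variant is to not use PDB at all and to intentionally raise an exception when the counter gets huge, instead of triggering a breakpoint. If you write your own exception class, you can use it to bundle up all of the local semaphore/lock states in the exception, then catch it at the top level of your script and print it out right before exiting.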
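The exception variant could look roughly like the sketch below. The names `SuspectedDeadlock` and `acquire_or_raise`, the `all_locks` dictionary, and the default threshold and delay are all assumptions made for illustration. It also assumes plain `threading.Lock` objects, which provide a `locked()` method; semaphores would need a different way of reporting their state.

```python
import time


class SuspectedDeadlock(Exception):
    """Raised when a lock could not be acquired after many attempts."""

    def __init__(self, name, attempts, lock_states):
        super().__init__(
            "gave up on lock %r after %d attempts" % (name, attempts)
        )
        self.name = name
        self.attempts = attempts
        self.lock_states = lock_states  # snapshot: lock name -> locked?


def acquire_or_raise(lock, name, all_locks, max_attempts=1000, delay=0.01):
    """Poll a lock without blocking; raise once max_attempts polls have failed.

    all_locks is a dict mapping a label to every lock worth reporting.
    """
    attempts = 0
    while not lock.acquire(False):  # non-blocking attempt
        attempts += 1
        if attempts >= max_attempts:
            states = {n: l.locked() for n, l in all_locks.items()}
            raise SuspectedDeadlock(name, attempts, states)
        time.sleep(delay)  # keep the counter from growing too fast
```

At the top level of the script you would wrap the main entry point in `try` / `except SuspectedDeadlock as exc` and print `exc.name` and `exc.lock_states` before exiting.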

File indicators

A different way to use the deadlocked loop, without having to get the counter threshold right, is to write to files instead:

import time

while not lock.acquire(False): # Start a loop that will be infinite if deadlocked
    with open('checkpoint_a.txt', 'a') as fo: # open a unique filename
        fo.write("\nHit")  # write indicator to file
    time.sleep(3)  # pause after the file is closed so its size doesn't explode

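Now let your program run for a minute or two. Kill the program and go through those "checkpoint" files. If a deadlock is responsible for your stalled program, the files that have the word "Hit" written in them many times indicate which lock acquisitions are responsible for the deadlock.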

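You can make this more useful by having the loop write variables or other state information instead of just a constant. For example, you said you suspect the deadlock happens in a loop but do not know which iteration it is on. Have this lock loop dump your loop's control variables or other state information to identify the iteration on which the deadlock occurred.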
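A minimal sketch of that idea follows. The helper name `acquire_with_checkpoint`, its parameters, and the JSON-lines file format are assumptions for illustration; `state` is expected to be a dict of JSON-serializable values, such as the current loop index. The optional `max_polls` limit exists only so the helper can give up instead of spinning forever.

```python
import json
import time


def acquire_with_checkpoint(lock, path, state, delay=3.0, max_polls=None):
    """Poll a lock, appending one JSON line of state per failed attempt.

    Returns True once the lock is acquired, or False if max_polls
    failed attempts were recorded first.
    """
    polls = 0
    while not lock.acquire(False):  # non-blocking attempt
        with open(path, "a") as fo:  # use a unique filename per call site
            record = dict(state, time=time.time())
            fo.write(json.dumps(record) + "\n")
        polls += 1
        if max_polls is not None and polls >= max_polls:
            return False
        time.sleep(delay)  # pause so the file size doesn't explode
    return True
```

Inside the suspect loop, a call such as `acquire_with_checkpoint(lock, 'checkpoint_a.txt', {'iteration': i})` would record which iteration was running each time the acquisition failed.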


There is a clone of pdb, imaginatively called pdb-clone, which can attach to a running process.

Simply add from pdb_clone import pdbhandler; pdbhandler.register() to the code of the main process, and then you can start pdb with pdb-attach --kill --pid PID.