Origins

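My look into tornado's StackContext started with an optimization problem. The candidate solutions we discussed either required changing the parameters of every function (OMG) or needed nothing more than a coroutine-safe global variable.
But a tornado coroutine is only an abstraction with no entity of its own. For threads this is easy: threading.local holds global resources that are private to each thread. Having the same thing for coroutines would be perfect, and that is where StackContext comes in. As for its pedigree, see the first paragraph of the docstring in tornado/stack_context.py: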

    `StackContext` allows applications to maintain threadlocal-like state that follows execution as it moves to other execution contexts.

That's right: its job is to maintain a coroutine's context, so that coroutines get their own 'threadlocal'.
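To make the problem concrete, here is a minimal sketch of why a plain global (or a thread-local, since everything runs on one thread) is not coroutine-safe. The names current_request, pending and handle are made up for illustration, and the deque is only a stand-in for an event loop's callback queue.

```python
from collections import deque

current_request = None   # a plain global "context" (hypothetical)
pending = deque()        # stand-in for an event loop's callback queue
seen = []

def handle(request_id):
    global current_request
    current_request = request_id
    # Simulate yielding: the second half of the handler runs later.
    pending.append(lambda: seen.append((request_id, current_request)))

handle("A")
handle("B")
while pending:
    pending.popleft()()

# Each entry is (handler that resumed, value it found in the global).
print(seen)
```

Handler A resumes only after handler B has overwritten the global, so it reads B's value. This is the gap StackContext is meant to close.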

Analysis

The core of StackContext lives in tornado/stack_context.py. The main pieces are analyzed below.
1. The first thing to meet the eye is:

    class _State(threading.local):
        def __init__(self):
            self.contexts = (tuple(), None)
    _state = _State()

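Don't worry about the implementation details for now (there aren't many anyway). The point is that _state is private to each thread, because it inherits from threading.local. A quick refresher: a process can have many threads, but on a single-core CPU only one thread runs at a time. While a thread is running, the register state, its private data (threading.local) and so on make up its execution context, and as threads keep switching, the context keeps switching with them. Now map this onto coroutines: the process becomes the thread, and the thread becomes the coroutine. _state is what points at the context of the coroutine currently executing. (Not dizzy yet, I hope.)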
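The per-thread behaviour is easy to check in isolation. The sketch below copies the _State class from above and adds a hypothetical worker thread and results dict; the string labels merely stand in for real StackContext objects.

```python
import threading

class _State(threading.local):
    def __init__(self):
        # Runs once per thread, on that thread's first access.
        self.contexts = (tuple(), None)

_state = _State()
results = {}

def worker():
    # This thread starts from the pristine value, not the main thread's.
    results["worker_before"] = _state.contexts
    _state.contexts = (("worker-ctx",), "worker-ctx")
    results["worker_after"] = _state.contexts

_state.contexts = (("main-ctx",), "main-ctx")
t = threading.Thread(target=worker)
t.start()
t.join()
results["main"] = _state.contexts
print(results)
```

The worker never sees the main thread's value, and its own assignment does not leak back. Within a single thread, however, every coroutine shares the same _state, which is why something has to swap its contents on each switch.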

2. Climbing a little higher, we come to:

    class StackContext(object):
        def __init__(self, context_factory):
            self.context_factory = context_factory
            self.contexts = []
            self.active = True

        def _deactivate(self):
            self.active = False

        # StackContext protocol
        def enter(self):
            context = self.context_factory()
            self.contexts.append(context)
            context.__enter__()

        def exit(self, type, value, traceback):
            context = self.contexts.pop()
            context.__exit__(type, value, traceback)

        def __enter__(self):
            self.old_contexts = _state.contexts
            self.new_contexts = (self.old_contexts[0] + (self,), self)
            _state.contexts = self.new_contexts

            try:
                self.enter()
            except:
                _state.contexts = self.old_contexts
                raise

            return self._deactivate

        def __exit__(self, type, value, traceback):
            try:
                self.exit(type, value, traceback)
            finally:
                final_contexts = _state.contexts
                _state.contexts = self.old_contexts

                if final_contexts is not self.new_contexts:
                    raise StackContextInconsistentError(
                        'stack_context inconsistency (may be caused by yield '
                        'within a "with StackContext" block)')

                self.new_contexts = None

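StackContext is the context object I have been talking about. But it is time to come clean: it is really just an errand runner. What actually manages the coroutine's context is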

    def __init__(self, context_factory):
        self.context_factory = context_factory

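context_factory can be thought of as the manager of contexts: it produces the real context, and that context controls the setup and teardown of a coroutine's context. On a coroutine switch, StackContext tells the previous context it is fired (__exit__) and tells context_factory to hurry up and find a new context because the project is about to start (__enter__).
With that, the following two functions are easy to understand: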

    def enter(self):
        context = self.context_factory()
        self.contexts.append(context)
        context.__enter__()

It first has context_factory produce a context, then saves that context (for use on exit), and finally enters it.

    def exit(self, type, value, traceback):
        context = self.contexts.pop()
        context.__exit__(type, value, traceback)

This takes out the most recently saved context and exits it.
That is the relationship between StackContext, context_factory and context.
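The three-way relationship can be tried out without tornado. The following is only a sketch: MiniStackContext is a stripped-down stand-in that keeps just the enter()/exit() half of the class above, and request_context and log are hypothetical names.

```python
import contextlib

log = []

@contextlib.contextmanager
def request_context():
    # The "real" context: set something up, then tear it down.
    log.append("setup")
    try:
        yield
    finally:
        log.append("teardown")

class MiniStackContext:
    """Stand-in for StackContext with only enter() and exit()."""
    def __init__(self, context_factory):
        self.context_factory = context_factory
        self.contexts = []

    def enter(self):
        context = self.context_factory()  # ask the factory for a fresh context
        self.contexts.append(context)     # remember it for exit()
        context.__enter__()

    def exit(self, type, value, traceback):
        context = self.contexts.pop()
        context.__exit__(type, value, traceback)

sc = MiniStackContext(request_context)
sc.enter()
log.append("work 1")
sc.exit(None, None, None)
sc.enter()
log.append("work 2")
sc.exit(None, None, None)
print(log)
```

Note that the factory is called on every enter(), so one StackContext object can be entered many times, once for each callback that runs under it.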

The above covers how StackContext creates and destroys a coroutine's context. Next, how StackContext maintains the context stack itself:

    def __enter__(self):
        self.old_contexts = _state.contexts
        self.new_contexts = (self.old_contexts[0] + (self,), self)
        _state.contexts = self.new_contexts

        try:
            self.enter()
        except:
            _state.contexts = self.old_contexts
            raise

        return self._deactivate

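Because StackContext is a stack-style context, what __enter__ does is: save the existing contexts, push itself onto the top of the context stack, and then install the result as the current context.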
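To watch the (stack, head) pair change shape, here is a small trace. It is a sketch that keeps only the bookkeeping from __enter__ and a simplified __exit__; MiniStackContext, names and snapshots are hypothetical, and the consistency check of the real __exit__ is left out.

```python
import threading

class _State(threading.local):
    def __init__(self):
        self.contexts = (tuple(), None)

_state = _State()

class MiniStackContext:
    def __init__(self, name):
        self.name = name

    def __enter__(self):
        self.old_contexts = _state.contexts                         # save
        self.new_contexts = (self.old_contexts[0] + (self,), self)  # push self
        _state.contexts = self.new_contexts                         # install
        return self

    def __exit__(self, type, value, traceback):
        _state.contexts = self.old_contexts                         # pop

def names():
    stack, head = _state.contexts
    return [c.name for c in stack], (head.name if head else None)

snapshots = [names()]
with MiniStackContext("outer"):
    snapshots.append(names())
    with MiniStackContext("inner"):
        snapshots.append(names())
    snapshots.append(names())
snapshots.append(names())

for snap in snapshots:
    print(snap)
```

The first element of the pair is the whole stack from bottom to top, and the second is always the top-most context.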
3. The climax is near. First, a summary:

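_state points at the context of the currently running coroutine; as coroutines keep switching, _state points at different contexts.
StackContext does the actual work of switching contexts, that is, exiting the previous context and entering the new one. It is the role that works like a dog.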

With these two objects in hand, let's finally look at the wrap function (simplified):

    def wrap(fn):
        cap_contexts = [_state.contexts]

        def wrapped(*args, **kwargs):
            ret = None
            try:
                current_state = _state.contexts

                # Remove deactivated items
                cap_contexts[0] = contexts = _remove_deactivated(cap_contexts[0])

                # Force new state
                _state.contexts = contexts

                # Current exception
                exc = (None, None, None)
                top = None

                # Apply stack contexts
                last_ctx = 0
                stack = contexts[0]

                # Apply state
                for n in stack:
                    try:
                        n.enter()
                        last_ctx += 1
                    except:
                        # Record the exception and the top-most handler
                        exc = sys.exc_info()
                        top = n.old_contexts[1]

                # Execute callback if no exception happened while restoring state
                if top is None:
                    try:
                        ret = fn(*args, **kwargs)
                    except:
                        exc = sys.exc_info()
                        top = contexts[1]

                # If there was exception, try to handle it by going through the exception chain
                if top is not None:
                    exc = _handle_exception(top, exc)
                else:
                    # Otherwise take shorter path and run stack contexts in reverse order
                    while last_ctx > 0:
                        last_ctx -= 1
                        c = stack[last_ctx]

                        try:
                            c.exit(*exc)
                        except:
                            exc = sys.exc_info()
                            top = c.old_contexts[1]
                            break
                    else:
                        top = None

                    # If an exception happened while unrolling, take longer exception handler path
                    if top is not None:
                        exc = _handle_exception(top, exc)

                # If exception was not handled, raise it
                if exc != (None, None, None):
                    raise_exc_info(exc)
            finally:
                _state.contexts = current_state
            return ret

        wrapped._wrapped = True
        return wrapped

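Two contexts are in play here: the definition-time context and the run-time context. The definition-time context is the one in effect when the coroutine function is defined (that is, when it is wrapped); the run-time context is whatever context the system happens to be in when the function runs. It is as if the coroutine function said at definition time "I work in a room with air conditioning and cola", but after all the switching, by the time its turn comes the room has nothing but a broken fan spinning.
In that situation the coroutine has to build the environment it likes by itself (reshape the run-time environment into the one declared at definition time). When I first got this far I was puzzled: how is the definition-time environment kept around all that time? The answer is a closure.
Looking at the function again, it now makes sense. cap_contexts = [_state.contexts] saves the definition-time context into cap_contexts, and wrapped is the function we eventually hand to IOLoop. When wrapped will run, and what _state will be at that moment, are both unknown, so the main job of wrapped is to install the context saved in cap_contexts as the current context.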
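Here is the flow in brief:
cap_contexts[0] = contexts = _remove_deactivated(cap_contexts[0]) removes the contexts that were active at definition time but have been deactivated by the time of execution.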
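As a rough picture of that filtering step, assuming it only needs the active flag and the old_contexts link seen in StackContext: the sketch below drops inactive entries from the stack and walks the head back to the nearest active context. Ctx and remove_deactivated are hypothetical stand-ins, and any re-linking of the remaining chain that the real helper may do is left out.

```python
class Ctx:
    """Hypothetical stand-in exposing only what the filter needs."""
    def __init__(self, name, parent=None):
        self.name = name
        self.active = True
        # Mirrors StackContext.old_contexts: (stack, head) before this context.
        self.old_contexts = ((), parent)

def remove_deactivated(contexts):
    stack, head = contexts
    live = tuple(c for c in stack if c.active)
    # Walk back from the head until an active context (or None) is found.
    while head is not None and not head.active:
        head = head.old_contexts[1]
    return (live, head)

a = Ctx("a")
b = Ctx("b", parent=a)
c = Ctx("c", parent=b)
captured = ((a, b, c), c)

# What calling the callback returned by __enter__ (_deactivate) does:
c.active = False
b.active = False

live, head = remove_deactivated(captured)
print([x.name for x in live], head.name)
```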

    for n in stack:
        try:
            n.enter()
            last_ctx += 1
        except:
            # Record the exception and the top-most handler
            exc = sys.exc_info()
            top = n.old_contexts[1]

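Each element of stack is a StackContext object, so this restores the definition-time environment one context at a time, from the bottom of the stack to the top.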

    if top is None:
        try:
            ret = fn(*args, **kwargs)

Only if restoring raised no exception does the real coroutine code start to run.

    while last_ctx > 0:
        last_ctx -= 1
        c = stack[last_ctx]

        try:
            c.exit(*exc)

After the coroutine function finishes, the stacked contexts are exited in reverse order.
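The ordering can be confirmed with a toy version of the loop. This sketch keeps only the happy path: Named and run_in are hypothetical, and the exception routing through _handle_exception is deliberately omitted (exit always receives None, None, None here).

```python
log = []

class Named:
    """Hypothetical object with the same enter()/exit() shape as StackContext."""
    def __init__(self, name):
        self.name = name

    def enter(self):
        log.append("enter " + self.name)

    def exit(self, type, value, traceback):
        log.append("exit " + self.name)

def run_in(stack, fn):
    last_ctx = 0
    for n in stack:              # bottom of the stack first
        n.enter()
        last_ctx += 1
    try:
        return fn()
    finally:
        while last_ctx > 0:      # top of the stack first
            last_ctx -= 1
            stack[last_ctx].exit(None, None, None)

run_in((Named("outer"), Named("inner")), lambda: log.append("callback"))
print(log)
```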
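That is roughly the whole flow.
The main use of wrap is to add a layer of context management to a coroutine function before it is put into the coroutine engine (IOLoop). See PollIOLoop.add_callback in tornado/ioloop.py.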

Application

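All of the above only explains the general mechanism by which tornado switches coroutine contexts. The concrete context still has to be implemented by you, and the key to that is context_factory.
Someone on GitHub has implemented a context_factory: https://github.com/viewfinderco/viewfinder/blob/master/backend/base/context_local.py.
Write a subclass of the ContextLocal defined there and you have a context_factory. Following the example in that author's write-up, define a subclass MyContext and use it like this: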

    yield run_with_stack_context(StackContext(MyContext(coroutine_value)), your_func)

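A few notes:
1 coroutine_value is the coroutine-private variable, the feature we wanted all along. You can implement a class that manages per-coroutine resources and pass an instance of it in.
2 The form above is only for the case where your_func is a coroutine function. For an ordinary function, this is enough: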

    with StackContext(MyContext(coroutine_value)):
        your_func()

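3 Why are coroutine functions special? Because calling them the way you call an ordinary function leaves the context stack mismatched. The exact reason is tedious to write up; see the code in _make_coroutine_wrapper in tornado/gen.py that handles stack_context.StackContextInconsistentError (the cause is hard to see from reading the code, but stepping through the execution in a debugger makes it clear; it is probably an old tornado bug). run_with_stack_context is the helper tornado wraps up specifically for coroutine functions (in effect a bug-fix function). It has one annoyance, though: if your coroutine function takes arguments, you need a partial function, or your own version of run_with_stack_context (the thing is only two lines of code).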
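For the argument-passing annoyance, functools.partial is enough. The sketch below runs without tornado and makes several assumptions: MyContext is a hypothetical context manager (not the viewfinder ContextLocal), it is entered directly instead of being wrapped in StackContext, your_func is an ordinary function rather than a coroutine, and run_with_context is a made-up helper with the two-line "enter the context, call func()" shape described above.

```python
import functools

class MyContext:
    """Hypothetical context that exposes one value while it is active."""
    current = None

    def __init__(self, value):
        self.value = value

    def __enter__(self):
        self._previous = MyContext.current
        MyContext.current = self.value
        return self

    def __exit__(self, type, value, traceback):
        MyContext.current = self._previous

def run_with_context(context, func):
    # func is called with no arguments, hence the need for partial.
    with context:
        return func()

def your_func(user_id, verbose=False):
    return (MyContext.current, user_id, verbose)

# Bind the arguments up front so the helper can call the result with none.
bound = functools.partial(your_func, 42, verbose=True)
result = run_with_context(MyContext("per-coroutine value"), bound)
print(result)
```

Writing your own helper that accepts *args and **kwargs and forwards them to func is the other option the note mentions.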

Closing

Finally done writing this thing...
