Error indicates flattened dimensions when loading a pre-trained network

Question

I am trying to load a pre-trained network, but I get the following error:

F1101 23:03:41.857909 73 net.cpp:757] Cannot copy param 0 weights
from layer 'fc4'; shape mismatch. Source param shape is 512 4096
(2097152); target param shape is 512 256 4 4 (2097152). To learn this
layer's parameters from scratch rather than copying from a saved net,
rename the layer.

I noticed that 512 x 256 x 4 x 4 == 512 x 4096, so it seems the layer's weights were somehow flattened when the network was saved and reloaded.
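
As a quick sanity check (a minimal numpy sketch, not Caffe-specific): the two shapes from the error message really do contain the same number of values, and a reshape between them moves no data:

import numpy as np

# Both shapes from the error message hold the same 2097152 weights.
assert 512 * 4096 == 512 * 256 * 4 * 4 == 2097152

# A flat (512, 4096) matrix can be viewed as (512, 256, 4, 4) without
# copying or reordering values; only the shape metadata differs.
w_flat = np.zeros((512, 4096), dtype=np.float32)
w_conv = w_flat.reshape(512, 256, 4, 4)
assert w_conv.size == w_flat.size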

How do I fix this error?

Reproduction

I am trying to use the pre-trained D-CNN network from this GitHub repository.

I load the network with

import caffe
net = caffe.Net('deploy_D-CNN.prototxt', 'D-CNN.caffemodel', caffe.TEST)
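
The failure happens during this call, while Caffe copies the saved weights into the freshly constructed net. To see the target shape the prototxt implies, the net can be built without any pretrained weights (a small diagnostic sketch; the two-argument constructor skips the weight copy):

import caffe

net = caffe.Net('deploy_D-CNN.prototxt', caffe.TEST)  # build the net, copy no weights
print(net.params['fc4'][0].data.shape)                # expected: (512, 256, 4, 4)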

The prototxt file is:

name:"D-CNN"
input:"data"
input_dim: 10
input_dim: 3
input_dim: 259
input_dim: 259
layer {
  name:"conv1"
  type:"Convolution"
  bottom:"data"
  top:"conv1"
  convolution_param {
    num_output: 64
    kernel_size: 5
    stride: 2
  }
}
layer {
  name:"relu1"
  type:"ReLU"
  bottom:"conv1"
  top:"conv1"
}
layer {
  name:"pool1"
  type:"Pooling"
  bottom:"conv1"
  top:"pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name:"norm1"
  type:"LRN"
  bottom:"pool1"
  top:"norm1"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
layer {
  name:"conv2"
  type:"Convolution"
  bottom:"norm1"
  top:"conv2"
  convolution_param {
    num_output: 128
    pad: 1
    kernel_size: 3
  }
}
layer {
  name:"relu2"
  type:"ReLU"
  bottom:"conv2"
  top:"conv2"
}
layer {
  name:"pool2"
  type:"Pooling"
  bottom:"conv2"
  top:"pool2"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name:"conv3"
  type:"Convolution"
  bottom:"pool2"
  top:"conv3"
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    stride: 1
  }
}
layer {
  name:"relu3"
  type:"ReLU"
  bottom:"conv3"
  top:"conv3"
}
layer {
  name:"fc4"
  type:"Convolution"
  bottom:"conv3"
  top:"fc4"
  convolution_param {
    num_output: 512
    pad: 0
    kernel_size: 4
  }
}
layer {
  name:"relu4"
  type:"ReLU"
  bottom:"fc4"
  top:"fc4"
}
layer {
  name:"drop4"
  type:"Dropout"
  bottom:"fc4"
  top:"fc4"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layer {
  name:"pool5_spm3"
  type:"Pooling"
  bottom:"fc4"
  top:"pool5_spm3"
  pooling_param {
    pool: MAX
    kernel_size: 10
    stride: 10
  }
}
layer {
  name:"pool5_spm3_flatten"
  type:"Flatten"
  bottom:"pool5_spm3"
  top:"pool5_spm3_flatten"
}
layer {
  name:"pool5_spm2"
  type:"Pooling"
  bottom:"fc4"
  top:"pool5_spm2"
  pooling_param {
    pool: MAX
    kernel_size: 14
    stride: 14
  }
}
layer {
  name:"pool5_spm2_flatten"
  type:"Flatten"
  bottom:"pool5_spm2"
  top:"pool5_spm2_flatten"
}
layer {
  name:"pool5_spm1"
  type:"Pooling"
  bottom:"fc4"
  top:"pool5_spm1"
  pooling_param {
    pool: MAX
    kernel_size: 29
    stride: 29
  }
}
layer {
  name:"pool5_spm1_flatten"
  type:"Flatten"
  bottom:"pool5_spm1"
  top:"pool5_spm1_flatten"
}
layer {
  name:"pool5_spm"
  type:"Concat"
  bottom:"pool5_spm1_flatten"
  bottom:"pool5_spm2_flatten"
  bottom:"pool5_spm3_flatten"
  top:"pool5_spm"
  concat_param {
    concat_dim: 1
  }
}


layer {
  name:"fc4_2"
  type:"InnerProduct"
  bottom:"pool5_spm"
  top:"fc4_2"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 512
    weight_filler {
      type:"gaussian"
      std: 0.005
    }
    bias_filler {
      type:"constant"
      value: 0.1
    }
  }
}
layer {
  name:"relu4"
  type:"ReLU"
  bottom:"fc4_2"
  top:"fc4_2"
}
layer {
  name:"drop4"
  type:"Dropout"
  bottom:"fc4_2"
  top:"fc4_2"
  dropout_param {
    dropout_ratio: 0.5
  }
}

layer {
  name:"fc5"
  type:"InnerProduct"
  bottom:"fc4_2"
  top:"fc5"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 19
    weight_filler {
      type:"gaussian"
      std: 0.01
    }
    bias_filler {
      type:"constant"
      value: 0
    }
  }
}
layer {
  name:"prob"
  type:"Softmax"
  bottom:"fc5"
  top:"prob"
}

Answer

It looks like you are using a pre-trained network in which the layer "fc4" was originally a fully-connected layer (a.k.a. a type:"InnerProduct" layer) that was "reshaped" into a convolutional layer.
Since inner-product and convolution layers both apply essentially the same linear operation to their input, this change is possible under certain assumptions (see, for example, here).
As you correctly identified, the weights of the original pre-trained fully-connected layer were saved "flattened" with respect to the shape Caffe expects for a convolution layer.
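
For intuition, here is a small numpy sketch (added as illustration, not part of the original answer) showing that an InnerProduct over a flattened 256 x 4 x 4 input and a 4 x 4 convolution producing a single output per filter compute exactly the same numbers when one weight array is a reshape of the other:

import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((256, 4, 4))         # one 256 x 4 x 4 input patch
w_fc = rng.standard_normal((512, 4096))      # InnerProduct weights, one row per output

# InnerProduct: flatten the input and take 512 dot products.
y_fc = w_fc @ x.ravel()

# Convolution: a 4 x 4 kernel over a 4 x 4 input yields one output per filter;
# its weights are the FC weights reshaped to (512, 256, 4, 4) in C order.
w_conv = w_fc.reshape(512, 256, 4, 4)
y_conv = np.tensordot(w_conv, x, axes=3)     # contract channel, height, width

assert np.allclose(y_fc, y_conv)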

I think this can be resolved using share_mode: PERMISSIVE:

layer {
  name:"fc4"
  type:"Convolution"
  bottom:"conv3"
  top:"fc4"
  convolution_param {
    num_output: 512
    pad: 0
    kernel_size: 4
  }
  param {
    lr_mult: 1
    decay_mult: 1
    share_mode: PERMISSIVE  # should help caffe overcome the shape mismatch
  }
  param {
    lr_mult: 2
    decay_mult: 0
    share_mode: PERMISSIVE
  }
}
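
If PERMISSIVE sharing does not help in your build of Caffe, another option is to rewrite the shape metadata inside the saved model itself: the weight values are stored as one flat array either way, so only the recorded dimensions need to change. The sketch below uses Caffe's protobuf bindings; the output filename is arbitrary, and it assumes the model was serialized in the current format (layer / BlobShape fields rather than the legacy num/channels/height/width fields):

import caffe.proto.caffe_pb2 as caffe_pb2

net_param = caffe_pb2.NetParameter()
with open('D-CNN.caffemodel', 'rb') as f:
    net_param.ParseFromString(f.read())

for layer in net_param.layer:
    if layer.name == 'fc4':
        blob = layer.blobs[0]           # param 0 (weights); the bias blob is unaffected
        del blob.shape.dim[:]           # recorded as (512, 4096)
        blob.shape.dim.extend([512, 256, 4, 4])

with open('D-CNN-reshaped.caffemodel', 'wb') as f:
    f.write(net_param.SerializeToString())

Loading the rewritten model against the same deploy_D-CNN.prototxt should then pass the shape check, since the stored data never changes, only how it is interpreted.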