Table of Contents
- Images - TorchVision
- Defining the Model
- Four ways to define a model
- Code
- Note on visualization - netron
- Replacing the backbone - error
- Fine-tuning ImageNet networks
- Helper functions
- Model training and validation
- Freezing layers with requires_grad
- Initializing and reshaping the network
- Alexnet
- VGG
- Squeezenet 1.0
- Resnet
- Densenet
- Inception V3
- Loading the data
- Creating the optimizer
- Running training and validation
- Code
- STN 2015
- Basics
- STN architecture
- Loading the data
- Defining the model
- Visualizing the results
- Training the model
- Style Transfer
- Setup
- Loss functions
- Pretrained model
- Training
- Code
- Adversarial Example Generation
- Threat model
- Fast Gradient Sign Attack (FGSM)
- Code `TODO`
- GAN
- Generative Adversarial Networks
- DCGAN
- MORE `TODO`
- Parallel / Distributed Training
- Not Only Pytorch
- Production deployment of models >> C++
- Extending PyTorch
- Using the PyTorch C++ frontend
- Notes
- Autograd
- Broadcast
- CPU threading / TorchScript inference
- CUDA semantics
- Custom modules in PyTorch
- Large-scale deployment
- Multiprocessing
- Reproducibility
- Serialization semantics (save/load)
- Questions
Images - TorchVision
- Fine-tune a pretrained Mask R-CNN model on the Penn-Fudan database for pedestrian detection and segmentation.
- A 2007 dataset: 170 images containing 345 pedestrian instances
- We use it to illustrate how to use the new features in torchvision to train an instance segmentation model on a custom dataset
- Subclass torch.utils.data.Dataset and implement __len__ and __getitem__
- __getitem__ needs to return:
  - image: a PIL image of size (H, W)
  - target: a dict containing the following fields:
    - boxes (FloatTensor[N, 4]): the coordinates of the N bounding boxes in [x0, y0, x1, y1] format, ranging from 0 to W and from 0 to H
    - labels (Int64Tensor[N]): the label for each bounding box
    - image_id (Int64Tensor[1]): an image identifier. It should be unique across all images in the dataset and is used during evaluation
    - area (Tensor[N]): the area of each bounding box. This is used by the COCO metric during evaluation, to separate the metric scores between small, medium, and large boxes.
    - iscrowd (UInt8Tensor[N]): instances with iscrowd=True will be ignored during evaluation.
    - (optionally) masks (UInt8Tensor[N, H, W]): the segmentation mask for each object
    - (optionally) keypoints (FloatTensor[N, K, 3]): for each of the N objects, the K keypoints in [x, y, visibility] format that define the object. visibility=0 means the keypoint is not visible. Note that for data augmentation, the notion of flipping a keypoint depends on the data representation, so you may want to adapt references/detection/transforms.py to your new keypoint representation
- If you want to use aspect-ratio grouping during training (so that each batch only contains images with similar aspect ratios), it is recommended to also implement a get_height_and_width method that returns the image's height and width
- If this method is not provided, we query all elements of the dataset via __getitem__, which loads the images into memory and is slower than providing a custom method
- A minimal Dataset sketch following this contract is shown right after this list.
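A minimal sketch of such a dataset, assuming the Penn-Fudan directory layout (PNGImages/ and PedMasks/, with each mask PNG encoding instances as distinct pixel values); the class name and transform convention here are illustrative, not part of torchvision:

```python
import os
import numpy as np
import torch
from PIL import Image

class PennFudanDataset(torch.utils.data.Dataset):
    def __init__(self, root, transforms=None):
        self.root = root
        self.transforms = transforms
        # sorted so that images and masks stay aligned
        self.imgs = sorted(os.listdir(os.path.join(root, "PNGImages")))
        self.masks = sorted(os.listdir(os.path.join(root, "PedMasks")))

    def __getitem__(self, idx):
        img = Image.open(os.path.join(self.root, "PNGImages", self.imgs[idx])).convert("RGB")
        mask = np.array(Image.open(os.path.join(self.root, "PedMasks", self.masks[idx])))
        obj_ids = np.unique(mask)[1:]            # drop the background id 0
        masks = mask == obj_ids[:, None, None]   # one binary mask per instance
        boxes = []
        for m in masks:                          # derive a box from each mask
            ys, xs = np.where(m)
            boxes.append([xs.min(), ys.min(), xs.max(), ys.max()])
        boxes = torch.as_tensor(boxes, dtype=torch.float32)
        num_objs = len(obj_ids)
        target = {
            "boxes": boxes,
            "labels": torch.ones((num_objs,), dtype=torch.int64),  # single class: pedestrian
            "image_id": torch.tensor([idx]),
            "area": (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0]),
            "iscrowd": torch.zeros((num_objs,), dtype=torch.uint8),
            "masks": torch.as_tensor(masks, dtype=torch.uint8),
        }
        if self.transforms is not None:
            img, target = self.transforms(img, target)
        return img, target

    def __len__(self):
        return len(self.imgs)
```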
Defining the Model

Four ways to define a model
Reference
```python
import torch
import torch.nn.functional as F
from collections import OrderedDict

# Method 1: plain attributes + functional ops in forward
class Net1(torch.nn.Module):
    def __init__(self):
        super(Net1, self).__init__()
        self.conv1 = torch.nn.Conv2d(3, 32, 3, 1, 1)
        self.dense1 = torch.nn.Linear(32 * 3 * 3, 128)
        self.dense2 = torch.nn.Linear(128, 10)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)  # fixed: was self.conv(x)
        x = x.view(x.size(0), -1)
        x = F.relu(self.dense1(x))
        x = self.dense2(x)
        return x

# Method 2: nn.Sequential
class Net2(torch.nn.Module):
    def __init__(self):
        super(Net2, self).__init__()
        self.conv = torch.nn.Sequential(
            torch.nn.Conv2d(3, 32, 3, 1, 1),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(2))
        self.dense = torch.nn.Sequential(
            torch.nn.Linear(32 * 3 * 3, 128),
            torch.nn.ReLU(),
            torch.nn.Linear(128, 10)
        )

    def forward(self, x):
        conv_out = self.conv(x)  # fixed: was self.conv1(x)
        res = conv_out.view(conv_out.size(0), -1)
        out = self.dense(res)
        return out

# Method 3: nn.Sequential built with add_module (named layers)
class Net3(torch.nn.Module):
    def __init__(self):
        super(Net3, self).__init__()
        self.conv = torch.nn.Sequential()
        self.conv.add_module("conv1", torch.nn.Conv2d(3, 32, 3, 1, 1))
        self.conv.add_module("relu1", torch.nn.ReLU())
        self.conv.add_module("pool1", torch.nn.MaxPool2d(2))
        self.dense = torch.nn.Sequential()
        self.dense.add_module("dense1", torch.nn.Linear(32 * 3 * 3, 128))
        self.dense.add_module("relu2", torch.nn.ReLU())
        self.dense.add_module("dense2", torch.nn.Linear(128, 10))

    def forward(self, x):
        conv_out = self.conv(x)  # fixed: was self.conv1(x)
        res = conv_out.view(conv_out.size(0), -1)
        out = self.dense(res)
        return out

# Method 4: nn.Sequential built from an OrderedDict (named layers)
class Net4(torch.nn.Module):
    def __init__(self):
        super(Net4, self).__init__()
        self.conv = torch.nn.Sequential(
            OrderedDict([
                ("conv1", torch.nn.Conv2d(3, 32, 3, 1, 1)),
                ("relu1", torch.nn.ReLU()),
                ("pool", torch.nn.MaxPool2d(2))
            ]))
        self.dense = torch.nn.Sequential(
            OrderedDict([
                ("dense1", torch.nn.Linear(32 * 3 * 3, 128)),
                ("relu2", torch.nn.ReLU()),
                ("dense2", torch.nn.Linear(128, 10))
            ])
        )

    def forward(self, x):
        conv_out = self.conv(x)  # fixed: was self.conv1(x)
        res = conv_out.view(conv_out.size(0), -1)
        out = self.dense(res)
        return out
```
Code
Mask-R-CNN-Fine-tune
Note on visualization - netron
- Not sure why torch.save(model, 'xx.pth') raises an error here (Mask R-CNN); saving the whole module pickles the entire object graph, which is more fragile than saving just the weights (see the state_dict sketch below)

```python
# vis_model: export to ONNX, then open the graph in netron
import torch
import netron

example = torch.rand(1, 3, 480, 640)            # dummy input for tracing
torch.onnx.export(model, example, "test.onnx")  # model defined as above
netron.start("test.onnx")
```
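As a workaround sketch (not from the original post), saving only the weights avoids pickling the whole module; `get_model_instance_segmentation` below is a hypothetical stand-in for whatever builder function created the model:

```python
# Save only the parameters instead of the whole module
torch.save(model.state_dict(), "maskrcnn_weights.pth")

# To restore: rebuild the same architecture, then load the weights
model2 = get_model_instance_segmentation(num_classes=2)  # hypothetical builder
model2.load_state_dict(torch.load("maskrcnn_weights.pth"))
model2.eval()
```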
Replacing the backbone - error
- Goal: replace the Mask R-CNN backbone with MobileNetV2
```python
import torchvision as tv
from torchvision.models.detection import MaskRCNN
from torchvision.models.detection.rpn import AnchorGenerator

def get_model_instance_segmentation2(num_classes):
    # Backbone: MobileNetV2 feature extractor
    backbone = tv.models.mobilenet_v2(pretrained=True).features
    backbone.out_channels = 1280  # Faster/Mask R-CNN needs the backbone's output channel count
    anchor_gen = AnchorGenerator(sizes=((32, 64, 128, 256, 512),),
                                 aspect_ratios=((0.5, 1.0, 2.0),))
    roi_pooler = tv.ops.MultiScaleRoIAlign(featmap_names=['0'],
                                           output_size=7,
                                           sampling_ratio=2)
    mask_roi_pooler = tv.ops.MultiScaleRoIAlign(featmap_names=['0'],
                                                output_size=14,
                                                sampling_ratio=2)
    model = MaskRCNN(backbone, num_classes=num_classes,
                     rpn_anchor_generator=anchor_gen,
                     box_roi_pool=roi_pooler,
                     mask_roi_pool=mask_roi_pooler)
    # model = FasterRCNN(backbone, num_classes=num_classes,
    #                    rpn_anchor_generator=anchor_gen, box_roi_pool=roi_pooler)
    return model
```
- It errors out … not sure why! (One possible culprit, as an educated guess: the expected type of featmap_names changed across torchvision releases, from ints like [0] to strings like ['0'], and a mismatched version leaves the RoI pooler with no feature maps.)
Fine-tuning ImageNet networks
- Initialize the pretrained model
- Reshape the final layer(s) so the number of outputs matches the number of classes in the new dataset
- Define which parameters the optimization algorithm should update during training
- Run the training step
Helper functions
Model training and validation
Freezing layers with requires_grad
Initializing and reshaping the network
- ImageNet has 1000 classes
- We only want to update the parameters of the layers we reshape (see the sketch after this list)
- inception_v3 requires an input size of (299, 299), while all the other models expect (224, 224)
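A minimal sketch of both steps together (freeze everything, then replace the head so only the new layer trains); resnet18 and num_classes=2 are just example choices:

```python
import torch.nn as nn
from torchvision import models

model = models.resnet18(pretrained=True)
for param in model.parameters():
    param.requires_grad = False  # freeze all pretrained layers

num_classes = 2  # example: the new dataset has 2 classes
# a freshly constructed layer has requires_grad=True by default
model.fc = nn.Linear(model.fc.in_features, num_classes)

# only the reshaped layer's parameters get passed to the optimizer
params_to_update = [p for p in model.parameters() if p.requires_grad]
```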
Alexnet
```
(classifier): Sequential(
    ...
    (6): Linear(in_features=4096, out_features=1000, bias=True)
)
```
- model.classifier[6] = nn.Linear(4096,num_classes)
VGG
```
(classifier): Sequential(
    ...
    (6): Linear(in_features=4096, out_features=1000, bias=True)
)
```
- model.classifier[6] = nn.Linear(4096,num_classes)
Squeezenet 1.0
```
(classifier): Sequential(
    (0): Dropout(p=0.5)
    (1): Conv2d(512, 1000, kernel_size=(1, 1), stride=(1, 1))
    (2): ReLU(inplace)
    (3): AvgPool2d(kernel_size=13, stride=1, padding=0)
)
```
- model.classifier[1] = nn.Conv2d(512, num_classes, kernel_size=(1,1), stride=(1,1))
Resnet
- Resnet18, Resnet34, Resnet50, Resnet101, and Resnet152
- (fc): Linear(in_features=512, out_features=1000, bias=True)
- model.fc = nn.Linear(512, num_classes)
Densenet
- Four variants
- Here we only use Densenet-121. The output layer is a linear layer with 1024 input features:
- (classifier): Linear(in_features=1024, out_features=1000, bias=True)
- model.classifier = nn.Linear(1024, num_classes)
Inception V3
- At training time it has two output layers. The second output is known as the auxiliary output and lives in the AuxLogits part of the network. The primary output is a linear layer at the end of the network. Note that at test time we only consider the primary output. The auxiliary and primary outputs of the loaded model print as:

```
(AuxLogits): InceptionAux(
    ...
    (fc): Linear(in_features=768, out_features=1000, bias=True)
)
...
(fc): Linear(in_features=2048, out_features=1000, bias=True)
```
- model.AuxLogits.fc = nn.Linear(768, num_classes)
- model.fc = nn.Linear(2048, num_classes)
Loading the data
- hymenoptera_data
Creating the optimizer
Running training and validation
Code
```python
from __future__ import print_function
from __future__ import division
import torch, torchvision
import torch.nn as nn
import torch.optim as optim
import numpy as np
import matplotlib.pyplot as plt
from torchvision import datasets, models, transforms
import time, os, copy

def train_model(model, dataloaders, criterion, optimizer, num_epochs=25, is_inception=False):
    since = time.time()
    val_acc_history = []
    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0
    for epoch in range(num_epochs):
        print('Epoch {} / {}'.format(epoch, num_epochs - 1))
        print('-' * 10)
        for phase in ['train', 'val']:
            if phase == 'train':
                model.train()
            else:
                model.eval()
            running_loss = 0.0
            running_corrects = 0
            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)
                labels = labels.to(device)
                # zero the parameter gradients
                optimizer.zero_grad()
                # forward
                # track history if only in train
                with torch.set_grad_enabled(phase == 'train'):
                    # Get model outputs and calculate loss
                    # Special case for inception because in training it has an auxiliary output. In train
                    # mode we calculate the loss by summing the final output and the auxiliary output
                    # but in testing we only consider the final output.
                    if is_inception and phase == 'train':
                        # From https://discuss.pytorch.org/t/how-to-optimize-inception-model-with-auxiliary-classifiers/7958
                        outputs, aux_outputs = model(inputs)
                        loss1 = criterion(outputs, labels)
                        loss2 = criterion(aux_outputs, labels)
                        loss = loss1 + 0.4 * loss2
                    else:
                        outputs = model(inputs)
                        loss = criterion(outputs, labels)
                    _, preds = torch.max(outputs, 1)
                    # backward + optimize only if in training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()
                # statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)
            epoch_loss = running_loss / len(dataloaders[phase].dataset)
            epoch_acc = running_corrects.double() / len(dataloaders[phase].dataset)
            print('{} Loss: {:.4f} Acc: {:.4f}'.format(phase, epoch_loss, epoch_acc))
            # deep copy the model
            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())
            if phase == 'val':
                val_acc_history.append(epoch_acc)
        print()
    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:4f}'.format(best_acc))
    # load best model weights
    model.load_state_dict(best_model_wts)
    return model, val_acc_history

def set_parameter_requires_grad(model, feature_extracting):
    if feature_extracting:
        for param in model.parameters():
            param.requires_grad = False

def initialize_model(model_name, num_classes, feature_extract, use_pretrained=True):
    # Initialize these variables which will be set in this if statement. Each of these
    # variables is model specific.
    model_ft = None
    input_size = 0
    if model_name == "resnet":
        """ Resnet18 """
        model_ft = models.resnet18(pretrained=use_pretrained)
        set_parameter_requires_grad(model_ft, feature_extract)
        num_ftrs = model_ft.fc.in_features
        model_ft.fc = nn.Linear(num_ftrs, num_classes)
        input_size = 224
    elif model_name == "alexnet":
        """ Alexnet """
        model_ft = models.alexnet(pretrained=use_pretrained)
        set_parameter_requires_grad(model_ft, feature_extract)
        num_ftrs = model_ft.classifier[6].in_features
        model_ft.classifier[6] = nn.Linear(num_ftrs, num_classes)
        input_size = 224
    elif model_name == "vgg":
        """ VGG11_bn """
        model_ft = models.vgg11_bn(pretrained=use_pretrained)
        set_parameter_requires_grad(model_ft, feature_extract)
        num_ftrs = model_ft.classifier[6].in_features
        model_ft.classifier[6] = nn.Linear(num_ftrs, num_classes)
        input_size = 224
    elif model_name == "squeezenet":
        """ Squeezenet """
        model_ft = models.squeezenet1_0(pretrained=use_pretrained)
        set_parameter_requires_grad(model_ft, feature_extract)
        model_ft.classifier[1] = nn.Conv2d(512, num_classes, kernel_size=(1, 1), stride=(1, 1))
        model_ft.num_classes = num_classes
        input_size = 224
    elif model_name == "densenet":
        """ Densenet """
        model_ft = models.densenet121(pretrained=use_pretrained)
        set_parameter_requires_grad(model_ft, feature_extract)
        num_ftrs = model_ft.classifier.in_features
        model_ft.classifier = nn.Linear(num_ftrs, num_classes)
        input_size = 224
    elif model_name == "inception":
        """ Inception v3
        Be careful, expects (299,299) sized images and has auxiliary output
        """
        model_ft = models.inception_v3(pretrained=use_pretrained)
        set_parameter_requires_grad(model_ft, feature_extract)
        # Handle the auxilary net
        num_ftrs = model_ft.AuxLogits.fc.in_features
        model_ft.AuxLogits.fc = nn.Linear(num_ftrs, num_classes)
        # Handle the primary net
        num_ftrs = model_ft.fc.in_features
        model_ft.fc = nn.Linear(num_ftrs, num_classes)
        input_size = 299
    else:
        print("Invalid model name, exiting...")
        exit()
    return model_ft, input_size

if __name__ == '__main__':
    data_dir = './Dataset/hymenoptera_data'
    model_name = 'squeezenet'
    num_classes = 2
    batch_size = 8
    num_epochs = 15
    feature_extract = True
    # Detect if we have a GPU available
    device = torch.device("cuda:1" if torch.cuda.is_available() else "cpu")
    # Initialize the model for this run
    model_ft, input_size = initialize_model(model_name, num_classes, feature_extract, use_pretrained=True)
    # Send the model to GPU
    model_ft = model_ft.to(device)
    # Data
    data_transforms = {
        'train': transforms.Compose([
            transforms.RandomResizedCrop(input_size),
            transforms.RandomHorizontalFlip(),
            transforms.ToTensor(),
            transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
        ]),
        'val': transforms.Compose([
            transforms.Resize(input_size),
            transforms.CenterCrop(input_size),
            transforms.ToTensor(),
            transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
        ]),
    }
    print("Initializing Datasets and Dataloaders...")
    # Create training and validation datasets
    image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x), data_transforms[x])
                      for x in ['train', 'val']}
    # Create training and validation dataloaders
    dataloaders_dict = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=batch_size,
                                                       shuffle=True, num_workers=4)
                        for x in ['train', 'val']}
    # Gather the parameters to be optimized/updated in this run.
    params_to_update = model_ft.parameters()
    print("Params to learn:")
    if feature_extract:
        params_to_update = []
        for name, param in model_ft.named_parameters():
            if param.requires_grad == True:
                params_to_update.append(param)
                print("\t", name)
    else:
        for name, param in model_ft.named_parameters():
            if param.requires_grad == True:
                print("\t", name)
    # Observe that all parameters are being optimized
    optimizer_ft = optim.SGD(params_to_update, lr=0.001, momentum=0.9)
    # Setup the loss fxn
    criterion = nn.CrossEntropyLoss()
    # Train and evaluate
    model_ft, hist = train_model(model_ft, dataloaders_dict, criterion, optimizer_ft,
                                 num_epochs=num_epochs, is_inception=(model_name == "inception"))
    # Initialize the non-pretrained version of the model used for this run
    scratch_model, _ = initialize_model(model_name, num_classes, feature_extract=False, use_pretrained=False)
    scratch_model = scratch_model.to(device)
    scratch_optimizer = optim.SGD(scratch_model.parameters(), lr=0.001, momentum=0.9)
    scratch_criterion = nn.CrossEntropyLoss()
    _, scratch_hist = train_model(scratch_model, dataloaders_dict, scratch_criterion, scratch_optimizer,
                                  num_epochs=num_epochs, is_inception=(model_name == "inception"))
    # Plot the training curves of validation accuracy vs. number
    # of training epochs for the transfer learning method and
    # the model trained from scratch
    ohist = [h.cpu().numpy() for h in hist]
    shist = [h.cpu().numpy() for h in scratch_hist]
    plt.title("Validation Accuracy vs. Number of Training Epochs")
    plt.xlabel("Training Epochs")
    plt.ylabel("Validation Accuracy")
    plt.plot(range(1, num_epochs + 1), ohist, label="Pretrained")
    plt.plot(range(1, num_epochs + 1), shist, label="Scratch")
    plt.ylim((0, 1.))
    plt.xticks(np.arange(1, num_epochs + 1, 1.0))
    plt.legend()
    plt.savefig('Compare.png')  # save before show, so the figure is not cleared
    plt.show()
```
STN 2015
- Spatial Transformer Network: learn how to augment a network with a visual attention mechanism called a spatial transformer
- A design idea for networks that train an attention mechanism end to end!
Basics
- 2D affine transformation (Affine)

- 3D projective transformation (projection)

STN architecture
- Localisation Network: just a simple regression network. It runs the input image through a few convolutions, then a fully connected layer regresses the 6 values (assuming an affine transform) of the 2x3 transformation matrix. The transformation is never learned explicitly from the dataset; instead, the network automatically learns the spatial transformations that improve global accuracy.
- Parameterised Sampling Grid (feature map → transformation matrix): the grid generator takes each coordinate in the output image V and, via matrix arithmetic, computes the corresponding coordinate in the input image U, producing T(G).


- Differentiable Image Sampling: the sampler uses the coordinates in T(G) to sample the original image U, copying pixels from U into the output image V and computing each value with the chosen interpolation scheme (a tiny demo follows).
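To make the grid generator and sampler concrete, here is a tiny standalone sketch using an identity transform (so the resampled output should equal the input):

```python
import torch
import torch.nn.functional as F

x = torch.arange(16, dtype=torch.float32).view(1, 1, 4, 4)  # dummy 4x4 "image"

# identity 2x3 affine matrix theta
theta = torch.tensor([[[1., 0., 0.],
                       [0., 1., 0.]]])

grid = F.affine_grid(theta, x.size())   # grid generator: builds T(G) from theta
y = F.grid_sample(x, grid)              # sampler: bilinear interpolation of x at T(G)
print(torch.allclose(x, y, atol=1e-6))  # True, up to interpolation error
```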

Loading the data
- MNIST 28x28
```python
import torch, torchvision
import numpy as np
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import matplotlib.pyplot as plt
import torchvision.datasets as datasets
import torchvision.transforms as transforms

device = torch.device("cuda:1" if torch.cuda.is_available() else "cpu")

# Data
# Training dataset
train_loader = torch.utils.data.DataLoader(
    datasets.MNIST(root='.', train=True, download=True,
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.1307,), (0.3081,))
                   ])), batch_size=64, shuffle=True, num_workers=4)
# Test dataset
test_loader = torch.utils.data.DataLoader(
    datasets.MNIST(root='.', train=False,
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.1307,), (0.3081,))
                   ])), batch_size=64, shuffle=True, num_workers=4)
```
Defining the model
```python
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()  # p=0.5
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)

        # Spatial transformer localization-network
        self.localization = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=7),   # 28 -> 22
            nn.MaxPool2d(2, stride=2),        # 22 -> 11
            nn.ReLU(True),
            nn.Conv2d(8, 10, kernel_size=5),  # 11 -> 7
            nn.MaxPool2d(2, stride=2),        # 7 -> 3
            nn.ReLU(True)
        )

        # Regressor for the 3 * 2 affine matrix
        self.fc_loc = nn.Sequential(
            nn.Linear(10 * 3 * 3, 32),
            nn.ReLU(True),
            nn.Linear(32, 3 * 2)
        )

        # Initialize the weights/bias with identity transformation
        self.fc_loc[2].weight.data.zero_()
        self.fc_loc[2].bias.data.copy_(torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float))

    # Spatial transformer network forward function
    def stn(self, x):
        xs = self.localization(x)
        xs = xs.view(-1, 10 * 3 * 3)
        theta = self.fc_loc(xs)
        theta = theta.view(-1, 2, 3)
        grid = F.affine_grid(theta, x.size())  # sampling grid from theta, 28x28
        x = F.grid_sample(x, grid)             # resample the input
        return x

    def forward(self, x):
        # transform the input
        x = self.stn(x)
        # Perform the usual forward pass:
        # conv-pool-conv-drop-pool-fc-relu-drop-fc
        x = F.relu(F.max_pool2d(self.conv1(x), 2))                   # 24 -> 12
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))  # 8 -> 4
        x = x.view(-1, 320)  # 4x4x20
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)
```
Visualizing the results
```python
def convert_image_np(inp):
    """Convert a Tensor to numpy image."""
    inp = inp.numpy().transpose((1, 2, 0))
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    inp = std * inp + mean
    inp = np.clip(inp, 0, 1)
    return inp

def visualize_stn():
    with torch.no_grad():
        # Get a batch of training data
        data = next(iter(test_loader))[0].to(device)
        input_tensor = data.cpu()
        transformed_input_tensor = model.stn(data).cpu()
        in_grid = convert_image_np(
            torchvision.utils.make_grid(input_tensor))
        out_grid = convert_image_np(
            torchvision.utils.make_grid(transformed_input_tensor))
        # Plot the results side-by-side
        f, axarr = plt.subplots(1, 2)
        axarr[0].imshow(in_grid)
        axarr[0].set_title('Dataset Images')
        axarr[1].imshow(out_grid)
        axarr[1].set_title('Transformed Images')
```
Training the model
```python
def train(epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % 500 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))

def test():
    with torch.no_grad():
        model.eval()
        test_loss = 0
        correct = 0
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            # sum up batch loss (size_average=False is deprecated; use reduction='sum')
            test_loss += F.nll_loss(output, target, reduction='sum').item()
            # get the index of the max log-probability
            pred = output.max(1, keepdim=True)[1]
            correct += pred.eq(target.view_as(pred)).sum().item()
        test_loss /= len(test_loader.dataset)
        print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'
              .format(test_loss, correct, len(test_loader.dataset),
                      100. * correct / len(test_loader.dataset)))

# Model + Optimizer
model = Net().to(device)
optimizer = optim.SGD(model.parameters(), lr=0.01)
for epoch in range(1, 20 + 1):
    train(epoch)
    test()

visualize_stn()
plt.ioff()
plt.savefig('stn_vis.png')  # save before show, so the figure is not cleared
plt.show()
```

Style Transfer

- Two distances are defined, one for content (D_C) and one for style (D_S). D_C measures how different the content is between two images, while D_S measures how different the style is between two images. We then take a third image, the input, and transform it to minimize both its content distance from the content image and its style distance from the style image. Now we can import the necessary packages and begin neural transfer.
Setup
Loss functions
- content loss + style loss
- Content loss: a function that represents a weighted version of the content distance for an individual layer.
- Style loss: computed analogously, but on the Gram matrices of the feature maps rather than on the raw features (see the formulas below).
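For reference, a sketch of the standard formulation from Gatys et al.; F^l and P^l denote the feature maps of the input and content images at layer l, and A^l is the style image's Gram matrix:

```latex
% content loss at layer l
L_{content}(\vec{x}, \vec{p}, l) = \frac{1}{2} \sum_{i,j} \left( F^l_{ij} - P^l_{ij} \right)^2

% Gram matrix of the feature maps at layer l
G^l_{ij} = \sum_k F^l_{ik} F^l_{jk}

% style loss contribution of layer l (N_l feature maps of size M_l)
E_l = \frac{1}{4 N_l^2 M_l^2} \sum_{i,j} \left( G^l_{ij} - A^l_{ij} \right)^2
```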
Pretrained model
- PyTorch's VGG implementation is a module split into two Sequential submodules: features (containing the convolution and pooling layers) and classifier (containing the fully connected layers).
- We will use the features module, because we need the outputs of the individual convolution layers to measure content and style loss. Some layers behave differently during training than during evaluation, so we must set the network to evaluation mode with .eval()
Training
- As the author Leon Gatys suggested, we use the L-BFGS algorithm to run our gradient descent. Unlike training a network, we want to train the input image so as to minimize the content/style losses
Code
```python
from __future__ import print_function
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from PIL import Image
import matplotlib.pyplot as plt
import torchvision.transforms as transforms
import torchvision.models as models
import copy

def image_loader(image_name):
    image = Image.open(image_name)
    # fake batch dimension required to fit network's input dimensions
    image = loader(image).unsqueeze(0)
    return image.to(device, torch.float)

unloader = transforms.ToPILImage()  # reconvert into PIL image

def imshow(tensor, title=None):
    image = tensor.cpu().clone()  # we clone the tensor to not do changes on it
    image = image.squeeze(0)      # remove the fake batch dimension
    image = unloader(image)
    plt.imshow(image)
    if title is not None:
        plt.title(title)
    plt.pause(0.001)  # pause a bit so that plots are updated

class ContentLoss(nn.Module):
    # Although this module is named ContentLoss, it is not a true PyTorch loss function.
    # If you want to define the content loss as a PyTorch loss function,
    # you have to create a PyTorch autograd function and recompute/implement
    # the gradient manually in the backward method.
    def __init__(self, target,):
        super(ContentLoss, self).__init__()
        # we 'detach' the target content from the tree used
        # to dynamically compute the gradient: this is a stated value,
        # not a variable. Otherwise the forward method of the criterion
        # will throw an error.
        self.target = target.detach()

    def forward(self, input):
        self.loss = F.mse_loss(input, self.target)
        return input

def gram_matrix(input):
    a, b, c, d = input.size()
    # a=batch size(=1)
    # b=number of feature maps
    # (c,d)=dimensions of a f. map (N=c*d)
    features = input.view(a * b, c * d)   # resize F_XL into \hat F_XL
    G = torch.mm(features, features.t())  # compute the gram product
    # we 'normalize' the values of the gram matrix
    # by dividing by the number of element in each feature maps.
    return G.div(a * b * c * d)

class StyleLoss(nn.Module):
    def __init__(self, target_feature):
        super(StyleLoss, self).__init__()
        self.target = gram_matrix(target_feature).detach()

    def forward(self, input):
        G = gram_matrix(input)
        self.loss = F.mse_loss(G, self.target)
        return input

# create a module to normalize input image so we can easily put it in a
# nn.Sequential
class Normalization(nn.Module):
    def __init__(self, mean, std):
        super(Normalization, self).__init__()
        # .view the mean and std to make them [C x 1 x 1] so that they can
        # directly work with image Tensor of shape [B x C x H x W].
        # B is batch size. C is number of channels. H is height and W is width.
        self.mean = torch.tensor(mean).view(-1, 1, 1)
        self.std = torch.tensor(std).view(-1, 1, 1)

    def forward(self, img):
        # normalize img
        return (img - self.mean) / self.std

# desired depth layers to compute style/content losses :
content_layers_default = ['conv_4']
style_layers_default = ['conv_1', 'conv_2', 'conv_3', 'conv_4', 'conv_5']

def get_style_model_and_losses(cnn, normalization_mean, normalization_std,
                               style_img, content_img,
                               content_layers=content_layers_default,
                               style_layers=style_layers_default):
    cnn = copy.deepcopy(cnn)
    # normalization module
    normalization = Normalization(normalization_mean, normalization_std).to(device)
    # just in order to have an iterable access to or list of content/syle losses
    content_losses = []
    style_losses = []
    # assuming that cnn is a nn.Sequential, so we make a new nn.Sequential
    # to put in modules that are supposed to be activated sequentially
    model = nn.Sequential(normalization)
    i = 0  # increment every time we see a conv
    for layer in cnn.children():
        if isinstance(layer, nn.Conv2d):
            i += 1
            name = 'conv_{}'.format(i)
        elif isinstance(layer, nn.ReLU):
            name = 'relu_{}'.format(i)
            # The in-place version doesn't play very nicely with the ContentLoss
            # and StyleLoss we insert below. So we replace with out-of-place
            # ones here.
            layer = nn.ReLU(inplace=False)
        elif isinstance(layer, nn.MaxPool2d):
            name = 'pool_{}'.format(i)
        elif isinstance(layer, nn.BatchNorm2d):
            name = 'bn_{}'.format(i)
        else:
            raise RuntimeError('Unrecognized layer: {}'.format(layer.__class__.__name__))
        model.add_module(name, layer)
        if name in content_layers:
            # add content loss:
            target = model(content_img).detach()
            content_loss = ContentLoss(target)
            model.add_module("content_loss_{}".format(i), content_loss)
            content_losses.append(content_loss)
        if name in style_layers:
            # add style loss:
            target_feature = model(style_img).detach()
            style_loss = StyleLoss(target_feature)
            model.add_module("style_loss_{}".format(i), style_loss)
            style_losses.append(style_loss)
    # now we trim off the layers after the last content and style losses
    for i in range(len(model) - 1, -1, -1):
        if isinstance(model[i], ContentLoss) or isinstance(model[i], StyleLoss):
            break
    model = model[:(i + 1)]
    return model, style_losses, content_losses

def get_input_optimizer(input_img):
    # this line to show that input is a parameter that requires a gradient
    optimizer = optim.LBFGS([input_img.requires_grad_()])
    return optimizer

def run_style_transfer(cnn, normalization_mean, normalization_std,
                       content_img, style_img, input_img, num_steps=300,
                       style_weight=1000000, content_weight=1):
    """Run the style transfer."""
    print('Building the style transfer model..')
    model, style_losses, content_losses = get_style_model_and_losses(cnn,
        normalization_mean, normalization_std, style_img, content_img)
    optimizer = get_input_optimizer(input_img)
    print('Optimizing..')
    run = [0]
    while run[0] <= num_steps:
        def closure():
            # correct the values of updated input image
            input_img.data.clamp_(0, 1)
            optimizer.zero_grad()
            model(input_img)
            style_score = 0
            content_score = 0
            for sl in style_losses:
                style_score += sl.loss
            for cl in content_losses:
                content_score += cl.loss
            style_score *= style_weight
            content_score *= content_weight
            loss = style_score + content_score
            loss.backward()  # backpropagate the combined loss!
            run[0] += 1
            if run[0] % 50 == 0:
                print("run {}:".format(run))
                print('Style Loss : {:4f} Content Loss: {:4f}'.format(
                    style_score.item(), content_score.item()))
                print()
            return style_score + content_score
        optimizer.step(closure)
    # a last correction...
    input_img.data.clamp_(0, 1)
    return input_img

if __name__ == '__main__':
    # 1. Prepare
    device = torch.device('cuda:1' if torch.cuda.is_available() else 'cpu')
    # desired size of the output image
    imsize = 512 if torch.cuda.is_available() else 128  # use small size if no gpu
    loader = transforms.Compose([
        transforms.Resize(imsize),   # scale imported image
        transforms.ToTensor()])      # transform it into a torch tensor
    style_img = image_loader("./Dataset/neural-style/picasso.jpg")
    content_img = image_loader("./Dataset/neural-style/dancing.jpg")
    assert style_img.size() == content_img.size(), \
        "we need to import style and content images of the same size"
    # 2. Model
    cnn = models.vgg19(pretrained=True).features.to(device).eval()
    cnn_normalization_mean = torch.tensor([0.485, 0.456, 0.406]).to(device)
    cnn_normalization_std = torch.tensor([0.229, 0.224, 0.225]).to(device)
    # 3. Input: use a copy of the content image, or white noise
    input_img = content_img.clone()
    # input_img = torch.randn(content_img.data.size(), device=device)
    # 4. Train
    output = run_style_transfer(cnn, cnn_normalization_mean, cnn_normalization_std,
                                content_img, style_img, input_img)
    plt.figure()
    imshow(output, title='Output Image')
    plt.ioff()
    plt.savefig('transferImg.png')  # save before show, so the figure is not cleared
    plt.show()
```

Adversarial Example Generation
- An often overlooked aspect of designing and training models is security and robustness, especially in the face of an adversary who wants to fool the model.
- The goal here is to raise awareness of the security vulnerabilities of ML models and give insight into the hot topic of adversarial machine learning.
- We explore the topic through an example on an image classifier. Specifically, we use one of the first and most popular attack methods, the Fast Gradient Sign Attack (FGSM), to fool an MNIST classifier
Threat model
- For context, there are many categories of adversarial attacks, each with different goals and different assumptions about the attacker's knowledge.
- The overall goal is to add the least amount of perturbation to the input data that causes the desired misclassification.
- There are two assumptions about the attacker's knowledge: white-box and black-box
- A white-box attack assumes the attacker has full knowledge of and access to the model, including its architecture, inputs, outputs, and weights!
- A black-box attack assumes the attacker only has access to the inputs and outputs of the model, and knows nothing about the underlying architecture or weights
- Attack goals: misclassification and source/target misclassification. ① Misclassification: the adversary only wants the output classification to be wrong, and does not care what the new classification is. ② Source/target misclassification: the adversary wants to alter an image that originally belongs to a specific source class so that it is classified as a specific target class
- The FGSM attack is a white-box attack whose goal is misclassification
Fast Gradient Sign Attack (FGSM)
- It is designed to attack neural networks by exploiting the way they learn: gradients
- Rather than minimizing the loss by adjusting the weights based on the backpropagated gradients, the attack adjusts the input data to maximize the loss based on those same backpropagated gradients.
- In other words: the attack uses the gradient of the loss with respect to the input data, then adjusts the input data to maximize the loss (the update rule is given below)
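In symbols, the standard FGSM update (Goodfellow et al., 2015), where epsilon controls the perturbation magnitude, J is the loss, theta the model parameters, x the input, and y the label:

```latex
x_{adv} = x + \epsilon \cdot \operatorname{sign}\!\left( \nabla_x J(\theta, x, y) \right)
```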

Code TODO
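The tutorial code here is still a TODO; as a minimal sketch of the attack itself (it assumes a model that outputs log-probabilities, like the MNIST nets above, and images scaled to [0, 1]):

```python
import torch
import torch.nn.functional as F

def fgsm_attack(image, epsilon, data_grad):
    # step in the direction of the sign of the input gradient (maximizes the loss)
    perturbed_image = image + epsilon * data_grad.sign()
    # keep the result a valid image in [0, 1]
    return torch.clamp(perturbed_image, 0, 1)

def attack_one(model, data, target, epsilon):
    data.requires_grad = True          # we need gradients w.r.t. the input
    output = model(data)
    loss = F.nll_loss(output, target)
    model.zero_grad()
    loss.backward()                    # populates data.grad
    return fgsm_attack(data, epsilon, data.grad.data)
```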
GAN
Generative Adversarial Networks
- They consist of two distinct models: a generator and a discriminator
- The generator's job is to produce "fake" images that look like the training images
- The discriminator's job is to look at an image and decide whether it is a real training image or a fake from the generator
- During training, the generator constantly tries to outwit the discriminator by producing better and better fakes, while the discriminator works to become a better detective and correctly classify real and fake images
- The equilibrium of this game is when the generator produces fakes that look as if they came directly from the training data, and the discriminator is left always guessing, with 50% confidence, whether the generator's output is real or fake
- However, the convergence theory of GANs is still under active research, and in practice models do not always train to this point.
DCGAN
- Unsupervised Representation Learning With Deep Convolutional Generative Adversarial Networks
- The discriminator is composed of Conv, BN, and LeakyReLU layers. Its input is a 3x64x64 image, and its output is the probability that the input image came from the real data
- The generator is composed of Deconv (transposed convolution), BN, and ReLU layers. Its input is a latent vector sampled from a standard normal distribution, and its output is a 3x64x64 RGB image! (A sketch of both networks follows.)
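A minimal sketch of the two networks for the setup just described, with the common DCGAN-tutorial sizes assumed (latent size nz=100, feature map sizes ngf=ndf=64):

```python
import torch.nn as nn

nz, ngf, ndf, nc = 100, 64, 64, 3  # latent size, G/D feature maps, image channels

# Generator: latent vector (N, nz, 1, 1) -> 3x64x64 image
netG = nn.Sequential(
    nn.ConvTranspose2d(nz, ngf * 8, 4, 1, 0, bias=False), nn.BatchNorm2d(ngf * 8), nn.ReLU(True),       # 1 -> 4
    nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False), nn.BatchNorm2d(ngf * 4), nn.ReLU(True),  # 4 -> 8
    nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False), nn.BatchNorm2d(ngf * 2), nn.ReLU(True),  # 8 -> 16
    nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False), nn.BatchNorm2d(ngf), nn.ReLU(True),          # 16 -> 32
    nn.ConvTranspose2d(ngf, nc, 4, 2, 1, bias=False), nn.Tanh(),                                        # 32 -> 64, output in [-1, 1]
)

# Discriminator: 3x64x64 image -> probability it is real
netD = nn.Sequential(
    nn.Conv2d(nc, ndf, 4, 2, 1, bias=False), nn.LeakyReLU(0.2, inplace=True),                                # 64 -> 32
    nn.Conv2d(ndf, ndf * 2, 4, 2, 1, bias=False), nn.BatchNorm2d(ndf * 2), nn.LeakyReLU(0.2, inplace=True),  # 32 -> 16
    nn.Conv2d(ndf * 2, ndf * 4, 4, 2, 1, bias=False), nn.BatchNorm2d(ndf * 4), nn.LeakyReLU(0.2, inplace=True),  # 16 -> 8
    nn.Conv2d(ndf * 4, ndf * 8, 4, 2, 1, bias=False), nn.BatchNorm2d(ndf * 8), nn.LeakyReLU(0.2, inplace=True),  # 8 -> 4
    nn.Conv2d(ndf * 8, 1, 4, 1, 0, bias=False), nn.Sigmoid(),                                                # 4 -> 1
)
```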
MORE TODO
Parallel / Distributed Training
- Parallel & Distribution
Not Only Pytorch
Production deployment of models >> C++
Extending PyTorch
- TorchScript
Using the PyTorch C++ frontend
Notes