Table of Contents
- Images - TorchVision
- Defining the Model
- Four ways to define a model
- Code
- Note on visualization - netron
- Replacing the backbone - error
- Fine-tuning ImageNet networks
- Helper functions
- Model training and validation
- Freezing layers with requires_grad
- Initializing and reshaping the network
- Alexnet
- VGG
- Squeezenet 1.0
- Resnet
- Densenet
- Inception V3
- Loading the data
- Creating the optimizer
- Running training and validation
- Code
- STN 2015
- Basics
- STN architecture
- Loading the data
- Defining the model
- Visualizing the results
- Training the model
- Style Transfer
- Setup
- Loss functions
- Pretrained model
- Training
- Code
- Adversarial Example Generation
- Threat model
- Fast Gradient Sign Attack (FGSM)
- Code `TODO`
- GAN
- Generative Adversarial Networks
- DCGAN
- MORE `TODO`
- Parallel / Distributed Training
- Not Only Pytorch
- Production deployment of models >> C++
- Extending PyTorch
- Using the PyTorch C++ frontend
- Notes
- Autograd
- Broadcast
- CPU threading / TorchScript inference
- CUDA semantics
- Custom modules in PyTorch
- Large-scale deployment
- Multiprocessing
- Reproducibility
- Serialization semantics (save/load)
- Questions
Images - TorchVision
- Fine-tune a pretrained Mask R-CNN model on the Penn-Fudan database for pedestrian detection and segmentation.
- A 2007 dataset: 170 images containing 345 pedestrian instances
- We use it to illustrate how to use the new features in torchvision to train an instance segmentation model on a custom dataset
- Subclass torch.utils.data.Dataset and implement __len__ and __getitem__
- __getitem__ needs to return:
  - image: a PIL image of size (H, W)
  - target: a dict containing the following fields:
    - boxes (FloatTensor[N, 4]): the coordinates of the N bounding boxes in [x0, y0, x1, y1] format, ranging from 0 to W and from 0 to H
    - labels (Int64Tensor[N]): the label for each bounding box
    - image_id (Int64Tensor[1]): an image identifier. It should be unique across all images in the dataset and is used during evaluation
    - area (Tensor[N]): the area of each bounding box. This is used by the COCO metric during evaluation, to separate the metric scores between small, medium, and large boxes.
    - iscrowd (UInt8Tensor[N]): instances with iscrowd=True will be ignored during evaluation.
    - (optionally) masks (UInt8Tensor[N, H, W]): the segmentation mask for each object
    - (optionally) keypoints (FloatTensor[N, K, 3]): for each of the N objects, the K keypoints in [x, y, visibility] format that define the object. visibility=0 means the keypoint is not visible. Note that for data augmentation, the notion of flipping a keypoint depends on the data representation, so you may want to adapt references/detection/transforms.py to your new keypoint representation
- If you want to use aspect-ratio grouping during training (so that each batch only contains images with similar aspect ratios), it is recommended to also implement a get_height_and_width method that returns the image's height and width
- If this method is not provided, we query all elements of the dataset via __getitem__, which loads the images into memory and is slower than providing a custom method
- A minimal Dataset sketch following this contract is shown right after this list.
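A minimal sketch of such a dataset, assuming the Penn-Fudan directory layout (PNGImages/ and PedMasks/, with each mask PNG encoding instances as distinct pixel values); the class name and transform convention here are illustrative, not part of torchvision:

```python
import os
import numpy as np
import torch
from PIL import Image

class PennFudanDataset(torch.utils.data.Dataset):
    def __init__(self, root, transforms=None):
        self.root = root
        self.transforms = transforms
        # sorted so that images and masks stay aligned
        self.imgs = sorted(os.listdir(os.path.join(root, "PNGImages")))
        self.masks = sorted(os.listdir(os.path.join(root, "PedMasks")))

    def __getitem__(self, idx):
        img = Image.open(os.path.join(self.root, "PNGImages", self.imgs[idx])).convert("RGB")
        mask = np.array(Image.open(os.path.join(self.root, "PedMasks", self.masks[idx])))
        obj_ids = np.unique(mask)[1:]            # drop the background id 0
        masks = mask == obj_ids[:, None, None]   # one binary mask per instance
        boxes = []
        for m in masks:                          # derive a box from each mask
            ys, xs = np.where(m)
            boxes.append([xs.min(), ys.min(), xs.max(), ys.max()])
        boxes = torch.as_tensor(boxes, dtype=torch.float32)
        num_objs = len(obj_ids)
        target = {
            "boxes": boxes,
            "labels": torch.ones((num_objs,), dtype=torch.int64),  # single class: pedestrian
            "image_id": torch.tensor([idx]),
            "area": (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0]),
            "iscrowd": torch.zeros((num_objs,), dtype=torch.uint8),
            "masks": torch.as_tensor(masks, dtype=torch.uint8),
        }
        if self.transforms is not None:
            img, target = self.transforms(img, target)
        return img, target

    def __len__(self):
        return len(self.imgs)
```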
Defining the Model

Four ways to define a model
Reference
```python
import torch
import torch.nn.functional as F
from collections import OrderedDict

# Method 1: plain attributes + functional ops in forward
class Net1(torch.nn.Module):
    def __init__(self):
        super(Net1, self).__init__()
        self.conv1 = torch.nn.Conv2d(3, 32, 3, 1, 1)
        self.dense1 = torch.nn.Linear(32 * 3 * 3, 128)
        self.dense2 = torch.nn.Linear(128, 10)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)  # fixed: was self.conv(x)
        x = x.view(x.size(0), -1)
        x = F.relu(self.dense1(x))
        x = self.dense2(x)
        return x

# Method 2: nn.Sequential
class Net2(torch.nn.Module):
    def __init__(self):
        super(Net2, self).__init__()
        self.conv = torch.nn.Sequential(
            torch.nn.Conv2d(3, 32, 3, 1, 1),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(2))
        self.dense = torch.nn.Sequential(
            torch.nn.Linear(32 * 3 * 3, 128),
            torch.nn.ReLU(),
            torch.nn.Linear(128, 10)
        )

    def forward(self, x):
        conv_out = self.conv(x)  # fixed: was self.conv1(x)
        res = conv_out.view(conv_out.size(0), -1)
        out = self.dense(res)
        return out

# Method 3: nn.Sequential built with add_module (named layers)
class Net3(torch.nn.Module):
    def __init__(self):
        super(Net3, self).__init__()
        self.conv = torch.nn.Sequential()
        self.conv.add_module("conv1", torch.nn.Conv2d(3, 32, 3, 1, 1))
        self.conv.add_module("relu1", torch.nn.ReLU())
        self.conv.add_module("pool1", torch.nn.MaxPool2d(2))
        self.dense = torch.nn.Sequential()
        self.dense.add_module("dense1", torch.nn.Linear(32 * 3 * 3, 128))
        self.dense.add_module("relu2", torch.nn.ReLU())
        self.dense.add_module("dense2", torch.nn.Linear(128, 10))

    def forward(self, x):
        conv_out = self.conv(x)  # fixed: was self.conv1(x)
        res = conv_out.view(conv_out.size(0), -1)
        out = self.dense(res)
        return out

# Method 4: nn.Sequential built from an OrderedDict (named layers)
class Net4(torch.nn.Module):
    def __init__(self):
        super(Net4, self).__init__()
        self.conv = torch.nn.Sequential(
            OrderedDict([
                ("conv1", torch.nn.Conv2d(3, 32, 3, 1, 1)),
                ("relu1", torch.nn.ReLU()),
                ("pool", torch.nn.MaxPool2d(2))
            ]))
        self.dense = torch.nn.Sequential(
            OrderedDict([
                ("dense1", torch.nn.Linear(32 * 3 * 3, 128)),
                ("relu2", torch.nn.ReLU()),
                ("dense2", torch.nn.Linear(128, 10))
            ])
        )

    def forward(self, x):
        conv_out = self.conv(x)  # fixed: was self.conv1(x)
        res = conv_out.view(conv_out.size(0), -1)
        out = self.dense(res)
        return out
```
Code
Mask-R-CNN-Fine-tune
Note on visualization - netron
- Not sure why torch.save(model, 'xx.pth') raises an error here (Mask R-CNN); saving the whole module pickles the entire object graph, which is more fragile than saving just the weights (see the state_dict sketch below)

```python
# vis_model: export to ONNX, then open the graph in netron
import torch
import netron

example = torch.rand(1, 3, 480, 640)            # dummy input for tracing
torch.onnx.export(model, example, "test.onnx")  # model defined as above
netron.start("test.onnx")
```
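As a workaround sketch (not from the original post), saving only the weights avoids pickling the whole module; `get_model_instance_segmentation` below is a hypothetical stand-in for whatever builder function created the model:

```python
# Save only the parameters instead of the whole module
torch.save(model.state_dict(), "maskrcnn_weights.pth")

# To restore: rebuild the same architecture, then load the weights
model2 = get_model_instance_segmentation(num_classes=2)  # hypothetical builder
model2.load_state_dict(torch.load("maskrcnn_weights.pth"))
model2.eval()
```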
Replacing the backbone - error
- Goal: replace the Mask R-CNN backbone with MobileNetV2
```python
import torchvision as tv
from torchvision.models.detection import MaskRCNN
from torchvision.models.detection.rpn import AnchorGenerator

def get_model_instance_segmentation2(num_classes):
    # Backbone: MobileNetV2 feature extractor
    backbone = tv.models.mobilenet_v2(pretrained=True).features
    backbone.out_channels = 1280  # Faster/Mask R-CNN needs the backbone's output channel count
    anchor_gen = AnchorGenerator(sizes=((32, 64, 128, 256, 512),),
                                 aspect_ratios=((0.5, 1.0, 2.0),))
    roi_pooler = tv.ops.MultiScaleRoIAlign(featmap_names=['0'],
                                           output_size=7,
                                           sampling_ratio=2)
    mask_roi_pooler = tv.ops.MultiScaleRoIAlign(featmap_names=['0'],
                                                output_size=14,
                                                sampling_ratio=2)
    model = MaskRCNN(backbone, num_classes=num_classes,
                     rpn_anchor_generator=anchor_gen,
                     box_roi_pool=roi_pooler,
                     mask_roi_pool=mask_roi_pooler)
    # model = FasterRCNN(backbone, num_classes=num_classes,
    #                    rpn_anchor_generator=anchor_gen, box_roi_pool=roi_pooler)
    return model
```
- It errors out … not sure why! (One possible culprit, as an educated guess: the expected type of featmap_names changed across torchvision releases, from ints like [0] to strings like ['0'], and a mismatched version leaves the RoI pooler with no feature maps.)
Fine-tuning ImageNet networks
- Initialize the pretrained model
- Reshape the final layer(s) so the number of outputs matches the number of classes in the new dataset
- Define which parameters the optimization algorithm should update during training
- Run the training step
Helper functions
Model training and validation
Freezing layers with requires_grad
Initializing and reshaping the network
- ImageNet has 1000 classes
- We only want to update the parameters of the layers we reshape (see the sketch after this list)
- inception_v3 requires an input size of (299, 299), while all the other models expect (224, 224)
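A minimal sketch of both steps together (freeze everything, then replace the head so only the new layer trains); resnet18 and num_classes=2 are just example choices:

```python
import torch.nn as nn
from torchvision import models

model = models.resnet18(pretrained=True)
for param in model.parameters():
    param.requires_grad = False  # freeze all pretrained layers

num_classes = 2  # example: the new dataset has 2 classes
# a freshly constructed layer has requires_grad=True by default
model.fc = nn.Linear(model.fc.in_features, num_classes)

# only the reshaped layer's parameters get passed to the optimizer
params_to_update = [p for p in model.parameters() if p.requires_grad]
```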
Alexnet
```
(classifier): Sequential(
    ...
    (6): Linear(in_features=4096, out_features=1000, bias=True)
)
```
- model.classifier[6] = nn.Linear(4096,num_classes)
VGG
```
(classifier): Sequential(
    ...
    (6): Linear(in_features=4096, out_features=1000, bias=True)
)
```
- model.classifier[6] = nn.Linear(4096,num_classes)
Squeezenet 1.0
```
(classifier): Sequential(
    (0): Dropout(p=0.5)
    (1): Conv2d(512, 1000, kernel_size=(1, 1), stride=(1, 1))
    (2): ReLU(inplace)
    (3): AvgPool2d(kernel_size=13, stride=1, padding=0)
)
```
- model.classifier[1] = nn.Conv2d(512, num_classes, kernel_size=(1,1), stride=(1,1))
Resnet
- Resnet18, Resnet34, Resnet50, Resnet101, and Resnet152
- (fc): Linear(in_features=512, out_features=1000, bias=True)
- model.fc = nn.Linear(512, num_classes)
Densenet
- Four variants
- Here we only use Densenet-121. The output layer is a linear layer with 1024 input features:
- (classifier): Linear(in_features=1024, out_features=1000, bias=True)
- model.classifier = nn.Linear(1024, num_classes)
Inception V3
- At training time it has two output layers. The second output is known as the auxiliary output and lives in the AuxLogits part of the network. The primary output is a linear layer at the end of the network. Note that at test time we only consider the primary output. The auxiliary and primary outputs of the loaded model print as:

```
(AuxLogits): InceptionAux(
    ...
    (fc): Linear(in_features=768, out_features=1000, bias=True)
)
...
(fc): Linear(in_features=2048, out_features=1000, bias=True)
```
- model.AuxLogits.fc = nn.Linear(768, num_classes)
- model.fc = nn.Linear(2048, num_classes)
Loading the data
- hymenoptera_data
Creating the optimizer
Running training and validation
Code
```python
from __future__ import print_function
from __future__ import division
import torch, torchvision
import torch.nn as nn
import torch.optim as optim
import numpy as np
import matplotlib.pyplot as plt
from torchvision import datasets, models, transforms
import time, os, copy

def train_model(model, dataloaders, criterion, optimizer, num_epochs=25, is_inception=False):
    since = time.time()
    val_acc_history = []
    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0
    for epoch in range(num_epochs):
        print('Epoch {} / {}'.format(epoch, num_epochs - 1))
        print('-' * 10)
        for phase in ['train', 'val']:
            if phase == 'train':
                model.train()
            else:
                model.eval()
            running_loss = 0.0
            running_corrects = 0
            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)
                labels = labels.to(device)
                # zero the parameter gradients
                optimizer.zero_grad()
                # forward
                # track history if only in train
                with torch.set_grad_enabled(phase == 'train'):
                    # Get model outputs and calculate loss
                    # Special case for inception because in training it has an auxiliary output. In train
                    # mode we calculate the loss by summing the final output and the auxiliary output
                    # but in testing we only consider the final output.
                    if is_inception and phase == 'train':
                        # From https://discuss.pytorch.org/t/how-to-optimize-inception-model-with-auxiliary-classifiers/7958
                        outputs, aux_outputs = model(inputs)
                        loss1 = criterion(outputs, labels)
                        loss2 = criterion(aux_outputs, labels)
                        loss = loss1 + 0.4 * loss2
                    else:
                        outputs = model(inputs)
                        loss = criterion(outputs, labels)
                    _, preds = torch.max(outputs, 1)
                    # backward + optimize only if in training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()
                # statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)
            epoch_loss = running_loss / len(dataloaders[phase].dataset)
            epoch_acc = running_corrects.double() / len(dataloaders[phase].dataset)
            print('{} Loss: {:.4f} Acc: {:.4f}'.format(phase, epoch_loss, epoch_acc))
            # deep copy the model
            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())
            if phase == 'val':
                val_acc_history.append(epoch_acc)
        print()
    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:4f}'.format(best_acc))
    # load best model weights
    model.load_state_dict(best_model_wts)
    return model, val_acc_history

def set_parameter_requires_grad(model, feature_extracting):
    if feature_extracting:
        for param in model.parameters():
            param.requires_grad = False

def initialize_model(model_name, num_classes, feature_extract, use_pretrained=True):
    # Initialize these variables which will be set in this if statement. Each of these
    # variables is model specific.
    model_ft = None
    input_size = 0
    if model_name == "resnet":
        """ Resnet18 """
        model_ft = models.resnet18(pretrained=use_pretrained)
        set_parameter_requires_grad(model_ft, feature_extract)
        num_ftrs = model_ft.fc.in_features
        model_ft.fc = nn.Linear(num_ftrs, num_classes)
        input_size = 224
    elif model_name == "alexnet":
        """ Alexnet """
        model_ft = models.alexnet(pretrained=use_pretrained)
        set_parameter_requires_grad(model_ft, feature_extract)
        num_ftrs = model_ft.classifier[6].in_features
        model_ft.classifier[6] = nn.Linear(num_ftrs, num_classes)
        input_size = 224
    elif model_name == "vgg":
        """ VGG11_bn """
        model_ft = models.vgg11_bn(pretrained=use_pretrained)
        set_parameter_requires_grad(model_ft, feature_extract)
        num_ftrs = model_ft.classifier[6].in_features
        model_ft.classifier[6] = nn.Linear(num_ftrs, num_classes)
        input_size = 224
    elif model_name == "squeezenet":
        """ Squeezenet """
        model_ft = models.squeezenet1_0(pretrained=use_pretrained)
        set_parameter_requires_grad(model_ft, feature_extract)
        model_ft.classifier[1] = nn.Conv2d(512, num_classes, kernel_size=(1, 1), stride=(1, 1))
        model_ft.num_classes = num_classes
        input_size = 224
    elif model_name == "densenet":
        """ Densenet """
        model_ft = models.densenet121(pretrained=use_pretrained)
        set_parameter_requires_grad(model_ft, feature_extract)
        num_ftrs = model_ft.classifier.in_features
        model_ft.classifier = nn.Linear(num_ftrs, num_classes)
        input_size = 224
    elif model_name == "inception":
        """ Inception v3
        Be careful, expects (299,299) sized images and has auxiliary output
        """
        model_ft = models.inception_v3(pretrained=use_pretrained)
        set_parameter_requires_grad(model_ft, feature_extract)
        # Handle the auxilary net
        num_ftrs = model_ft.AuxLogits.fc.in_features
        model_ft.AuxLogits.fc = nn.Linear(num_ftrs, num_classes)
        # Handle the primary net
        num_ftrs = model_ft.fc.in_features
        model_ft.fc = nn.Linear(num_ftrs, num_classes)
        input_size = 299
    else:
        print("Invalid model name, exiting...")
        exit()
    return model_ft, input_size

if __name__ == '__main__':
    data_dir = './Dataset/hymenoptera_data'
    model_name = 'squeezenet'
    num_classes = 2
    batch_size = 8
    num_epochs = 15
    feature_extract = True
    # Detect if we have a GPU available
    device = torch.device("cuda:1" if torch.cuda.is_available() else "cpu")
    # Initialize the model for this run
    model_ft, input_size = initialize_model(model_name, num_classes, feature_extract, use_pretrained=True)
    # Send the model to GPU
    model_ft = model_ft.to(device)
    # Data
    data_transforms = {
        'train': transforms.Compose([
            transforms.RandomResizedCrop(input_size),
            transforms.RandomHorizontalFlip(),
            transforms.ToTensor(),
            transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
        ]),
        'val': transforms.Compose([
            transforms.Resize(input_size),
            transforms.CenterCrop(input_size),
            transforms.ToTensor(),
            transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
        ]),
    }
    print("Initializing Datasets and Dataloaders...")
    # Create training and validation datasets
    image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x), data_transforms[x])
                      for x in ['train', 'val']}
    # Create training and validation dataloaders
    dataloaders_dict = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=batch_size,
                                                       shuffle=True, num_workers=4)
                        for x in ['train', 'val']}
    # Gather the parameters to be optimized/updated in this run.
    params_to_update = model_ft.parameters()
    print("Params to learn:")
    if feature_extract:
        params_to_update = []
        for name, param in model_ft.named_parameters():
            if param.requires_grad == True:
                params_to_update.append(param)
                print("\t", name)
    else:
        for name, param in model_ft.named_parameters():
            if param.requires_grad == True:
                print("\t", name)
    # Observe that all parameters are being optimized
    optimizer_ft = optim.SGD(params_to_update, lr=0.001, momentum=0.9)
    # Setup the loss fxn
    criterion = nn.CrossEntropyLoss()
    # Train and evaluate
    model_ft, hist = train_model(model_ft, dataloaders_dict, criterion, optimizer_ft,
                                 num_epochs=num_epochs, is_inception=(model_name == "inception"))
    # Initialize the non-pretrained version of the model used for this run
    scratch_model, _ = initialize_model(model_name, num_classes, feature_extract=False, use_pretrained=False)
    scratch_model = scratch_model.to(device)
    scratch_optimizer = optim.SGD(scratch_model.parameters(), lr=0.001, momentum=0.9)
    scratch_criterion = nn.CrossEntropyLoss()
    _, scratch_hist = train_model(scratch_model, dataloaders_dict, scratch_criterion, scratch_optimizer,
                                  num_epochs=num_epochs, is_inception=(model_name == "inception"))
    # Plot the training curves of validation accuracy vs. number
    # of training epochs for the transfer learning method and
    # the model trained from scratch
    ohist = [h.cpu().numpy() for h in hist]
    shist = [h.cpu().numpy() for h in scratch_hist]
    plt.title("Validation Accuracy vs. Number of Training Epochs")
    plt.xlabel("Training Epochs")
    plt.ylabel("Validation Accuracy")
    plt.plot(range(1, num_epochs + 1), ohist, label="Pretrained")
    plt.plot(range(1, num_epochs + 1), shist, label="Scratch")
    plt.ylim((0, 1.))
    plt.xticks(np.arange(1, num_epochs + 1, 1.0))
    plt.legend()
    plt.savefig('Compare.png')  # save before show, so the figure is not cleared
    plt.show()
```
STN 2015
- Spatial Transformer Network: learn how to augment a network with a visual attention mechanism called a spatial transformer
- A design idea for networks that train an attention mechanism end to end!
Basics
- 2D affine transformation (Affine)

- 3D projective transformation (projection)

STN architecture
- Localisation Network: just a simple regression network. It runs the input image through a few convolutions, then a fully connected layer regresses the 6 values (assuming an affine transform) of the 2x3 transformation matrix. The transformation is never learned explicitly from the dataset; instead, the network automatically learns the spatial transformations that improve global accuracy.
- Parameterised Sampling Grid (feature map → transformation matrix): the grid generator takes each coordinate in the output image V and, via matrix arithmetic, computes the corresponding coordinate in the input image U, producing T(G).


- Differentiable Image Sampling: the sampler uses the coordinates in T(G) to sample the original image U, copying pixels from U into the output image V and computing each value with the chosen interpolation scheme (a tiny demo follows).
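To make the grid generator and sampler concrete, here is a tiny standalone sketch using an identity transform (so the resampled output should equal the input):

```python
import torch
import torch.nn.functional as F

x = torch.arange(16, dtype=torch.float32).view(1, 1, 4, 4)  # dummy 4x4 "image"

# identity 2x3 affine matrix theta
theta = torch.tensor([[[1., 0., 0.],
                       [0., 1., 0.]]])

grid = F.affine_grid(theta, x.size())   # grid generator: builds T(G) from theta
y = F.grid_sample(x, grid)              # sampler: bilinear interpolation of x at T(G)
print(torch.allclose(x, y, atol=1e-6))  # True, up to interpolation error
```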

Loading the data
- MNIST 28x28
```python
import torch, torchvision
import numpy as np
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import matplotlib.pyplot as plt
import torchvision.datasets as datasets
import torchvision.transforms as transforms

device = torch.device("cuda:1" if torch.cuda.is_available() else "cpu")

# Data
# Training dataset
train_loader = torch.utils.data.DataLoader(
    datasets.MNIST(root='.', train=True, download=True,
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.1307,), (0.3081,))
                   ])), batch_size=64, shuffle=True, num_workers=4)
# Test dataset
test_loader = torch.utils.data.DataLoader(
    datasets.MNIST(root='.', train=False,
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.1307,), (0.3081,))
                   ])), batch_size=64, shuffle=True, num_workers=4)
```
Defining the model
```python
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()  # p=0.5
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)

        # Spatial transformer localization-network
        self.localization = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=7),   # 28 -> 22
            nn.MaxPool2d(2, stride=2),        # 22 -> 11
            nn.ReLU(True),
            nn.Conv2d(8, 10, kernel_size=5),  # 11 -> 7
            nn.MaxPool2d(2, stride=2),        # 7 -> 3
            nn.ReLU(True)
        )

        # Regressor for the 3 * 2 affine matrix
        self.fc_loc = nn.Sequential(
            nn.Linear(10 * 3 * 3, 32),
            nn.ReLU(True),
            nn.Linear(32, 3 * 2)
        )

        # Initialize the weights/bias with identity transformation
        self.fc_loc[2].weight.data.zero_()
        self.fc_loc[2].bias.data.copy_(torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float))

    # Spatial transformer network forward function
    def stn(self, x):
        xs = self.localization(x)
        xs = xs.view(-1, 10 * 3 * 3)
        theta = self.fc_loc(xs)
        theta = theta.view(-1, 2, 3)
        grid = F.affine_grid(theta, x.size())  # sampling grid from theta, 28x28
        x = F.grid_sample(x, grid)             # resample the input
        return x

    def forward(self, x):
        # transform the input
        x = self.stn(x)
        # Perform the usual forward pass:
        # conv-pool-conv-drop-pool-fc-relu-drop-fc
        x = F.relu(F.max_pool2d(self.conv1(x), 2))                   # 24 -> 12
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))  # 8 -> 4
        x = x.view(-1, 320)  # 4x4x20
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)
```
Visualizing the results
```python
def convert_image_np(inp):
    """Convert a Tensor to numpy image."""
    inp = inp.numpy().transpose((1, 2, 0))
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    inp = std * inp + mean
    inp = np.clip(inp, 0, 1)
    return inp

def visualize_stn():
    with torch.no_grad():
        # Get a batch of training data
        data = next(iter(test_loader))[0].to(device)
        input_tensor = data.cpu()
        transformed_input_tensor = model.stn(data).cpu()
        in_grid = convert_image_np(
            torchvision.utils.make_grid(input_tensor))
        out_grid = convert_image_np(
            torchvision.utils.make_grid(transformed_input_tensor))
        # Plot the results side-by-side
        f, axarr = plt.subplots(1, 2)
        axarr[0].imshow(in_grid)
        axarr[0].set_title('Dataset Images')
        axarr[1].imshow(out_grid)
        axarr[1].set_title('Transformed Images')
```
Training the model
```python
def train(epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % 500 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))

def test():
    with torch.no_grad():
        model.eval()
        test_loss = 0
        correct = 0
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            # sum up batch loss (size_average=False is deprecated; use reduction='sum')
            test_loss += F.nll_loss(output, target, reduction='sum').item()
            # get the index of the max log-probability
            pred = output.max(1, keepdim=True)[1]
            correct += pred.eq(target.view_as(pred)).sum().item()
        test_loss /= len(test_loader.dataset)
        print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'
              .format(test_loss, correct, len(test_loader.dataset),
                      100. * correct / len(test_loader.dataset)))

# Model + Optimizer
model = Net().to(device)
optimizer = optim.SGD(model.parameters(), lr=0.01)
for epoch in range(1, 20 + 1):
    train(epoch)
    test()

visualize_stn()
plt.ioff()
plt.savefig('stn_vis.png')  # save before show, so the figure is not cleared
plt.show()
```

Style Transfer

- Two distances are defined, one for content (D_C) and one for style (D_S). D_C measures how different the content is between two images, while D_S measures how different the style is between two images. We then take a third image, the input, and transform it to minimize both its content distance from the content image and its style distance from the style image. Now we can import the necessary packages and begin neural transfer.
Setup
Loss functions
- content loss + style loss
- Content loss: a function that represents a weighted version of the content distance for an individual layer.
- Style loss: computed analogously, but on the Gram matrices of the feature maps rather than on the raw features (see the formulas below).
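For reference, a sketch of the standard formulation from Gatys et al.; F^l and P^l denote the feature maps of the input and content images at layer l, and A^l is the style image's Gram matrix:

```latex
% content loss at layer l
L_{content}(\vec{x}, \vec{p}, l) = \frac{1}{2} \sum_{i,j} \left( F^l_{ij} - P^l_{ij} \right)^2

% Gram matrix of the feature maps at layer l
G^l_{ij} = \sum_k F^l_{ik} F^l_{jk}

% style loss contribution of layer l (N_l feature maps of size M_l)
E_l = \frac{1}{4 N_l^2 M_l^2} \sum_{i,j} \left( G^l_{ij} - A^l_{ij} \right)^2
```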
Pretrained model
- PyTorch's VGG implementation is a module split into two Sequential submodules: features (containing the convolution and pooling layers) and classifier (containing the fully connected layers).
- We will use the features module, because we need the outputs of the individual convolution layers to measure content and style loss. Some layers behave differently during training than during evaluation, so we must set the network to evaluation mode with .eval()
Training
- As the author Leon Gatys suggested, we use the L-BFGS algorithm to run our gradient descent. Unlike training a network, we want to train the input image so as to minimize the content/style losses
Code
```python
from __future__ import print_function
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from PIL import Image
import matplotlib.pyplot as plt
import torchvision.transforms as transforms
import torchvision.models as models
import copy

def image_loader(image_name):
    image = Image.open(image_name)
    # fake batch dimension required to fit network's input dimensions
    image = loader(image).unsqueeze(0)
    return image.to(device, torch.float)

unloader = transforms.ToPILImage()  # reconvert into PIL image

def imshow(tensor, title=None):
    image = tensor.cpu().clone()  # we clone the tensor to not do changes on it
    image = image.squeeze(0)      # remove the fake batch dimension
    image = unloader(image)
    plt.imshow(image)
    if title is not None:
        plt.title(title)
    plt.pause(0.001)  # pause a bit so that plots are updated

class ContentLoss(nn.Module):
    # Although this module is named ContentLoss, it is not a true PyTorch loss function.
    # If you want to define the content loss as a PyTorch loss function,
    # you have to create a PyTorch autograd function and recompute/implement
    # the gradient manually in the backward method.
    def __init__(self, target,):
        super(ContentLoss, self).__init__()
        # we 'detach' the target content from the tree used
        # to dynamically compute the gradient: this is a stated value,
        # not a variable. Otherwise the forward method of the criterion
        # will throw an error.
        self.target = target.detach()

    def forward(self, input):
        self.loss = F.mse_loss(input, self.target)
        return input

def gram_matrix(input):
    a, b, c, d = input.size()
    # a=batch size(=1)
    # b=number of feature maps
    # (c,d)=dimensions of a f. map (N=c*d)
    features = input.view(a * b, c * d)   # resize F_XL into \hat F_XL
    G = torch.mm(features, features.t())  # compute the gram product
    # we 'normalize' the values of the gram matrix
    # by dividing by the number of element in each feature maps.
    return G.div(a * b * c * d)

class StyleLoss(nn.Module):
    def __init__(self, target_feature):
        super(StyleLoss, self).__init__()
        self.target = gram_matrix(target_feature).detach()

    def forward(self, input):
        G = gram_matrix(input)
        self.loss = F.mse_loss(G, self.target)
        return input

# create a module to normalize input image so we can easily put it in a
# nn.Sequential
class Normalization(nn.Module):
    def __init__(self, mean, std):
        super(Normalization, self).__init__()
        # .view the mean and std to make them [C x 1 x 1] so that they can
        # directly work with image Tensor of shape [B x C x H x W].
        # B is batch size. C is number of channels. H is height and W is width.
        self.mean = torch.tensor(mean).view(-1, 1, 1)
        self.std = torch.tensor(std).view(-1, 1, 1)

    def forward(self, img):
        # normalize img
        return (img - self.mean) / self.std

# desired depth layers to compute style/content losses :
content_layers_default = ['conv_4']
style_layers_default = ['conv_1', 'conv_2', 'conv_3', 'conv_4', 'conv_5']

def get_style_model_and_losses(cnn, normalization_mean, normalization_std,
                               style_img, content_img,
                               content_layers=content_layers_default,
                               style_layers=style_layers_default):
    cnn = copy.deepcopy(cnn)
    # normalization module
    normalization = Normalization(normalization_mean, normalization_std).to(device)
    # just in order to have an iterable access to or list of content/syle losses
    content_losses = []
    style_losses = []
    # assuming that cnn is a nn.Sequential, so we make a new nn.Sequential
    # to put in modules that are supposed to be activated sequentially
    model = nn.Sequential(normalization)
    i = 0  # increment every time we see a conv
    for layer in cnn.children():
        if isinstance(layer, nn.Conv2d):
            i += 1
            name = 'conv_{}'.format(i)
        elif isinstance(layer, nn.ReLU):
            name = 'relu_{}'.format(i)
            # The in-place version doesn't play very nicely with the ContentLoss
            # and StyleLoss we insert below. So we replace with out-of-place
            # ones here.
            layer = nn.ReLU(inplace=False)
        elif isinstance(layer, nn.MaxPool2d):
            name = 'pool_{}'.format(i)
        elif isinstance(layer, nn.BatchNorm2d):
            name = 'bn_{}'.format(i)
        else:
            raise RuntimeError('Unrecognized layer: {}'.format(layer.__class__.__name__))
        model.add_module(name, layer)
        if name in content_layers:
            # add content loss:
            target = model(content_img).detach()
            content_loss = ContentLoss(target)
            model.add_module("content_loss_{}".format(i), content_loss)
            content_losses.append(content_loss)
        if name in style_layers:
            # add style loss:
            target_feature = model(style_img).detach()
            style_loss = StyleLoss(target_feature)
            model.add_module("style_loss_{}".format(i), style_loss)
            style_losses.append(style_loss)
    # now we trim off the layers after the last content and style losses
    for i in range(len(model) - 1, -1, -1):
        if isinstance(model[i], ContentLoss) or isinstance(model[i], StyleLoss):
            break
    model = model[:(i + 1)]
    return model, style_losses, content_losses

def get_input_optimizer(input_img):
    # this line to show that input is a parameter that requires a gradient
    optimizer = optim.LBFGS([input_img.requires_grad_()])
    return optimizer

def run_style_transfer(cnn, normalization_mean, normalization_std,
                       content_img, style_img, input_img, num_steps=300,
                       style_weight=1000000, content_weight=1):
    """Run the style transfer."""
    print('Building the style transfer model..')
    model, style_losses, content_losses = get_style_model_and_losses(cnn,
        normalization_mean, normalization_std, style_img, content_img)
    optimizer = get_input_optimizer(input_img)
    print('Optimizing..')
    run = [0]
    while run[0] <= num_steps:
        def closure():
            # correct the values of updated input image
            input_img.data.clamp_(0, 1)
            optimizer.zero_grad()
            model(input_img)
            style_score = 0
            content_score = 0
            for sl in style_losses:
                style_score += sl.loss
            for cl in content_losses:
                content_score += cl.loss
            style_score *= style_weight
            content_score *= content_weight
            loss = style_score + content_score
            loss.backward()  # backpropagate the combined loss!
            run[0] += 1
            if run[0] % 50 == 0:
                print("run {}:".format(run))
                print('Style Loss : {:4f} Content Loss: {:4f}'.format(
                    style_score.item(), content_score.item()))
                print()
            return style_score + content_score
        optimizer.step(closure)
    # a last correction...
    input_img.data.clamp_(0, 1)
    return input_img

if __name__ == '__main__':
    # 1. Prepare
    device = torch.device('cuda:1' if torch.cuda.is_available() else 'cpu')
    # desired size of the output image
    imsize = 512 if torch.cuda.is_available() else 128  # use small size if no gpu
    loader = transforms.Compose([
        transforms.Resize(imsize),   # scale imported image
        transforms.ToTensor()])      # transform it into a torch tensor
    style_img = image_loader("./Dataset/neural-style/picasso.jpg")
    content_img = image_loader("./Dataset/neural-style/dancing.jpg")
    assert style_img.size() == content_img.size(), \
        "we need to import style and content images of the same size"
    # 2. Model
    cnn = models.vgg19(pretrained=True).features.to(device).eval()
    cnn_normalization_mean = torch.tensor([0.485, 0.456, 0.406]).to(device)
    cnn_normalization_std = torch.tensor([0.229, 0.224, 0.225]).to(device)
    # 3. Input: use a copy of the content image, or white noise
    input_img = content_img.clone()
    # input_img = torch.randn(content_img.data.size(), device=device)
    # 4. Train
    output = run_style_transfer(cnn, cnn_normalization_mean, cnn_normalization_std,
                                content_img, style_img, input_img)
    plt.figure()
    imshow(output, title='Output Image')
    plt.ioff()
    plt.savefig('transferImg.png')  # save before show, so the figure is not cleared
    plt.show()
```

Adversarial Example Generation
- An often overlooked aspect of designing and training models is security and robustness, especially in the face of an adversary who wants to fool the model.
- The goal here is to raise awareness of the security vulnerabilities of ML models and give insight into the hot topic of adversarial machine learning.
- We explore the topic through an example on an image classifier. Specifically, we use one of the first and most popular attack methods, the Fast Gradient Sign Attack (FGSM), to fool an MNIST classifier
Threat model
- For context, there are many categories of adversarial attacks, each with different goals and different assumptions about the attacker's knowledge.
- The overall goal is to add the least amount of perturbation to the input data that causes the desired misclassification.
- There are two assumptions about the attacker's knowledge: white-box and black-box
- A white-box attack assumes the attacker has full knowledge of and access to the model, including its architecture, inputs, outputs, and weights!
- A black-box attack assumes the attacker only has access to the inputs and outputs of the model, and knows nothing about the underlying architecture or weights
- Attack goals: misclassification and source/target misclassification. ① Misclassification: the adversary only wants the output classification to be wrong, and does not care what the new classification is. ② Source/target misclassification: the adversary wants to alter an image that originally belongs to a specific source class so that it is classified as a specific target class
- The FGSM attack is a white-box attack whose goal is misclassification
Fast Gradient Sign Attack (FGSM)
- It is designed to attack neural networks by exploiting the way they learn: gradients
- Rather than minimizing the loss by adjusting the weights based on the backpropagated gradients, the attack adjusts the input data to maximize the loss based on those same backpropagated gradients.
- In other words: the attack uses the gradient of the loss with respect to the input data, then adjusts the input data to maximize the loss (the update rule is given below)
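In symbols, the standard FGSM update (Goodfellow et al., 2015), where epsilon controls the perturbation magnitude, J is the loss, theta the model parameters, x the input, and y the label:

```latex
x_{adv} = x + \epsilon \cdot \operatorname{sign}\!\left( \nabla_x J(\theta, x, y) \right)
```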

Code TODO
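The tutorial code here is still a TODO; as a minimal sketch of the attack itself (it assumes a model that outputs log-probabilities, like the MNIST nets above, and images scaled to [0, 1]):

```python
import torch
import torch.nn.functional as F

def fgsm_attack(image, epsilon, data_grad):
    # step in the direction of the sign of the input gradient (maximizes the loss)
    perturbed_image = image + epsilon * data_grad.sign()
    # keep the result a valid image in [0, 1]
    return torch.clamp(perturbed_image, 0, 1)

def attack_one(model, data, target, epsilon):
    data.requires_grad = True          # we need gradients w.r.t. the input
    output = model(data)
    loss = F.nll_loss(output, target)
    model.zero_grad()
    loss.backward()                    # populates data.grad
    return fgsm_attack(data, epsilon, data.grad.data)
```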
GAN
Generative Adversarial Networks
- They consist of two distinct models: a generator and a discriminator
- The generator's job is to produce "fake" images that look like the training images
- The discriminator's job is to look at an image and decide whether it is a real training image or a fake from the generator
- During training, the generator constantly tries to outwit the discriminator by producing better and better fakes, while the discriminator works to become a better detective and correctly classify real and fake images
- The equilibrium of this game is when the generator produces fakes that look as if they came directly from the training data, and the discriminator is left always guessing, with 50% confidence, whether the generator's output is real or fake
- However, the convergence theory of GANs is still under active research, and in practice models do not always train to this point.
DCGAN
- Unsupervised Representation Learning With Deep Convolutional Generative Adversarial Networks
- The discriminator is composed of Conv, BN, and LeakyReLU layers. Its input is a 3x64x64 image, and its output is the probability that the input image came from the real data
- The generator is composed of Deconv (transposed convolution), BN, and ReLU layers. Its input is a latent vector sampled from a standard normal distribution, and its output is a 3x64x64 RGB image! (A sketch of both networks follows.)
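A minimal sketch of the two networks for the setup just described, with the common DCGAN-tutorial sizes assumed (latent size nz=100, feature map sizes ngf=ndf=64):

```python
import torch.nn as nn

nz, ngf, ndf, nc = 100, 64, 64, 3  # latent size, G/D feature maps, image channels

# Generator: latent vector (N, nz, 1, 1) -> 3x64x64 image
netG = nn.Sequential(
    nn.ConvTranspose2d(nz, ngf * 8, 4, 1, 0, bias=False), nn.BatchNorm2d(ngf * 8), nn.ReLU(True),       # 1 -> 4
    nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False), nn.BatchNorm2d(ngf * 4), nn.ReLU(True),  # 4 -> 8
    nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False), nn.BatchNorm2d(ngf * 2), nn.ReLU(True),  # 8 -> 16
    nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False), nn.BatchNorm2d(ngf), nn.ReLU(True),          # 16 -> 32
    nn.ConvTranspose2d(ngf, nc, 4, 2, 1, bias=False), nn.Tanh(),                                        # 32 -> 64, output in [-1, 1]
)

# Discriminator: 3x64x64 image -> probability it is real
netD = nn.Sequential(
    nn.Conv2d(nc, ndf, 4, 2, 1, bias=False), nn.LeakyReLU(0.2, inplace=True),                                # 64 -> 32
    nn.Conv2d(ndf, ndf * 2, 4, 2, 1, bias=False), nn.BatchNorm2d(ndf * 2), nn.LeakyReLU(0.2, inplace=True),  # 32 -> 16
    nn.Conv2d(ndf * 2, ndf * 4, 4, 2, 1, bias=False), nn.BatchNorm2d(ndf * 4), nn.LeakyReLU(0.2, inplace=True),  # 16 -> 8
    nn.Conv2d(ndf * 4, ndf * 8, 4, 2, 1, bias=False), nn.BatchNorm2d(ndf * 8), nn.LeakyReLU(0.2, inplace=True),  # 8 -> 4
    nn.Conv2d(ndf * 8, 1, 4, 1, 0, bias=False), nn.Sigmoid(),                                                # 4 -> 1
)
```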
MORE TODO
Parallel / Distributed Training
- Parallel & Distribution
Not Only Pytorch
Production deployment of models >> C++
Extending PyTorch
- TorchScript
Using the PyTorch C++ frontend
Notes