PyTorch documentation
https://pytorch.org/docs/stable/index.html
0. softmax - torch.nn - torch.nn.functional
- torch.nn.functional.softmax (Python function, in torch.nn.functional)
  This is a function. torch.nn.functional contains functions, each declared with `def function()` and implementing a fixed computation.
- torch.nn.Softmax (Python class, in torch.nn)
  This is a class. torch.nn contains classes, each declared with `class Xx`, which can hold learnable parameters that change during training.
When an operation runs the same way in the train and test phases, torch.nn.functional is usually sufficient. In deep learning the weights are updated continuously, which calls for the class form: a module keeps the same predefined computation steps while its parameters change. If the model has learnable parameters, use torch.nn.
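The split can be sketched in a small model (a minimal example; the `Classifier` module and its sizes are illustrative, not from the original):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Classifier(nn.Module):
    def __init__(self, in_features=4, num_classes=3):
        super().__init__()
        # nn.Linear is a class: it holds learnable weight and bias,
        # so it is instantiated once in __init__().
        self.fc = nn.Linear(in_features, num_classes)

    def forward(self, x):
        # F.softmax is a function with no learnable parameters,
        # so it can be called directly in forward().
        return F.softmax(self.fc(x), dim=1)

model = Classifier()
probs = model(torch.randn(2, 4))
print(probs.sum(dim=1))  # each row sums to 1
```

The linear layer's weights change every optimizer step, while the softmax step is a fixed formula, which is exactly the class-versus-function boundary described above.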
| torch.nn.name | torch.nn.functional.name |
|---|---|
| class | function |
| the parameters to be initialized are held inside the module | parameters are defined and initialized outside the function and passed in as arguments |
| instantiated in `__init__()` and applied in `forward()` | called directly in `forward()` |
1. softmax - torch.nn.functional.softmax (Python function, in torch.nn.functional)
https://pytorch.org/docs/stable/nn.functional.html
Applies a softmax function.
Softmax is defined as:
$$\mathrm{Softmax}(x_i) = \frac{\exp(x_i)}{\sum_j \exp(x_j)}$$
It is applied to all slices along dim, and will re-scale them so that the elements lie in the range [0, 1] and sum to 1.
See torch.nn.Softmax:
https://pytorch.org/docs/stable/nn.html#torch.nn.Softmax
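The definition above can be checked directly against a manual computation (a quick sketch; the input values are arbitrary):

```python
import torch
import torch.nn.functional as F

x = torch.tensor([1.0, 2.0, 3.0])

# Manual softmax from the definition: exp(x_i) / sum_j exp(x_j)
manual = torch.exp(x) / torch.exp(x).sum()

# Library version along the only dimension
lib = F.softmax(x, dim=0)

print(manual)     # tensor([0.0900, 0.2447, 0.6652])
print(lib.sum())  # elements lie in [0, 1] and sum to 1
```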
1.1 Parameters
- input (Tensor): the input tensor.
- dim (int): the dimension along which softmax will be computed.
- dtype (torch.dtype, optional): the desired data type of the returned tensor; if specified, the input is cast to dtype before the operation.

Returns a Tensor of the same dimension and shape as the input, with values in the range [0, 1].

This function doesn’t work directly with NLLLoss, which expects the log to be computed between the softmax and itself. Use log_softmax instead (it is faster and has better numerical properties).
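The NLLLoss note can be illustrated: NLLLoss consumes log-probabilities, so the right input is log_softmax, and the combination matches cross_entropy (a sketch with arbitrary random logits):

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 5)            # batch of 4, 5 classes
target = torch.tensor([0, 2, 4, 1])   # arbitrary class labels

# Correct pairing: NLLLoss expects log-probabilities.
loss_a = F.nll_loss(F.log_softmax(logits, dim=1), target)

# Equivalent single call; more numerically stable than log(softmax(x)).
loss_b = F.cross_entropy(logits, target)

print(loss_a, loss_b)  # the two losses match
```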
2. softmax - torch.nn.Softmax (Python class, in torch.nn)
https://pytorch.org/docs/stable/nn.html
Applies the Softmax function to an n-dimensional input Tensor, rescaling it so that the elements of the n-dimensional output Tensor lie in the range [0, 1] and sum to 1.
Softmax is defined as:
$$\mathrm{Softmax}(x_i) = \frac{\exp(x_i)}{\sum_j \exp(x_j)}$$
2.1 Shape
- Input: (*), where * means any number of additional dimensions.
- Output: (*), same shape as the input.
2.2 Returns
a Tensor of the same dimension and shape as the input, with values in the range [0, 1].
2.3 Parameters
- dim (int): the dimension along which Softmax is computed (so every slice along dim sums to 1).

This module doesn’t work directly with NLLLoss, which expects the log to be computed between the softmax and itself. Use LogSoftmax instead (it is faster and has better numerical properties).
3. Examples

3.1 example
```
(pt-1.4_py-3.6) yongqiang@yongqiang:~$ python
Python 3.6.10 |Anaconda, Inc.| (default, May  8 2020, 02:54:21)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> import torch.nn
>>>
>>> obj = torch.nn.Softmax(dim=1)
>>> input = torch.randn(2, 3)
>>> input
tensor([[ 0.0414, -0.7946,  1.5696],
        [ 0.2665, -0.2709, -2.0720]])
>>>
>>> output = obj(input)
>>> output
tensor([[0.1655, 0.0717, 0.7628],
        [0.5950, 0.3476, 0.0574]])
>>>
>>> exit()
(pt-1.4_py-3.6) yongqiang@yongqiang:~$
```
3.2 example
After running the operator, the elements along the chosen dim sum to 1.
```
(pt-1.4_py-3.6) yongqiang@yongqiang:~$ python
Python 3.6.10 |Anaconda, Inc.| (default, May  8 2020, 02:54:21)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> import torch.nn.functional as F
>>>
>>> input = torch.randn(2, 3, 4)
>>> input
tensor([[[-0.3976,  0.4142, -0.5061, -0.4063],
         [-0.2401, -1.5699,  1.4867, -1.9940],
         [ 0.5525, -1.4140, -1.2408, -0.6638]],

        [[-1.7829,  0.1077,  0.8127, -2.8241],
         [ 1.4750,  0.5804,  1.1887, -0.6570],
         [ 0.2279,  0.9583, -1.9489, -0.5876]]])
>>>
>>> output_0 = F.softmax(input, dim=0)
>>> output_0
tensor([[[0.7999, 0.5760, 0.2110, 0.9182],
         [0.1525, 0.1043, 0.5740, 0.2080],
         [0.5805, 0.0853, 0.6700, 0.4810]],

        [[0.2001, 0.4240, 0.7890, 0.0818],
         [0.8475, 0.8957, 0.4260, 0.7920],
         [0.4195, 0.9147, 0.3300, 0.5190]]])
>>>
>>> output_1 = F.softmax(input, dim=1)
>>> output_1
tensor([[[0.2102, 0.7703, 0.1134, 0.5057],
         [0.2461, 0.1059, 0.8322, 0.1034],
         [0.5437, 0.1238, 0.0544, 0.3909]],

        [[0.0290, 0.2022, 0.3969, 0.0524],
         [0.7543, 0.3244, 0.5780, 0.4574],
         [0.2167, 0.4734, 0.0251, 0.4903]]])
>>>
>>> output_2 = F.softmax(input, dim=2)
>>> output_2
tensor([[[0.1945, 0.4381, 0.1745, 0.1928],
         [0.1416, 0.0375, 0.7964, 0.0245],
         [0.6239, 0.0873, 0.1038, 0.1849]],

        [[0.0468, 0.3098, 0.6269, 0.0165],
         [0.4389, 0.1794, 0.3296, 0.0521],
         [0.2753, 0.5716, 0.0312, 0.1218]]])
>>>
>>> exit()
(pt-1.4_py-3.6) yongqiang@yongqiang:~$
```
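That every slice along the chosen dim sums to 1 can be verified programmatically (a small check, independent of the particular random values above):

```python
import torch
import torch.nn.functional as F

input = torch.randn(2, 3, 4)

# After softmax along dim=k, summing along that same dim gives all ones.
for dim in range(input.dim()):
    output = F.softmax(input, dim=dim)
    sums = output.sum(dim=dim)
    assert torch.allclose(sums, torch.ones_like(sums))
    print(f"dim={dim}: slices sum to 1")
```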
3.3 example
```
(pt-1.4_py-3.6) yongqiang@yongqiang:~$ python
Python 3.6.10 |Anaconda, Inc.| (default, May  8 2020, 02:54:21)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> import torch.nn.functional as F
>>>
>>> input = torch.randn(2, 3)
>>> input
tensor([[ 0.6812,  0.7187, -2.9340],
        [-0.0287,  0.5622, -0.7260]])
>>>
>>> output_0 = F.softmax(input, dim=0)  # softmax over each column
>>> output_0
tensor([[0.6704, 0.5390, 0.0990],
        [0.3296, 0.4610, 0.9010]])
>>>
>>> output_1 = F.softmax(input, dim=1)  # softmax over each row
>>> output_1
tensor([[0.4842, 0.5027, 0.0130],
        [0.3027, 0.5466, 0.1507]])
>>>
>>> exit()
(pt-1.4_py-3.6) yongqiang@yongqiang:~$
```
3.4 example
```
(pt-1.4_py-3.6) yongqiang@yongqiang:~$ python
Python 3.6.10 |Anaconda, Inc.| (default, May  8 2020, 02:54:21)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> import torch.nn.functional as F
>>>
>>> input = torch.tensor([[1, 2], [3, 4]], dtype=torch.float)
>>> input
tensor([[1., 2.],
        [3., 4.]])
>>>
>>> output_0 = F.softmax(input, dim=0)
>>> output_0
tensor([[0.1192, 0.1192],
        [0.8808, 0.8808]])
>>>
>>> output_1 = F.softmax(input, dim=1)
>>> output_1
tensor([[0.2689, 0.7311],
        [0.2689, 0.7311]])
>>>
>>> input = torch.tensor([[1, 2], [3, 6]], dtype=torch.float)
>>> input
tensor([[1., 2.],
        [3., 6.]])
>>>
>>> output_0 = F.softmax(input, dim=0)
>>> output_0
tensor([[0.1192, 0.0180],
        [0.8808, 0.9820]])
>>>
>>> output_1 = F.softmax(input, dim=1)
>>> output_1
tensor([[0.2689, 0.7311],
        [0.0474, 0.9526]])
>>>
>>> exit()
(pt-1.4_py-3.6) yongqiang@yongqiang:~$
```
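The row values above can be reproduced by hand: for the row [1, 2], softmax gives e^1/(e^1+e^2) = 1/(1+e) ≈ 0.2689 and e^2/(e^1+e^2) = e/(1+e) ≈ 0.7311, and the rows [1, 2] and [3, 4] produce identical outputs because softmax is invariant to adding a constant to every element. A quick check in plain Python:

```python
import math

row = [1.0, 2.0]
denom = sum(math.exp(v) for v in row)
probs = [math.exp(v) / denom for v in row]
print([round(p, 4) for p in probs])  # [0.2689, 0.7311]

# Shifting every element by a constant leaves softmax unchanged:
shifted = [v + 2.0 for v in row]     # [3.0, 4.0]
denom2 = sum(math.exp(v) for v in shifted)
probs2 = [math.exp(v) / denom2 for v in shifted]
print([round(p, 4) for p in probs2])  # [0.2689, 0.7311]
```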