Python: Keras Conv2D and input channels

The Keras layer documentation specifies the input and output sizes for convolutional layers:
https://keras.io/layers/convolutional/

Input shape: (samples, channels, rows, cols)

Output shape: (samples, filters, new_rows, new_cols)

The kernel size is a spatial parameter, i.e. it determines only the width and height.
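To make the shape bookkeeping concrete, here is a minimal sketch of how the spatial output size follows from the kernel size, stride, and padding mode. The helper names `conv_output_length` and `conv2d_output_shape` are made up for this illustration and are not Keras functions; the sketch assumes the `channels_first` layout quoted above and ignores dilation.

```python
import math


def conv_output_length(input_length, kernel_size, stride=1, padding="valid"):
    """Spatial output length along one axis (hypothetical helper, no dilation)."""
    if padding == "valid":
        # Only positions where the kernel fits entirely inside the input.
        return (input_length - kernel_size) // stride + 1
    if padding == "same":
        # The input is padded so that stride 1 preserves the length.
        return math.ceil(input_length / stride)
    raise ValueError(f"unknown padding mode: {padding!r}")


def conv2d_output_shape(input_shape, filters, kernel_size, stride=1, padding="valid"):
    """Map (samples, channels, rows, cols) to (samples, filters, new_rows, new_cols)."""
    samples, _channels, rows, cols = input_shape
    kernel_rows, kernel_cols = kernel_size
    new_rows = conv_output_length(rows, kernel_rows, stride, padding)
    new_cols = conv_output_length(cols, kernel_cols, stride, padding)
    # The number of input channels does not appear in the result.
    return (samples, filters, new_rows, new_cols)


# Same output shape for 3 and for 16 input channels.
print(conv2d_output_shape((1, 3, 3, 3), filters=1, kernel_size=(2, 2)))
print(conv2d_output_shape((1, 16, 3, 3), filters=1, kernel_size=(2, 2)))
```

Both calls give the same shape, which is the observation that motivates the question: the channel count of the input has no effect on the output shape.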

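So an input with c channels will yield an output with filters channels regardless of the value of c. The layer must therefore apply a 2D convolution with a spatial height x width filter and then somehow aggregate the results for each learned filter.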

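What is this aggregation operator? Is it a summation across channels? Can I control it? I couldn't find any information about this in the Keras documentation.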

Note that in TensorFlow the filters are specified in the depth channel as well:
https://www.tensorflow.org/api_guides/python/nn#Convolution
so the depth operation is clear there.
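For comparison, the filter tensor passed to TensorFlow's `tf.nn.conv2d` has the layout `[filter_height, filter_width, in_channels, out_channels]`, so the contraction over the input channels is visible in the shape itself. The NumPy sketch below mimics that layout for a stride-1, unpadded case; `conv2d_valid` is a name invented here, not a TensorFlow function.

```python
import numpy as np


def conv2d_valid(x, w):
    """Stride-1, unpadded convolution sketch.

    x: (rows, cols, in_channels)
    w: (kernel_rows, kernel_cols, in_channels, out_channels)
    """
    rows, cols, in_channels = x.shape
    k_rows, k_cols, w_in_channels, out_channels = w.shape
    if in_channels != w_in_channels:
        raise ValueError("filter depth must match the input channel count")
    out = np.zeros((rows - k_rows + 1, cols - k_cols + 1, out_channels))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = x[i:i + k_rows, j:j + k_cols, :]
            # Sum over kernel rows, kernel columns AND input channels.
            out[i, j, :] = np.einsum("hwc,hwco->o", patch, w)
    return out


rng = np.random.default_rng(0)
for in_channels in (1, 3, 8):
    x = rng.normal(size=(5, 5, in_channels))
    w = rng.normal(size=(2, 2, in_channels, 4))
    # The output depth is always 4, whatever in_channels is.
    print(in_channels, conv2d_valid(x, w).shape)
```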

Thanks.

Related discussion

I was also wondering about this, and found another answer here (emphasis mine):

Maybe the most tangible example of a multi-channel input is when you have a color image which has 3 RGB channels. Let's get it to a convolution layer with 3 input channels and 1 output channel. (...) What it does is that it calculates the convolution of each filter with its corresponding input channel (...). The stride of all channels are the same, so they output matrices with the same size. Now, it sums up all matrices and output a single matrix which is the only channel at the output of the convolution layer.
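The description in the quote can be sketched in NumPy: filter each channel with its own 2D kernel, then add the resulting matrices. This is an illustration under stated assumptions, not Keras code: the function names are invented, stride is 1, there is no padding, and, as is usual in deep learning libraries, the "convolution" is a cross-correlation (the kernel is not flipped).

```python
import numpy as np


def correlate2d_valid(channel, kernel):
    """Slide one 2D kernel over one 2D channel (stride 1, no padding)."""
    rows, cols = channel.shape
    k_rows, k_cols = kernel.shape
    out = np.zeros((rows - k_rows + 1, cols - k_cols + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(channel[i:i + k_rows, j:j + k_cols] * kernel)
    return out


def single_filter_output(image, kernels, bias=0.0):
    """image: (rows, cols, channels); kernels: (k_rows, k_cols, channels)."""
    per_channel = [
        correlate2d_valid(image[:, :, c], kernels[:, :, c])
        for c in range(image.shape[-1])
    ]
    # The aggregation across channels is a plain sum, plus one shared bias.
    return np.sum(per_channel, axis=0) + bias


rng = np.random.default_rng(0)
image = rng.normal(size=(4, 4, 3))
kernels = rng.normal(size=(2, 2, 3))
summed = single_filter_output(image, kernels)

# Equivalent view: one 3D kernel applied to a 3D window in a single step.
direct = np.zeros_like(summed)
for i in range(direct.shape[0]):
    for j in range(direct.shape[1]):
        direct[i, j] = np.sum(image[i:i + 2, j:j + 2, :] * kernels)

print(np.allclose(summed, direct))
```

The two computations agree, which is why a filter can be thought of either as one 2D kernel per channel followed by a sum, or as a single 3D kernel spanning all input channels.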

Illustration: [image not preserved]

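Note that the weights of the convolution kernels are different for each channel, and that they are then iteratively adjusted in the back-propagation steps by a gradient-descent-based algorithm such as stochastic gradient descent (SGD).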

Here is a more technical answer from the TensorFlow API.

I also needed to convince myself, so I ran a simple example with a 3×3 RGB image.

# red    # green        # blue
1 1 1    100 100 100    10000 10000 10000
1 1 1    100 100 100    10000 10000 10000
1 1 1    100 100 100    10000 10000 10000

The filter is initialized as:

1 1
1 1

[Figure: one 2×2 filter per input channel (bias not shown)]

I also set up the convolution to have the following properties:

no padding
strides = 1
relu activation function
bias initialized to 0

We would expect the (aggregated) output to be:

40404 40404
40404 40404
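The expected value can be checked by hand: every 2×2 window covers four pixels per channel, so with all weights equal to 1 the sum is 4·1 + 4·100 + 4·10000 = 40404. A small NumPy sketch of the same arithmetic, independent of Keras (variable names are chosen here for illustration):

```python
import numpy as np

# The three constant 3x3 channels from the example.
red = np.full((3, 3), 1)
green = np.full((3, 3), 100)
blue = np.full((3, 3), 10000)
image = np.stack([red, green, blue], axis=-1)  # (rows, cols, channels)

kernel = np.ones((2, 2, 3))  # one 2x2 slice of ones per channel
bias = 0.0

expected = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        window = image[i:i + 2, j:j + 2, :]
        pre_activation = np.sum(window * kernel) + bias
        expected[i, j] = max(0.0, pre_activation)  # relu

print(expected)
```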

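Also, from the picture above we can see that the number of parameters is

3 separate filters (one for each channel) × 4 weights + 1 (bias, not shown) = 13 parameters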
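The same count generalizes to any kernel size, channel count, and number of filters. The helper below is a hypothetical illustration (not a Keras function) of that formula: one weight per kernel position per input channel per filter, plus one bias per filter.

```python
def conv2d_param_count(kernel_rows, kernel_cols, in_channels, filters, use_bias=True):
    """Number of trainable parameters of a standard 2D convolution layer."""
    weights = kernel_rows * kernel_cols * in_channels * filters
    biases = filters if use_bias else 0
    return weights + biases


# The example above: 2x2 kernel, 3 input channels, 1 filter.
print(conv2d_param_count(2, 2, 3, 1))  # 13
```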

Here is the code.

Import the modules:

import numpy as np
from keras.layers import Input, Conv2D
from keras.models import Model

Create the red, green and blue channels:

red = np.array([1]*9).reshape((3,3))
green = np.array([100]*9).reshape((3,3))
blue = np.array([10000]*9).reshape((3,3))

Stack the channels to form an RGB image:

img = np.stack([red, green, blue], axis=-1)
img = np.expand_dims(img, axis=0)
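Because every dimension in this example happens to be 3, the layout is easy to misread. The sketch below prints the shapes at each step; it assumes the default `channels_last` data format, whereas the shapes quoted in the question use `channels_first`. The names `rgb`, `batch` and `channels_first` are chosen here for illustration.

```python
import numpy as np

red = np.array([1] * 9).reshape((3, 3))
green = np.array([100] * 9).reshape((3, 3))
blue = np.array([10000] * 9).reshape((3, 3))

# Stacking on the last axis gives (rows, cols, channels).
rgb = np.stack([red, green, blue], axis=-1)
print(rgb.shape)    # (3, 3, 3)
print(rgb[0, 0])    # the three channel values of pixel (0, 0)

# A leading batch axis gives (samples, rows, cols, channels).
batch = np.expand_dims(rgb, axis=0)
print(batch.shape)  # (1, 3, 3, 3)

# The channels_first equivalent, (samples, channels, rows, cols).
channels_first = np.transpose(batch, (0, 3, 1, 2))
print(channels_first[0, 1])  # the green plane
```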

Create a model that does nothing but a Conv2D convolution:

# One 2x2 filter; every kernel weight starts at 1 and the bias at 0
inputs = Input((3, 3, 3))
conv = Conv2D(filters=1,
              strides=1,
              padding='valid',
              activation='relu',
              kernel_size=2,
              kernel_initializer='ones',
              bias_initializer='zeros')(inputs)
model = Model(inputs, conv)

Feed the image to the model:

model.predict(img)
# array([[[[40404.],
#          [40404.]],
#
#         [[40404.],
#          [40404.]]]], dtype=float32)

Run summary to get the number of parameters:

model.summary()
