What does the coordinate output of the YOLO algorithm represent?

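My question is similar to this topic. I was watching Andrew Ng's lecture on bounding box prediction when I started thinking about the output of the YOLO algorithm. Consider this example: we use a 19x19 grid and only one receptive field with 2 classes, so our output will be => 19x19x1x5. The last dimension (an array of size 5) represents the following: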

1) The class (0 or 1)  
2) X-coordinate  
3) Y-coordinate  
4) Height of the bounding box  
5) Width of the bounding box

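I don't understand whether the X, Y coordinates locate the bounding box relative to the whole image or only relative to the receptive field (filter). In the video the bounding box is drawn as part of the receptive field, but logically the receptive field is much smaller than the bounding box. Besides, people might change the filter size, so positioning the bounding box relative to the filter makes no sense.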

So, basically, what do the bounding box coordinates represent with respect to the image?


From the Understanding YOLO post @ Hacker Noon:

Each grid cell predicts B bounding boxes as well as C class
probabilities. The bounding box prediction has 5 components: (x, y, w,
h, confidence). The (x, y) coordinates represent the center of the
box, relative to the grid cell location (remember that, if the center
of the box does not fall inside the grid cell, then this cell is not
responsible for it). These coordinates are normalized to fall between
0 and 1. The (w, h) box dimensions are also normalized to [0, 1],
relative to the image size. Let’s look at an example:
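
To make the convention in that quote concrete, here is a minimal sketch (not taken from the post) that converts one prediction into pixel corner coordinates. It assumes (x, y) is the box center as a fraction of its grid cell and (w, h) are fractions of the whole image. The function name `decode_box`, the 608x608 image size, and the example values are hypothetical, chosen only for illustration.

```python
def decode_box(row, col, x, y, w, h, grid_size=19, img_w=608, img_h=608):
    """Convert one YOLO-style prediction to pixel corners (x1, y1, x2, y2).

    Assumption: (x, y) is the box center in [0, 1] relative to the grid
    cell at (row, col); (w, h) are in [0, 1] relative to the full image.
    """
    # Size of a single grid cell in pixels.
    cell_w = img_w / grid_size
    cell_h = img_h / grid_size

    # Center: offset inside the cell plus the cell's own position.
    cx = (col + x) * cell_w
    cy = (row + y) * cell_h

    # Width and height are scaled by the image size, not the cell size.
    bw = w * img_w
    bh = h * img_h

    return (cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2)


# Hypothetical prediction from the middle cell of a 19x19 grid.
box = decode_box(row=9, col=9, x=0.5, y=0.5, w=0.25, h=0.5)
print(box)
```

Under these assumptions the center is tied to the grid cell, while the box size is free to span many cells, which is why a box can be much larger than the cell that predicts it.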