NumPy PIL Python: crop image on whitespace or crop text with histogram thresholds

How can I find the bounding box or window of the whitespace surrounding the numbers in the image below?

Original image:

[image]

Height: 762 pixels, width: 1014 pixels

Goal:

Something like {x-bound:[x-upper,x-lower], y-bound:[y-upper,y-lower]}, so that I can crop to the text and feed it into tesseract or some other OCR.

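Attempts:

I thought about slicing the image into hard-coded chunk sizes and analyzing them at random, but I think that would be too slow.

Example code using pyplot, adapted from (Using python and PIL, how to grab a text block in an image?):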

from PIL import Image
import numpy as np
import matplotlib.pyplot as plt

im = Image.open('/home/jmunsch/Pictures/Aet62.png')
p = np.array(im)
p = p[:, :, 0:3]      # keep the first three channels, dropping alpha if present
p = 255 - p           # invert so that dark text gives large values
lx, ly, lz = p.shape  # (rows, columns, channels)

plt.plot(p.sum(axis=1))  # sums along each row
plt.plot(p.sum(axis=0))  # sums along each column

# I was thinking something like this.
# The image is a 3-dimensional ndarray indexed as [row, column, channel].
# Set each value below its column mean (mean over axis 0) to 0:
p[p < p.mean(axis=0)] = 0

# and then some type of enumerated groupby for each axis,
# finding the mean index for each groupby(0) on that axis

# mean_index1..mean_index4 are the bounds I am trying to find (not defined yet):
# plt.imshow(p[mean_index1:mean_index2, mean_index3:mean_index4])

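Based on these plots, each valley indicates a place to put a bound.

The first plot shows where the lines of text are
The second plot shows where the characters are

Plot example, plt.plot(p.sum(axis=1)):

[image]

plt.plot(p.sum(axis=0)):

[image]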
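The valley idea above can be sketched as follows: sum the inverted image along rows to find text lines, then sum each line along columns to find characters, treating every stretch where the profile rises above a threshold as one item. This is only an illustration under stated assumptions: `find_runs` is a hypothetical helper, the array `ink` is a synthetic stand-in for the inverted image, and a threshold of 0 only works because the synthetic background is perfectly clean.

```python
import numpy as np

def find_runs(profile, threshold):
    """Return (start, stop) pairs where profile > threshold; stop is exclusive."""
    above = np.asarray(profile) > threshold
    # Pad with False on both sides so every run has a rising and a falling edge.
    padded = np.concatenate(([False], above, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    # Edges alternate: rising, falling, rising, falling, ...
    return list(zip(edges[0::2].tolist(), edges[1::2].tolist()))

# Synthetic stand-in for the inverted image: 0 = background, 255 = ink.
ink = np.zeros((20, 30), dtype=np.uint8)
ink[3:7, 2:10] = 255      # first block of "text"
ink[12:16, 5:25] = 255    # second block of "text"

# Row profile: peaks are text lines, valleys are the gaps between them.
line_bounds = find_runs(ink.sum(axis=1), threshold=0)

boxes = []
for top, bottom in line_bounds:
    # Column profile restricted to one text line.
    col_profile = ink[top:bottom].sum(axis=0)
    for left, right in find_runs(col_profile, threshold=0):
        boxes.append({"x-bound": [left, right], "y-bound": [top, bottom]})

print(boxes)
```

On a real scan the background is rarely exactly zero after inversion, so the threshold would probably need to be tuned, for example to some fraction of the profile mean as suggested in the question.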

Related discussion

I think you can use the morphology functions in scipy.ndimage. Here is an example:

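import pylab as pl
import numpy as np
from scipy import ndimage

# imread returns floats in [0, 1] for PNG, so the cast maps pure white to 1 and everything else to 0
img = pl.imread("Aet62.png")[:, :, 0].astype(np.uint8)
# Erode then dilate (an opening) to remove thin features and keep only large white regions
img2 = ndimage.binary_erosion(img, iterations=40)
img3 = ndimage.binary_dilation(img2, iterations=40)
# Keep only the largest connected white region
labels, n = ndimage.label(img3)
counts = np.bincount(labels.ravel())
counts[0] = 0  # ignore the background label
img4 = labels == np.argmax(counts)
img5 = ndimage.binary_fill_holes(img4)
# Non-white pixels that lie inside the filled region
result = (img == 0) & img5
# A small opening removes leftover specks
result = ndimage.binary_erosion(result, iterations=3)
result = ndimage.binary_dilation(result, iterations=3)
pl.imshow(result, cmap="gray")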

The output is:

[image]
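To get from a boolean mask like `result` to the bounds asked for in the question, one possible sketch is shown here. It is not part of the original answer: `mask_to_bounds` is a hypothetical helper, and `mask` is a small synthetic array standing in for `result`. The overall box comes from the first and last rows and columns that contain any True pixel, and `ndimage.find_objects` gives one box per connected component.

```python
import numpy as np
from scipy import ndimage

def mask_to_bounds(mask):
    """Bounding box of all True pixels, in the layout from the question, or None."""
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    if rows.size == 0:
        return None  # empty mask, nothing to crop
    # Upper bounds are exclusive so they can be used directly as slice ends.
    return {"x-bound": [int(cols[0]), int(cols[-1]) + 1],
            "y-bound": [int(rows[0]), int(rows[-1]) + 1]}

# Synthetic stand-in for the `result` mask: two separate blobs.
mask = np.zeros((20, 30), dtype=bool)
mask[4:9, 6:12] = True
mask[5:8, 15:22] = True

bounds = mask_to_bounds(mask)

# One (row_slice, column_slice) pair per connected component.
labels, n = ndimage.label(mask)
component_slices = ndimage.find_objects(labels)

# Crop the array to the overall box with plain slicing.
(x0, x1), (y0, y1) = bounds["x-bound"], bounds["y-bound"]
cropped = mask[y0:y1, x0:x1]
print(bounds, n, cropped.shape)
```

To crop the PIL image itself rather than the array, the same numbers can be passed to `Image.crop`, which takes a `(left, upper, right, lower)` tuple, that is `im.crop((x0, y0, x1, y1))`.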