[Python] Trying to recognize characters in an image with OpenCV and pyocr

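Introduction

While looking for articles about Selenium, I found one about automating the sushi typing game.
The approaches it described were basically as follows:
- After starting the game, keep typing every key
- After starting the game, take a screenshot and type the string obtained by OCR
* The game screen is drawn on a Canvas element, so the text to type cannot be read directly from the page

This time I tried out the OCR part, together with some simple image processing in OpenCV as preprocessing.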

Preparation

Installing tesseract

tesseract is an OCR engine.
This time I will run it from Python through the pyocr module.
Install it with the following command.

$ brew install tesseract

The Japanese trained data is not included, so download it from the following URL:
https://github.com/tesseract-ocr/tessdata
↑ From this URL, download jpn.traineddata into /usr/local/share/tessdata/
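
Before running OCR, it can help to confirm that the trained data file actually landed in the expected directory. The following is a minimal sketch, not part of the original article: the helper name has_traineddata is hypothetical, and the default directory is assumed to be the Homebrew path mentioned above, which may differ on your system.

```python
from pathlib import Path

# Assumption: the tessdata directory named in the article (Homebrew on macOS).
# Adjust this path if tesseract is installed elsewhere.
TESSDATA_DIR = Path("/usr/local/share/tessdata")


def has_traineddata(lang: str, tessdata_dir: Path = TESSDATA_DIR) -> bool:
    """Return True if <lang>.traineddata exists in tessdata_dir."""
    return (tessdata_dir / f"{lang}.traineddata").is_file()


if __name__ == "__main__":
    # Report which language files are present.
    for lang in ("eng", "jpn"):
        status = "found" if has_traineddata(lang) else "missing"
        print(f"{lang}.traineddata: {status}")
```

If jpn.traineddata is reported as missing, OCR with lang="jpn" will most likely fail, while lang="eng" (used in the rest of this article) is unaffected.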

Installing pyocr and OpenCV

Run the following commands in a terminal.

$ pip3 install pyocr
$ pip3 install opencv-python

Trying OCR for now

Preparing the image

Test image

↓ Crop

Save the cropped version as test.png

OCR with pyocr

import sys

import pyocr
from PIL import Image

image = "test.png"

# Look for an installed OCR tool (tesseract in this setup)
tools = pyocr.get_available_tools()
if len(tools) == 0:
    print("No OCR tool found")
    sys.exit(1)
tool = tools[0]

# Run OCR on the cropped image, treating the text as English
res = tool.image_to_string(Image.open(image), lang="eng")

print(res)

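Result

The text is not recognized correctly at all...
It seems preprocessing is necessary after all.

Trying out OpenCV

I want to use OpenCV for preprocessing, but I am new to OpenCV, so first I will just try some basic processing.
Here I process my own icon image.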

import cv2

image = "test_1.png"
name = "test_1"

# original: load the image (OpenCV reads it as BGR)
img = cv2.imread(image)

# gray: convert to grayscale
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
cv2.imwrite(f"1_{name}_gray.png", img)

# gaussian: blur with a 5x5 kernel to reduce noise
img = cv2.GaussianBlur(img, (5, 5), 0)
cv2.imwrite(f"2_{name}_gaussian.png", img)

# threshold: adaptive binarization (11x11 neighborhood, constant C = 2)
img = cv2.adaptiveThreshold(
    img,
    255,
    cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
    cv2.THRESH_BINARY,
    11,
    2,
)
cv2.imwrite(f"3_{name}_threshold.png", img)

The images at each stage of processing look like this:
[Image: 画像処理.png]

OCR with OpenCV

Preprocess the image used for OCR with OpenCV, then try OCR again.
Below, the preprocessing is grayscale → thresholding → color inversion.

import sys

import cv2
import pyocr
from PIL import Image

image = "test.png"
name = "test"

# original: load the image (OpenCV reads it as BGR)
img = cv2.imread(image)
cv2.imwrite(f"1_{name}_original.png", img)

# gray: convert to grayscale
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
cv2.imwrite(f"2_{name}_gray.png", img)

# threshold: pixels brighter than th become 255, the rest 0
th = 140
img = cv2.threshold(img, th, 255, cv2.THRESH_BINARY)[1]
cv2.imwrite(f"3_{name}_threshold_{th}.png", img)

# bitwise: invert black and white
img = cv2.bitwise_not(img)
cv2.imwrite(f"4_{name}_bitwise.png", img)

# Save the preprocessed image that will be passed to OCR
cv2.imwrite("target.png", img)

# Look for an installed OCR tool (tesseract in this setup)
tools = pyocr.get_available_tools()
if len(tools) == 0:
    print("No OCR tool found")
    sys.exit(1)
tool = tools[0]

# Run OCR on the preprocessed image, treating the text as English
res = tool.image_to_string(Image.open("target.png"), lang="eng")

print(res)

[Image: 事前処理.png]

Result

It looks like the text is recognized well now!
That's all for this time.