Python Encoding: Open/Read Image File, Decode Image, Re-Encode Image

Note: I don't know much about encoding/decoding, and after running into this problem those words have become pure jargon to me.

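Question: I'm a bit confused. I was playing around with encoding/decoding images so I could store an image as a TextField in a Django model. While looking around Stack Overflow, I found that I could decode an image from ascii (or binary, I think? Whatever encoding open('file', 'wb') uses; I assumed the default was ascii) to latin1 and store it in a database with no problems.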

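The problem comes when creating the image from the latin1-decoded data. When I try to write it to a file handle, I get a UnicodeEncodeError saying the ascii encoding failed.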

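I think the problem is that when the file is opened as binary data (in 'rb' mode), the contents aren't valid ascii, because they are binary data. I then decode the binary data to latin1, but converting back to ascii (which happens automatically when writing to the file) fails for some unknown reason.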

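My guess is that decoding to latin1 turns the original binary data into something else, and then re-encoding to ascii can't recognize what used to be the original binary data (even though the original and decoded data have the same length). Or maybe the problem isn't the latin1 decoding at all, but that I'm trying to encode binary data as ASCII. In that case, how do I encode the latin1 data back into an image?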
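The guess above can be checked directly. Latin-1 maps every byte value 0-255 to the character with the same code point, so decoding and re-encoding with latin1 returns the original bytes exactly. ASCII, which only covers 0-127, is the codec that fails. Below is a minimal Python 3 sketch that uses all 256 possible byte values as stand-in data rather than a real image:

```python
# Every possible byte value, standing in for arbitrary binary (image) data
all_bytes = bytes(range(256))

# Latin-1 maps byte N to the character with code point N
as_text = all_bytes.decode('latin-1')
print(len(as_text) == len(all_bytes))                          # True
print(all(ord(ch) == b for ch, b in zip(as_text, all_bytes)))  # True

# Re-encoding with the SAME codec restores the original bytes exactly
round_trip = as_text.encode('latin-1')
print(round_trip == all_bytes)  # True

# Encoding with ASCII instead fails at the first code point above 127
ascii_error = None
try:
    as_text.encode('ascii')
except UnicodeEncodeError as exc:
    ascii_error = exc
print(ascii_error.start)  # 128
```

Nothing is lost in the latin1 step. The data only has to be encoded with latin1 again, not ascii, before it is written out as bytes.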

I know this is confusing, but I'm confused by all of it myself, so I can't explain it very well. Whoever can answer this is probably a master of riddles.

Some code to illustrate:

>>> image_handle = open('test_image.jpg', 'rb')
>>>
>>> raw_image_data = image_handle.read()
>>> latin_image_data = raw_image_data.decode('latin1')
>>>
>>>
>>> # The raw data can't be processed by django
... # but in `latin1` it works
>>>
>>> # Analysis of the data
>>>
>>> type(raw_image_data), len(raw_image_data)
(<type 'str'>, 2383864)
>>>
>>> type(latin_image_data), len(latin_image_data)
(<type 'unicode'>, 2383864)
>>>
>>> len(raw_image_data) == len(latin_image_data)
True
>>>
>>>
>>> # How do I write it back to a file?
>>>
>>> copy_image_handle = open('new_test_image.jpg', 'wb')
>>>
>>> copy_image_handle.write(raw_image_data)
>>> copy_image_handle.close()
>>>
>>>
>>> copy_image_handle = open('new_test_image.jpg', 'wb')
>>>
>>> copy_image_handle.write(latin_image_data)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)
>>>
>>>
>>> latin_image_data.encode('ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)
>>>
>>>
>>> latin_image_data.decode('ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)
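In the session above, Python 2 implicitly encodes the `unicode` object with its default `ascii` codec before writing, and that step fails. Encoding explicitly with the codec used for decoding, for example `copy_image_handle.write(latin_image_data.encode('latin1'))`, should produce a byte-identical copy. Below is a Python 3 sketch of the same steps. The file names follow the question, but the file contents are made-up stand-in bytes (a JPEG-style header plus filler) so the sketch runs without a real image:

```python
import os
import tempfile

# Made-up stand-in for test_image.jpg: a JPEG-style header, filler, and end marker
fake_jpeg = b'\xff\xd8\xff\xe0\x00\x10JFIF\x00' + bytes(range(256)) + b'\xff\xd9'

with tempfile.TemporaryDirectory() as workdir:
    src_path = os.path.join(workdir, 'test_image.jpg')
    dst_path = os.path.join(workdir, 'new_test_image.jpg')
    with open(src_path, 'wb') as f:
        f.write(fake_jpeg)

    # Read the raw bytes and decode them to text with Latin-1 (lossless)
    with open(src_path, 'rb') as image_handle:
        raw_image_data = image_handle.read()
    latin_image_data = raw_image_data.decode('latin1')

    # A 'wb' file accepts only bytes (Python 3 raises TypeError for a str),
    # so encode explicitly with the SAME codec that was used to decode
    with open(dst_path, 'wb') as copy_image_handle:
        copy_image_handle.write(latin_image_data.encode('latin1'))

    with open(dst_path, 'rb') as f:
        copied = f.read()

print(copied == raw_image_data)  # True
```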

Discussion

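Unlike plain text files, image files have no text encoding; the characters you see are just a visual rendering of the image's binary data. As @cameron-f said in the comments on the question above, it's basically gibberish, and forcing it through a text encoding is likely to fail or corrupt the image file, so don't try it.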

But that doesn't mean all hope is lost. Here is how I usually convert an image to a string and back to an image.

import zlib
from base64 import b64decode, b64encode

with open('test_image.jpg', 'rb') as image_handle:
    raw_image_data = image_handle.read()

# Binary -> Base64 (ASCII-only text), then compress the Base64 text.
# Note: zlib output is binary again; only encoded_data is text-safe.
encoded_data = b64encode(raw_image_data)
compressed_data = zlib.compress(encoded_data, 9)

# Reverse the steps: decompress, then Base64-decode to the original bytes
uncompressed_data = zlib.decompress(compressed_data)
decoded_data = b64decode(uncompressed_data)

with open('new_test_image.jpg', 'wb') as new_image_handle:
    new_image_handle.write(decoded_data)

# Data types && data size analysis (Python 2 results for the author's test image)
type(raw_image_data), len(raw_image_data)
# (<type 'str'>, 2383864)

type(encoded_data), len(encoded_data)
# (<type 'str'>, 3178488)

type(compressed_data), len(compressed_data)
# (<type 'str'>, 2189311)

type(uncompressed_data), len(uncompressed_data)
# (<type 'str'>, 3178488)

type(decoded_data), len(decoded_data)
# (<type 'str'>, 2383864)

# Showing that the conversions were successful
decoded_data == raw_image_data
# True

encoded_data == uncompressed_data
# True
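If the goal is a Django `TextField`, the Base64 text itself is what should be stored. A text column needs an ASCII-only `str` on the way in and the original bytes on the way out. Below is a minimal Python 3 sketch of that round trip. `image_to_text` and `text_to_image` are hypothetical helper names, and the input bytes are made-up stand-in data:

```python
from base64 import b64decode, b64encode


def image_to_text(raw: bytes) -> str:
    """Binary image data -> ASCII-only str, safe for a text column."""
    # b64encode returns bytes made of ASCII characters only
    return b64encode(raw).decode('ascii')


def text_to_image(text: str) -> bytes:
    """Reverse of image_to_text: Base64 str -> original bytes."""
    return b64decode(text)


# Made-up stand-in for an image file's contents (262 bytes)
raw_image_data = b'\xff\xd8\xff\xe0' + bytes(range(256)) + b'\xff\xd9'
stored = image_to_text(raw_image_data)

print(all(ord(ch) < 128 for ch in stored))      # True
print(len(stored))                              # 352 == 4 * ceil(262 / 3)
print(text_to_image(stored) == raw_image_data)  # True
```

The same 4:3 growth appears in the sizes above: 2383864 raw bytes become 3178488 Base64 characters.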

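The UnicodeEncodeError pops up because a JPEG is a binary file, while ASCII is an encoding for plain text that only covers byte values 0-127. In Python 2, writing a `unicode` object to a file implicitly encodes it with the default `ascii` codec, which fails on the first character above 127.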
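The "position 0-3" in the question's traceback fits this explanation. A JPEG file starts with the bytes FF D8 (start of image) followed by another marker that begins with FF, so the first four bytes are all above 127. Below is a Python 3 sketch that reproduces the error on such a header. The header bytes are a typical JFIF example, not bytes taken from the question's file:

```python
# First bytes of a typical JFIF JPEG: SOI marker FF D8, then APP0 marker FF E0
jpeg_header = b'\xff\xd8\xff\xe0\x00\x10JFIF\x00'

# Latin-1 decoding always succeeds: one character per byte
text = jpeg_header.decode('latin-1')

encode_error = None
try:
    text.encode('ascii')
except UnicodeEncodeError as exc:
    encode_error = exc

# The encoder reports the whole run of non-ASCII characters at the start
print(encode_error.start, encode_error.end - 1)  # 0 3
print(encode_error.reason)                       # ordinal not in range(128)
```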

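Plain text files can be created with a general-purpose text editor such as Windows Notepad or Linux nano. Most of these use ASCII or a Unicode encoding such as UTF-8. When a text editor reads an ASCII file, it takes a byte, say 01100001 (97 in decimal), and looks up the corresponding glyph, "a".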
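The byte-to-glyph lookup takes only a few lines of Python 3 to show. The bit pattern 01100001 is the number 97, which ASCII maps to lowercase "a". Uppercase "A" is 65, or 01000001:

```python
byte_value = int('01100001', 2)              # parse the bit pattern as base 2
print(byte_value)                            # 97
print(bytes([byte_value]).decode('ascii'))   # a
print(format(ord('A'), '08b'))               # 01000001 (65 in decimal)
```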

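So when a text editor tries to read a JPG, it takes that same byte 01100001 and shows "a". But the file holds the information needed to display a photo, not text, so the result is meaningless gibberish. Try opening a JPEG in Notepad or nano.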
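To see what the editor is dealing with, the Python 3 sketch below decodes a few JPEG header bytes as UTF-8, roughly as an editor might. Bytes that are not valid text are replaced with U+FFFD, the replacement character. Treating this as an editor's exact behavior is an assumption, and the header bytes are a typical JFIF example, not bytes from the question's file:

```python
# A few bytes from the start of a typical JFIF JPEG (stand-in data)
jpeg_header = b'\xff\xd8\xff\xe0\x00\x10JFIF\x00'

# Decode as UTF-8, substituting U+FFFD for byte sequences that are not valid text
shown = jpeg_header.decode('utf-8', errors='replace')
print(repr(shown))
print('\ufffd' in shown)  # True: the marker bytes are not valid UTF-8 text
print('JFIF' in shown)    # True: the ASCII bytes in the header still show up
```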

For more on encoding, here is an explanation: What is the difference between encoding and decoding?