
unicode().decode('utf-8', 'ignore') raising UnicodeEncodeError

Here is the code:

>>> z = u'\u2022'.decode('utf-8', 'ignore')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.6/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u2022' in position 0: ordinal not in range(256)

Why is a UnicodeEncodeError raised when I'm using .decode?

Why is any error raised at all when I'm using 'ignore'?


When I first started messing around with Python strings and unicode, it took me a while to understand the terminology of decoding and encoding, so here's my explanation, which may help:

Think of decoding as what you do to go from a regular byte string to unicode, and encoding as what you do to get back from unicode. In other words:

You de-code a str to produce a unicode string

and en-code a unicode string to produce a str.

So:

unicode_char = u'\xb0'

encodedchar = unicode_char.encode('utf-8')

encodedchar will contain your unicode character, represented in the selected encoding (in this case, utf-8).
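The round trip above can be sketched end to end (a minimal sketch in Python 3 syntax, where str is the unicode type and bytes plays the role of Python 2's byte string):

```python
# En-code unicode -> bytes, then de-code bytes -> unicode.
unicode_char = u'\xb0'                       # U+00B0, DEGREE SIGN

encodedchar = unicode_char.encode('utf-8')   # en-code: unicode -> bytes
print(repr(encodedchar))                     # b'\xc2\xb0' -- two UTF-8 bytes

decoded = encodedchar.decode('utf-8')        # de-code: bytes -> unicode
assert decoded == unicode_char               # lossless round trip
```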


From http://wiki.python.org/moin/UnicodeEncodeError

Paradoxically, a UnicodeEncodeError may happen when
decoding. The cause of it seems to be the
coding-specific decode() functions that normally expect
a parameter of type str. It appears that on seeing a
unicode parameter, the decode() functions "down-convert"
it into str, then decode the result assuming it to be of
their own coding. It also appears that the
"down-conversion" is performed using the ASCII encoder.
Hence an encoding failure inside a decoder.


You are trying to decode a unicode string. It is the implicit encoding needed to make that decode work that is failing.
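The practical fix is to decode only actual byte strings and pass unicode through untouched. A minimal sketch (Python 3 syntax; `ensure_text` is a hypothetical helper name, not a standard function):

```python
# Hypothetical helper: de-code only when given raw bytes; unicode text
# passes through as-is, so no implicit encode step can fail.
def ensure_text(value, encoding='utf-8'):
    if isinstance(value, bytes):
        return value.decode(encoding, 'ignore')
    return value  # already unicode -- nothing to decode

print(repr(ensure_text(b'\xe2\x80\xa2')))  # de-coded to u'\u2022' (BULLET)
print(repr(ensure_text(u'\u2022')))        # returned unchanged, no error
```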