unicode().decode('utf-8', 'ignore') raising UnicodeEncodeError
Here is the code:
```
>>> z = u'\u2022'.decode('utf-8', 'ignore')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.6/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u2022' in position 0: ordinal not in range(256)
```
Why does calling .decode raise a UnicodeEncodeError?
And why is there any error at all when 'ignore' is used?
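When I first started messing around with Python strings and unicode, it took me a while to understand the jargon of decode and encode too, so here is a post of mine that may help:
Think of decoding as what you do to go from a regular byte string to unicode, and encoding as what you do to get back from unicode. In other words: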
You de-code a str to produce a unicode string,
and en-code a unicode string to produce a str.
So:
```python
unicode_char = u'\xb0'

encodedchar = unicode_char.encode('utf-8')
```
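The same terminology carries over to Python 3, where `str` is the text type (Python 2's `unicode`) and `bytes` is the byte-string type (Python 2's `str`). A minimal round trip in Python 3, using the same degree-sign character:

```python
# Python 3: str holds text (like Python 2's unicode); bytes holds encoded data.
unicode_char = '\xb0'                        # U+00B0 DEGREE SIGN
encoded_char = unicode_char.encode('utf-8')  # en-code: text -> bytes
decoded_char = encoded_char.decode('utf-8')  # de-code: bytes -> text

print(type(encoded_char).__name__, encoded_char)  # bytes b'\xc2\xb0'
print(decoded_char == unicode_char)               # True
```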
From http://wiki.python.org/moin/UnicodeEncodeError:
Paradoxically, a UnicodeEncodeError may happen when
decoding. The cause of it seems to be the
coding-specific decode() functions that normally expect
a parameter of type str. It appears that on seeing a
unicode parameter, the decode() functions "down-convert"
it into str, then decode the result assuming it to be of
their own coding. It also appears that the
"down-conversion" is performed using the ASCII encoder.
Hence an encoding failure inside a decoder.
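To make the quoted mechanism concrete, here is a minimal Python 3 sketch that emulates the two steps Python 2 performs when `decode` is called on a `unicode` object. `py2_style_decode` is a hypothetical helper, not a real API, and its `default_encoding` parameter stands in for Python 2's `sys.getdefaultencoding()`. The wiki says the down-conversion uses ASCII, but the traceback in the question names the 'latin-1' codec, which suggests (an assumption) that the default encoding on that system had been changed to latin-1.

```python
def py2_style_decode(text, encoding='utf-8', errors='strict',
                     default_encoding='ascii'):
    """Hypothetical emulation of Python 2's unicode.decode(encoding, errors).

    `text` is a Python 3 str (Python 2's `unicode`); `default_encoding`
    stands in for Python 2's sys.getdefaultencoding() (an assumption).
    """
    # Step 1: implicit "down-conversion" to bytes. This always runs with
    # errors='strict'; the caller's `errors` argument never reaches it.
    raw = text.encode(default_encoding)
    # Step 2: the decode the caller actually asked for. Only this step
    # honours `errors`, e.g. 'ignore'.
    return raw.decode(encoding, errors)


# U+2022 (BULLET) is outside latin-1, so step 1 fails before 'ignore' matters.
try:
    py2_style_decode('\u2022', 'utf-8', 'ignore', default_encoding='latin-1')
except UnicodeEncodeError as exc:
    print(exc)  # prints the UnicodeEncodeError message naming the 'latin-1' codec

# U+00B0 survives step 1 as the single byte 0xb0, which is invalid UTF-8;
# step 2 then silently drops it because of 'ignore'.
print(repr(py2_style_decode('\xb0', 'utf-8', 'ignore', default_encoding='latin-1')))  # ''
```

Because `errors` only reaches the second step, 'ignore' cannot suppress a failure in the first one, which is why the call in the question still raised.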
You are trying to decode a unicode object.
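One common way to avoid the error is to decode only values that really are byte strings and pass text through unchanged. The sketch below is Python 3; `to_text` is a hypothetical helper name, and its `'utf-8'`/`'strict'` defaults are assumptions. Note that in Python 3 the original mistake surfaces differently: `str` has no `decode` method at all, so calling `.decode` on text raises `AttributeError` rather than a confusing `UnicodeEncodeError`.

```python
def to_text(value, encoding='utf-8', errors='strict'):
    """Hypothetical helper: return `value` as str, decoding only if it is bytes."""
    if isinstance(value, bytes):
        # Only byte strings need (and can be given) a decode step.
        return value.decode(encoding, errors)
    if isinstance(value, str):
        # Already text: decoding it again is exactly the mistake in the question.
        return value
    raise TypeError('expected bytes or str, got %s' % type(value).__name__)


bullet_from_bytes = to_text(b'\xe2\x80\xa2')  # UTF-8 encoding of U+2022
bullet_from_text = to_text('\u2022')          # already text, returned unchanged
print(bullet_from_bytes == bullet_from_text == '\u2022')  # True
```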