Best way to convert string to bytes in Python 3?

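There appear to be two different ways to convert a string to bytes, as seen in the answers to TypeError: 'str' does not support the buffer interface.

Which of these methods would be better or more Pythonic? Or is it just a matter of personal preference?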

b = bytes(mystring, 'utf-8')

b = mystring.encode('utf-8')
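For reference, a minimal sketch showing that both spellings give the same result. The sample value of mystring is made up for illustration.

```python
# Hypothetical sample value; any str would do.
mystring = "héllo wörld"

via_constructor = bytes(mystring, 'utf-8')   # constructor form
via_method = mystring.encode('utf-8')        # method form

# Both calls yield an equal bytes object.
print(via_constructor == via_method)
print(type(via_constructor), type(via_method))
```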


If you look at the docs for bytes, they point you to bytearray:

bytearray([source[, encoding[, errors]]])

Return a new array of bytes. The bytearray type is a mutable sequence of integers in the range 0 <= x < 256. It has most of the usual methods of mutable sequences, described in Mutable Sequence Types, as well as most methods that the bytes type has, see Bytes and Byte Array Methods.

The optional source parameter can be used to initialize the array in a few different ways:

If it is a string, you must also give the encoding (and optionally, errors) parameters; bytearray() then converts the string to bytes using str.encode().

If it is an integer, the array will have that size and will be initialized with null bytes.

If it is an object conforming to the buffer interface, a read-only buffer of the object will be used to initialize the bytes array.

If it is an iterable, it must be an iterable of integers in the range 0 <= x < 256, which are used as the initial contents of the array.

Without an argument, an array of size 0 is created.

So bytes can do much more than just encode a string. It is Pythonic that it lets you call the constructor with any type of source parameter that makes sense.
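A short sketch of the source types listed in the quoted documentation, applied to the bytes constructor. The variable names are illustrative only.

```python
from_text = bytes("abc", "utf-8")        # str source: an encoding is required
from_size = bytes(3)                      # int source: that many null bytes
from_buffer = bytes(bytearray(b"abc"))    # object supporting the buffer interface
from_ints = bytes([97, 98, 99])           # iterable of ints in range(256)
empty = bytes()                           # no argument: zero-length result

print(from_text, from_size, from_buffer, from_ints, empty)

# A str source without an encoding is rejected.
try:
    bytes("abc")
    missing_encoding_rejected = False
except TypeError as exc:
    missing_encoding_rejected = True
    print("TypeError:", exc)
```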

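For encoding a string, I think that some_string.encode(encoding) is more Pythonic than using the constructor, because it is the most self-documenting: "take this string and encode it with this encoding" is clearer than bytes(some_string, encoding), since there is no explicit verb when you use the constructor.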

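Edit: I checked the Python source. If you pass a unicode string to bytes using CPython, it calls PyUnicode_AsEncodedString, which is the implementation of encode; so you are just skipping a level of indirection if you call encode yourself.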

Also, see Serdalis' comment: unicode_string.encode(encoding) is also more Pythonic because its inverse is byte_string.decode(encoding), and symmetry is nice.
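The symmetry can be seen in a small round trip. This is only a sketch; the sample text and variable names are made up.

```python
text = "naïve café"      # hypothetical sample containing non-ASCII characters
encoding = "utf-8"

data = text.encode(encoding)       # str -> bytes
restored = data.decode(encoding)   # bytes -> str, the inverse operation

print(restored == text)
# Two characters need two bytes each in UTF-8, so the lengths differ.
print(len(text), len(data))
```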


It is easier than one might think:

my_str = "hello world"
my_str_as_bytes = str.encode(my_str)
print(type(my_str_as_bytes))  # confirm it is a bytes object
my_decoded_str = my_str_as_bytes.decode()
print(type(my_decoded_str))  # confirm it is a str object


The absolute best way is neither of the two, but a third. The first parameter to encode has defaulted to 'utf-8' ever since Python 3.0. Thus the best way is

b = mystring.encode()

This will also be faster, because the default argument results not in the string "utf-8" in the C code, but in NULL, which is much faster to check!

Here are some timings:

In [1]: %timeit -r 10 'abc'.encode('utf-8')
The slowest run took 38.07 times longer than the fastest.
This could mean that an intermediate result is being cached.
10000000 loops, best of 10: 183 ns per loop

In [2]: %timeit -r 10 'abc'.encode()
The slowest run took 27.34 times longer than the fastest.
This could mean that an intermediate result is being cached.
10000000 loops, best of 10: 137 ns per loop

Despite the warning, the times were very stable after repeated runs; the deviation was only about 2 percent.
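Outside IPython, a rough equivalent of this measurement can be sketched with the standard timeit module. The loop and repeat counts below are arbitrary choices, and the absolute numbers will vary with the machine and Python version.

```python
import timeit

LOOPS = 200_000   # arbitrary loop count for this sketch
REPEATS = 5       # arbitrary repeat count; the best run is reported

# Best total time for each spelling, in seconds.
explicit = min(timeit.repeat("'abc'.encode('utf-8')", number=LOOPS, repeat=REPEATS))
default = min(timeit.repeat("'abc'.encode()", number=LOOPS, repeat=REPEATS))

# Convert to nanoseconds per call for easier comparison.
print(f"encode('utf-8'): {explicit / LOOPS * 1e9:.0f} ns per loop")
print(f"encode():        {default / LOOPS * 1e9:.0f} ns per loop")
```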

Using encode() without an argument is not Python 2 compatible, because in Python 2 the default character encoding is ASCII.

>>> 'äöä'.encode()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)


You can simply convert a string to bytes using:

a_string.encode()

You can simply convert bytes to a string using:

some_bytes.decode()

bytes.decode and str.encode have encoding='utf-8' as the default value.

The following functions (taken from Effective Python) might be useful for converting str to bytes and bytes to str:

def to_bytes(bytes_or_str):
    if isinstance(bytes_or_str, str):
        value = bytes_or_str.encode() # uses 'utf-8' for encoding
    else:
        value = bytes_or_str
    return value # Instance of bytes


def to_str(bytes_or_str):
    if isinstance(bytes_or_str, bytes):
        value = bytes_or_str.decode() # uses 'utf-8' for decoding
    else:
        value = bytes_or_str
    return value # Instance of str
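
A possible usage sketch for these helpers. The two functions are repeated in condensed form only so the snippet runs on its own, and the sample value is made up.

```python
def to_bytes(bytes_or_str):
    # Condensed restatement of the helper above: encode str, pass bytes through.
    return bytes_or_str.encode() if isinstance(bytes_or_str, str) else bytes_or_str


def to_str(bytes_or_str):
    # Condensed restatement of the helper above: decode bytes, pass str through.
    return bytes_or_str.decode() if isinstance(bytes_or_str, bytes) else bytes_or_str


sample_text = "héllo"                  # hypothetical sample value
sample_bytes = to_bytes(sample_text)   # str input is encoded as UTF-8

print(sample_bytes)
print(to_bytes(sample_bytes) is sample_bytes)   # bytes input is returned unchanged
print(to_str(sample_bytes) == sample_text)      # decoding restores the original text
print(to_str(sample_text) is sample_text)       # str input is returned unchanged
```

Accepting either type lets calling code normalize its input once at a boundary instead of checking types everywhere.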

so_string = 'stackoverflow'
so_bytes = so_string.encode()