Python string formatting: % vs. .format

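Python 2.6 introduced the str.format() method, whose syntax differs slightly from the existing % operator. Which is better, and in which situations?

The following uses each method and produces the same result, so what is the difference?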

#!/usr/bin/python
sub1 = "python string!"
sub2 = "an arg"

a = "i am a %s" % sub1
b = "i am a {0}".format(sub1)

c = "with %(kwarg)s!" % {'kwarg': sub2}
d = "with {kwarg}!".format(kwarg=sub2)

print a    # "i am a python string!"
print b    # "i am a python string!"
print c    # "with an arg!"
print d    # "with an arg!"

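Furthermore, when does string formatting occur in Python? For example, if my logging level is set to HIGH, will I still take a hit for performing the following % operation? And if so, is there a way to avoid this?

log.debug("some debug info: %s" % some_info)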

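Related discussion

To answer your first question: .format just seems more sophisticated in many ways. An annoying thing about % is that it can take either a single variable or a tuple. You'd think the following would always work:

"hi there %s" % name

yet if name happens to be (1, 2, 3), it throws a TypeError. To guarantee that it always prints, you'd need to do

"hi there %s" % (name,)   # supply the single argument as a single-item tuple

which is just ugly. .format doesn't have those issues. In the second example you gave, the .format version is also much cleaner looking.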
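The tuple pitfall described above can be reproduced in a few lines. This is a minimal sketch for Python 3; the value bound to name is only an illustration, and any tuple behaves the same way.

```python
# Illustrative value; any tuple triggers the same behaviour.
name = (1, 2, 3)

try:
    "hi there %s" % name              # the tuple is unpacked into three arguments
except TypeError as exc:
    error = str(exc)

wrapped = "hi there %s" % (name,)       # one-item tuple: name is a single argument
formatted = "hi there {}".format(name)  # .format never unpacks its argument

print(error)
print(wrapped)
print(formatted)
```

Both working variants produce the same text; only the % version needs the extra wrapping.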

Why would you not use it?

not knowing about it (me, before reading this)
having to be compatible with Python 2.5

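To answer your second question, string formatting happens at the same time as any other operation: when the string formatting expression is evaluated. Python is not a lazy language and evaluates expressions before calling functions, so in your log.debug example the expression "some debug info: %s" % some_info is first evaluated to, e.g., "some debug info: roflcopters are active", and then that string is passed to log.debug().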
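A small experiment makes the eager evaluation visible. The sketch below assumes Python 3; the Noisy class and the logger name are hypothetical helpers that only count how often the object is converted to a string.

```python
import logging

class Noisy:
    """Hypothetical helper: counts how many times it is converted to a string."""
    def __init__(self):
        self.str_calls = 0

    def __str__(self):
        self.str_calls += 1
        return "roflcopters are active"

log = logging.getLogger("eager_demo")
log.setLevel(logging.WARNING)      # DEBUG messages are discarded

some_info = Noisy()
# The % expression is evaluated before log.debug() is even called.
log.debug("some debug info: %s" % some_info)
eager_calls = some_info.str_calls
print(eager_calls)
```

The message is never logged, yet the string was still built once.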

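Related discussion

What about "%(a)s, %(a)s" % {'a':'test'}?
Note that you waste time with log.debug("something: %s" % x) but not with log.debug("something: %s", x). The string formatting is handled inside the method, and you don't take the performance hit if the message isn't logged. As always, Python anticipates your needs =)
ted: "{a}, {a}".format(a='test')
String interpolation via the % character is being phased out, by the looks of it. The current PEP suggests it will be around for some time, but I will rely more on .format in the future.
ted: that's a worse-looking hack to do the same as '{0}, {0}'.format('test').
The point is: the one recurring argument, that the new syntax allows reordering of items, is moot: you can do the same with the old syntax. Most people do not know that this is actually already defined in the ANSI C99 standard! Check out a recent copy of man sprintf and learn about the $ notation inside % placeholders.
@cfi: If you mean something like printf("%2$d", 1, 3) printing "3", that's specified in POSIX, not C99. The very man page you referenced notes, "The C99 standard does not include the style using '$'...".
Also, "%d atoms" % 10*3 gives "10 atoms10 atoms10 atoms", because % and * share the same precedence and are evaluated left to right, while "{} atoms".format(3*3) gives "9 atoms".
@flyingsheep If you look at the Django source code, this "hack" is used often.
Tom: if you only want to use the same object twice, rather than to add clarity, it certainly is a hack. If you are stuck with % for some reason (e.g. because it's baked into your API), you of course have to use it.
.format may be more elegant, but it can be up to 5 times slower than %, so it depends on your goal.
Actually, I once ran into a case where old-style formatting was much slower than new-style formatting with index markers. It probably depends on how you use it. Also, str.format is more object-oriented, since it is just a method on str instances rather than special syntax. That is more in line with Python's "explicit is better than implicit" philosophy, because it avoids extra syntax.
@davidsander Using the % notation is exactly as OOP as using the .format() notation; the percent operator is just the .__mod__() method of the string object.
The only drawback I see in .format is that typing "{0}".format(var) takes longer than "%s" % var.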
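Several of the comments above are easy to check directly. A short sketch, assuming Python 3; the variable names are only illustrative.

```python
# % and * have the same precedence, so this parses as ("%d atoms" % 10) * 3.
repeated = "%d atoms" % 10 * 3
grouped = "%d atoms" % (10 * 3)
via_format = "{} atoms".format(10 * 3)

# Reusing one argument twice works in both styles.
same_key = "%(a)s, %(a)s" % {"a": "test"}
same_index = "{0}, {0}".format("test")

# The % operator on a string dispatches to str.__mod__.
via_method = "%s!".__mod__("test")

print(repeated, grouped, via_format, same_key, same_index, via_method, sep="\n")
```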

Something that the modulo operator (%) can't do, afaik:

tu = (12,45,22222,103,6)
print '{0} {2} {1} {2} {3} {2} {4} {2}'.format(*tu)

result

12 22222 45 22222 103 22222 6 22222

Very useful.
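For readers on Python 3, the same idea can be sketched as follows. The named-key variant is an added illustration, not part of the answer: % has no positional indexes in Python, but mapping keys can be repeated.

```python
tu = (12, 45, 22222, 103, 6)

# Positional indexes may appear in any order and any number of times.
reordered = "{0} {2} {1} {2} {3} {2} {4} {2}".format(*tu)

# Closest % equivalent: repeat named keys taken from a mapping.
named = "%(a)s %(c)s %(b)s %(c)s" % {"a": tu[0], "b": tu[1], "c": tu[2]}

print(reordered)
print(named)
```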

Another point: format(), being a function, can be used as an argument to other functions:

# Python 2 syntax (print statement, xrange)
li = [12,45,78,784,2,69,1254,4785,984]
print map('the number is {}'.format,li)

print

from datetime import datetime,timedelta

once_upon_a_time = datetime(2010, 7, 1, 12, 0, 0)
delta = timedelta(days=13, hours=8, minutes=20)

gen = (once_upon_a_time + x*delta for x in xrange(20))

print '\n'.join(map('{:%Y-%m-%d %H:%M:%S}'.format, gen))

Result:

['the number is 12', 'the number is 45', 'the number is 78', 'the number is 784', 'the number is 2', 'the number is 69', 'the number is 1254', 'the number is 4785', 'the number is 984']

2010-07-01 12:00:00
2010-07-14 20:20:00
2010-07-28 04:40:00
2010-08-10 13:00:00
2010-08-23 21:20:00
2010-09-06 05:40:00
2010-09-19 14:00:00
2010-10-02 22:20:00
2010-10-16 06:40:00
2010-10-29 15:00:00
2010-11-11 23:20:00
2010-11-25 07:40:00
2010-12-08 16:00:00
2010-12-22 00:20:00
2011-01-04 08:40:00
2011-01-17 17:00:00
2011-01-31 01:20:00
2011-02-13 09:40:00
2011-02-26 18:00:00
2011-03-12 02:20:00
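A Python 3 rendering of the same technique might look like the sketch below. The shortened input list and the three-item date range are assumptions made to keep the example small; the __mod__ line shows that a bound method of an old-style format string can be passed around in the same way.

```python
from datetime import datetime, timedelta

li = [12, 45, 78]

# In Python 3, map() is lazy, so wrap it in list() to see the results.
labels = list(map("the number is {}".format, li))
old_style = list(map("the number is %s".__mod__, li))

once_upon_a_time = datetime(2010, 7, 1, 12, 0, 0)
delta = timedelta(days=13, hours=8, minutes=20)
gen = (once_upon_a_time + x * delta for x in range(3))

# datetime objects accept strftime codes as their format spec.
stamps = "\n".join(map("{:%Y-%m-%d %H:%M:%S}".format, gen))

print(labels)
print(stamps)
```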

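Related discussion

You can use old-style formatting in map just as easily as format: map('some_format_string_%s'.__mod__, some_iterable)
This is wrong. See the other answers, and look at the original definition of the old syntax. Most people do not know that this is actually already defined in the ANSI C99 standard! Check out a recent copy of man sprintf and learn about the $ notation inside % placeholders. So even if Python does not support this feature of the original syntax, why introduce a new syntax? Why not extend the interpreter to understand everything in the C99 standard?
@cfi: please prove you are right by rewriting the example above in C99.
@MarcH: printf("%2$s %1$s\n", "One", "Two"); compiled with gcc -std=c99 test.c -o test prints Two One. But I stand corrected: it is actually a POSIX extension and not C. I cannot find it again in the C/C++ standard, where I thought I had seen it. The code even works with the 'c90' std flag. See the sprintf man page. It does not list this, but allows libs to implement a superset. My original argument still holds if you replace C with POSIX.
My first comment here does not apply to this answer. I regret the phrasing. In Python we cannot use the modulo operator % to reorder placeholders. For the sake of comment consistency I still would rather not delete that first comment. I apologize for having vented my anger here. It was directed at the often-made statement that the old syntax per se would not allow this. Instead of creating a completely new syntax, we could have introduced the standard POSIX extensions. We could have both.
"modulo" refers to the operator that evaluates the remainder after a division. In this case the percent sign is not a modulo operator.
Isn't .format() a method?
@Octopus Semantics is so far the only strong argument I can think of against the old-style operator. For example, the modulo and formatting concepts should not share the same syntax, if only because I am not sure whether % is the modulo operator when I am just formatting strings. I wish .format weren't so ugly.
(@cfi off topic, but the %2$s stuff is fairly broken in C; e.g. you cannot skip an argument; and the format syntax is more general than just the reordering stuff)
@Octopus: From docs.python.org/release/2.5/lib/typesseq-strings.html: "String and Unicode objects have one unique built-in operation: the % operator (modulo)." So yes, it is the modulo operator, and it has the magic method __mod__, even though it does not have the mathematical semantics of modulo in this usage.
Interestingly, there is no explanation of what exactly happens in the topmost example of your answer. It appears to be unpacking by index.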

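Assuming you are using Python's logging module, you can pass the string formatting arguments as arguments to the .debug() method rather than doing the formatting yourself:

log.debug("some debug info: %s", some_info)

which avoids doing the formatting unless the logger actually logs something.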
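The deferred behaviour can be confirmed with a counter. A minimal sketch, assuming Python 3; the Counted class, the logger name, and the in-memory stream are hypothetical scaffolding for the demonstration.

```python
import io
import logging

class Counted:
    """Hypothetical helper: counts how many times it is converted to a string."""
    def __init__(self):
        self.str_calls = 0

    def __str__(self):
        self.str_calls += 1
        return "payload"

stream = io.StringIO()
log = logging.getLogger("lazy_demo")
log.addHandler(logging.StreamHandler(stream))
log.setLevel(logging.WARNING)
log.propagate = False

some_info = Counted()

log.debug("some debug info: %s", some_info)      # below the level: never formatted
calls_after_debug = some_info.str_calls

log.warning("some warning info: %s", some_info)  # emitted: formatted by the handler
calls_after_warning = some_info.str_calls

output = stream.getvalue()
print(calls_after_debug, calls_after_warning, output)
```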

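Related discussion

Is there any way to get this performance benefit with dict formatting?
This is useful information that I only just learned. It's a pity it doesn't have its own question, since it seems separate from the main question. A pity the OP didn't split his question into two separate questions.
You can use dict formatting like this: log.debug("some debug info: %(this)s and %(that)s", dict(this='Tom', that='Jerry')). However, you cannot use the new-style .format() syntax here, not even in Python 3.3, which is a shame.
@Cito: See this: plumberjack.blogspot.co.uk/2010/10/…
Cool, thanks for the info, and sorry for spreading wrong information. It is also in the "What's New in 3.2" article: docs.python.org/3.2/whatsnew/3.2.html#logging. Very useful.
I would just split the OP's question into two separate questions. But this good answer unifies both.
The primary benefit of doing this is not performance (string interpolation is quick compared with whatever you do with the logging output, e.g. displaying it in a terminal or saving it to disk). It is that if you have a logging aggregator, it can tell you "you got 12 instances of this error message", even if they all had different 'some_info' values. If the string formatting is done before passing the string to log.debug, this is impossible. The aggregator can only say "you had 12 different log messages".
@JonathanHartley: that depends on how sophisticated your aggregator is.
Hey! I'm not sure about that. Doing this is trivially easy (i.e. it can be implemented in an aggregator in about three characters, literally), so if an aggregator doesn't do it, in what sense is it a useful aggregator?
No, what I meant was that even with the string formatting done up front, a sufficiently sophisticated system could find the similarities and group the messages together. I don't know whether any actually do that.
Oh! I see. I got your meaning backwards. I beg your pardon. Yes, I can imagine an aggregator trying to do that, but it seems silly. Hugs!
If you care about performance, use the literal dict {} syntax instead of a dict() class instantiation: doughellmann.com/2012/11/…
You can use any formatting style to defer formatting. You are not limited to the logging module's % formatting. The logging module converts an object to a string before emitting a log message, so any object that implements __str__ will do. This could include a class that accepts a new-style format string with arguments but does not perform the formatting until the logging module calls __str__. See my answer below.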
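The aggregation argument and the dict-style arguments from the comments can both be sketched with a tiny handler. TemplateCounter is a hypothetical stand-in for a real aggregator, not an existing library class; it only relies on the standard LogRecord attributes msg and getMessage().

```python
import collections
import logging

class TemplateCounter(logging.Handler):
    """Hypothetical stand-in for a log aggregator: groups records by template."""
    def __init__(self):
        super().__init__()
        self.templates = collections.Counter()
        self.rendered = []

    def emit(self, record):
        self.templates[record.msg] += 1            # the unformatted template
        self.rendered.append(record.getMessage())  # template merged with its args

log = logging.getLogger("aggregator_demo")
log.setLevel(logging.DEBUG)
log.propagate = False
counter = TemplateCounter()
log.addHandler(counter)

for value in ("a", "b", "c"):
    log.debug("deferred: %s", value)   # one shared template, three records
    log.debug("eager: %s" % value)     # three unrelated strings

# A single dict argument is used as the mapping for %(name)s placeholders.
log.debug("%(this)s and %(that)s", {"this": "Tom", "that": "Jerry"})

print(counter.templates)
print(counter.rendered[-1])
```

With deferred arguments the three records share one key; the pre-formatted ones cannot be grouped without extra parsing.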

As of Python 3.6 (2016) you can use f-strings to substitute variables:

>>> origin = "London"
>>> destination = "Paris"
>>> f"from {origin} to {destination}"
'from London to Paris'

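Note the f" prefix. If you try this in Python 3.5 or earlier, you'll get a SyntaxError.

See https://docs.python.org/3.6/reference/lexical_analysis.html#f-strings

PEP 3101 proposes replacing the % operator with the new, advanced string formatting in Python 3, where it would be the default.

Related discussion

But please be careful: just now I discovered one issue when trying to replace all % with .format in existing code: '{}'.format(unicode_string) will try to encode unicode_string and will probably fail.

Just look at this Python interactive session log: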

Python 2.7.2 (default, Aug 27 2012, 19:52:55)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-48)] on linux2
; s='й'
; u=u'й'
; s
'\xd0\xb9'
; u
u'\u0439'

s is just a string (called a "byte array" in Python 3) and u is a Unicode string (called a "string" in Python 3):

; '%s' % s
'\xd0\xb9'
; '%s' % u
u'\u0439'

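As shown above, when you give a Unicode object as a parameter to the % operator, it produces a Unicode string even if the original string wasn't Unicode. The .format method behaves differently: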

; '{}'.format(s)
'\xd0\xb9'
; '{}'.format(u)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u0439' in position 0: ordinal not in range(256)

As the traceback shows, the .format method raises UnicodeEncodeError in this case. Starting from a Unicode format string works:

; u'{}'.format(s)
u'\xd0\xb9'
; u'{}'.format(u)
u'\u0439'

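So .format handles a Unicode argument without problems only if the format string itself is Unicode,

; '{}'.format(u'i')
'i'

or if the argument can be converted to a plain string (a so-called "byte array").

Related discussion

Yet another advantage of .format (which I don't see in the answers): it can take object properties.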

In [12]: class A(object):
   ....:     def __init__(self, x, y):
   ....:         self.x = x
   ....:         self.y = y
   ....:

In [13]: a = A(2,3)

In [14]: 'x is {0.x}, y is {0.y}'.format(a)
Out[14]: 'x is 2, y is 3'

Or, as a keyword argument:

In [15]: 'x is {a.x}, y is {a.y}'.format(a=a)
Out[15]: 'x is 2, y is 3'

This is not possible with % as far as I can tell.
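A slightly broader sketch of field access in format strings, assuming Python 3. The Point class and the sample list and dict are hypothetical; the vars() line is an added illustration of the nearest % workaround, which only works because the attributes happen to live in the instance dictionary.

```python
class Point:
    """Hypothetical example class with two plain attributes."""
    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(2, 3)

by_attr = "x is {0.x}, y is {0.y}".format(p)          # attribute access
by_keyword = "x is {p.x}, y is {p.y}".format(p=p)     # same, via a keyword
by_item = "first is {0[0]}, name is {1[name]}".format([10, 20], {"name": "adam"})

# Nearest % equivalent: hand over the instance dictionary as a mapping.
by_percent = "x is %(x)s, y is %(y)s" % vars(p)

print(by_attr, by_keyword, by_item, by_percent, sep="\n")
```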

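Related discussion

As I discovered today, the old way of formatting strings via % does not support Decimal, Python's module for decimal fixed point and floating point arithmetic, out of the box.

Example (using Python 3.3.5):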

#!/usr/bin/env python3

from decimal import *

getcontext().prec = 50
d = Decimal('3.12375239e-24') # no magic number, I rather produced it by banging my head on my keyboard

print('%.50f' % d)
print('{0:.50f}'.format(d))

Output:

0.00000000000000000000000312375239000000009907464850
0.00000000000000000000000312375239000000000000000000

There may well be workarounds, but you might still consider using the format() method right away.
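The difference comes from %f converting the Decimal to a binary float first, while format() hands the format spec to the Decimal itself. A minimal sketch that checks this, assuming Python 3; the exact variable is built by hand purely for comparison.

```python
from decimal import Decimal, getcontext

getcontext().prec = 50                  # same setting as in the example above
d = Decimal("3.12375239e-24")

old_style = "%.50f" % d                 # goes through float(d), so digits drift
new_style = "{0:.50f}".format(d)        # uses Decimal's own formatting

# The mathematically exact 50-digit expansion, written out for comparison.
exact = "0." + "0" * 23 + "312375239" + "0" * 18

print(old_style)
print(new_style)
print(new_style == exact)
```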

Related discussion

From my tests, % gives better performance than format.

Test code:

Python 2.7.2:

import timeit
print 'format:', timeit.timeit("'{}{}{}'.format(1, 1.23, 'hello')")
print '%:', timeit.timeit("'%s%s%s' % (1, 1.23, 'hello')")

Result:

> format: 0.470329046249
> %: 0.357107877731

Python 3.5.2

import timeit
print('format:', timeit.timeit("'{}{}{}'.format(1, 1.23, 'hello')"))
print('%:', timeit.timeit("'%s%s%s' % (1, 1.23, 'hello')"))

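Result

> format: 0.5864730989560485
> %: 0.013593495357781649

In Python 2 the difference is small, whereas in Python 3 % looks much faster than format. Treat the Python 3 figure with caution: both operands of % are literals here, and some CPython 3 releases appear to fold such an expression into a constant at compile time, in which case the loop is not measuring any formatting work.

Thanks to @Chris Cogdon for the sample code.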
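One way to keep the compiler from precomputing anything is to format variables instead of literals. The following is a sketch only: the names a, b, c and the run count are arbitrary choices, and the absolute numbers depend on the interpreter and machine, so no expected timings are shown.

```python
import timeit

setup = "a, b, c = 1, 1.23, 'hello'"   # hypothetical variable names
runs = 100_000

literal_percent = timeit.timeit("'%s%s%s' % (1, 1.23, 'hello')", number=runs)
variable_percent = timeit.timeit("'%s%s%s' % (a, b, c)", setup=setup, number=runs)
variable_format = timeit.timeit("'{}{}{}'.format(a, b, c)", setup=setup, number=runs)

# Both styles build the same string, so the comparison is like for like.
same_output = ('%s%s%s' % (1, 1.23, 'hello')) == '{}{}{}'.format(1, 1.23, 'hello')

print("literal %:  ", literal_percent)
print("variable %: ", variable_percent)
print("variable {}:", variable_format)
```

If the literal and variable % timings differ widely on your interpreter, the literal version is probably being optimized away.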

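Related discussion

As a side note, you don't have to take a performance hit to use new-style formatting with logging. You can pass any object that implements the __str__ magic method to logging.debug, logging.info, etc. When the logging module has decided that it must emit your message object (whatever it is), it calls str(message_object) before doing so. So you could do something like this: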

import logging

class NewStyleLogMessage(object):
    def __init__(self, message, *args, **kwargs):
        self.message = message
        self.args = args
        self.kwargs = kwargs

    def __str__(self):
        args = (i() if callable(i) else i for i in self.args)
        kwargs = dict((k, v() if callable(v) else v) for k, v in self.kwargs.items())

        return self.message.format(*args, **kwargs)

N = NewStyleLogMessage

# Neither one of these messages are formatted (or calculated) until they're
# needed

# Emits "Lazily formatted log entry: 123 foo" in log
logging.debug(N('Lazily formatted log entry: {0} {keyword}', 123, keyword='foo'))

def expensive_func():
    # Do something that takes a long time...
    return 'foo'

# Emits "Expensive log entry: foo" in log
logging.debug(N('Expensive log entry: {keyword}', keyword=expensive_func))

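This is all described in the Python 3 documentation (https://docs.python.org/3/howto/logging-cookbook.html#formatting-styles). However, it also works with Python 2.6 (https://docs.python.org/2.6/library/logging.html#using-arbitrary-objects-as-messages).

One advantage of this technique, apart from being agnostic about the formatting style, is that it allows lazy values such as the function expensive_func above. This provides a more elegant alternative to the advice given in the Python docs here: https://docs.python.org/2.6/library/logging.html#optimization.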
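Whether the expensive function really is skipped can be checked with a call counter. This sketch re-creates the idea in condensed form for Python 3; LazyMessage, the logger name, and the in-memory stream are hypothetical scaffolding and not part of the logging module.

```python
import io
import logging

class LazyMessage:
    """Condensed, hypothetical variant of the message class shown above."""
    def __init__(self, fmt, *args, **kwargs):
        self.fmt, self.args, self.kwargs = fmt, args, kwargs

    def __str__(self):
        # Callables are only invoked here, i.e. when a handler needs the text.
        args = [a() if callable(a) else a for a in self.args]
        kwargs = {k: (v() if callable(v) else v) for k, v in self.kwargs.items()}
        return self.fmt.format(*args, **kwargs)

calls = []

def expensive_func():
    calls.append(1)          # record every invocation
    return "foo"

stream = io.StringIO()
log = logging.getLogger("lazy_message_demo")
log.addHandler(logging.StreamHandler(stream))
log.setLevel(logging.INFO)
log.propagate = False

log.debug(LazyMessage("Expensive log entry: {keyword}", keyword=expensive_func))
calls_after_debug = len(calls)   # DEBUG is filtered out, nothing was computed

log.info(LazyMessage("Expensive log entry: {keyword}", keyword=expensive_func))
calls_after_info = len(calls)    # INFO is emitted, the function ran once

logged = stream.getvalue()
print(calls_after_debug, calls_after_info, logged)
```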

Related discussion

If your Python version is >= 3.6, the f-string formatted literal is your new friend.

It's simpler, cleaner, and performs better.

In [1]: params=['Hello', 'adam', 42]

In [2]: %timeit "%s %s, the answer to everything is %d." % (params[0], params[1], params[2])
448 ns ± 1.48 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [3]: %timeit "{} {}, the answer to everything is {}.".format(*params)
449 ns ± 1.42 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [4]: %timeit f"{params[0]} {params[1]}, the answer to everything is {params[2]}."
12.7 ns ± 0.0129 ns per loop (mean ± std. dev. of 7 runs, 100000000 loops each)

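% may help when you are formatting regex expressions. For example,

'{type_names} [a-z]{2}'.format(type_names='triangle\|square')

raises IndexError. In this situation, you can use:

'%(type_names)s [a-z]{2}' % {'type_names': 'triangle\|square'}

This avoids writing the regex as '{type_names} [a-z]{{2}}'. It can be useful when you have two regexes, where one is used alone without formatting, but the concatenation of both is formatted.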
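The clash between regex quantifiers and format fields can be sketched as follows, assuming Python 3. The pattern here is a simplified, hypothetical variant of the one above (a plain alternation wrapped in a group), chosen so that the result can be matched against a sample string.

```python
import re

type_names = "triangle|square"   # hypothetical alternation for the demo

try:
    "({type_names}) [a-z]{2}".format(type_names=type_names)
    failed = False
except IndexError:
    failed = True                # {2} is read as positional field number 2

# % leaves the regex quantifier untouched.
percent = "(%(type_names)s) [a-z]{2}" % {"type_names": type_names}

# .format needs the quantifier braces doubled.
escaped = "({type_names}) [a-z]{{2}}".format(type_names=type_names)

match = re.fullmatch(percent, "square ab")
print(failed, percent, escaped, bool(match))
```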

Related discussion

I would add that since version 3.6, we can use f-strings like the following

foo = "john"
bar = "smith"
print(f"My name is {foo} {bar}")

Which gives

My name is john smith

Everything is converted to strings

mylist = ["foo", "bar"]
print(f"mylist = {mylist}")

Result:

mylist = ['foo', 'bar']

You can pass functions, like in the other format methods

import time
print(f'Hello, here is the date : {time.strftime("%d/%m/%Y")}')

Giving for example

Hello, here is the date : 16/04/2018
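A few more f-string features in one place, as a sketch for Python 3.6 or later. The names and the fixed date are illustrative assumptions; a fixed date is used instead of the current time so the output is reproducible.

```python
import datetime

name = "adam"
answer = 42
when = datetime.date(2018, 4, 16)   # fixed date so the result is reproducible

# Arbitrary expressions and format specs are allowed inside the braces.
greeting = f"Hello {name.title()}, the answer is {answer:>5d}"

# date objects accept strftime codes as their format spec.
dated = f"Hello, here is the date : {when:%d/%m/%Y}"

# !r applies repr() instead of str().
quoted = f"{name!r}"

print(greeting)
print(dated)
print(quoted)
```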

Related discussion

For Python version >= 3.6 (see PEP 498)

s1='albha'
s2='beta'

f'{s1}{s2:>10}'

#output
'albha      beta'

One caveat: if you have nested curly braces, this will not work with format, but % will work.

Example:

>>> '{{0}, {1}}'.format(1,2)
Traceback (most recent call last):
  File "<pyshell#3>", line 1, in <module>
    '{{0}, {1}}'.format(1,2)
ValueError: Single '}' encountered in format string
>>> '{%s, %s}'%(1,2)
'{1, 2}'
>>>
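The error above arises because {{ and }} are the escape sequences for literal braces in format strings. A short sketch, assuming Python 3.6 or later, of how literal braces are written in each style; the variable names are illustrative.

```python
a, b = 1, 2

# Doubled braces produce literal braces; the inner pair is still a field.
with_format = "{{{0}, {1}}}".format(a, b)
with_fstring = f"{{{a}, {b}}}"

# % needs no escaping for braces, but a literal percent sign must be doubled.
with_percent = "{%s, %s}" % (a, b)
percent_sign = "%d%%" % 50

print(with_format, with_fstring, with_percent, percent_sign)
```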

Related discussion

Python 3.6.7 comparison:

#!/usr/bin/env python
import timeit

def time_it(fn):
    """
    Measure time of execution of a function
    """
    def wrapper(*args, **kwargs):
        t0 = timeit.default_timer()
        fn(*args, **kwargs)
        t1 = timeit.default_timer()
        print("{0:.10f} seconds".format(t1 - t0))
    return wrapper

@time_it
def new_new_format(s):
    print("new_new_format:", f"{s[0]} {s[1]} {s[2]} {s[3]} {s[4]}")

@time_it
def new_format(s):
    print("new_format:", "{0} {1} {2} {3} {4}".format(*s))

@time_it
def old_format(s):
    print("old_format:", "%s %s %s %s %s" % s)

def main():
    samples = (("uno", "dos", "tres", "cuatro", "cinco"), (1,2,3,4,5), (1.1, 2.1, 3.1, 4.1, 5.1), ("uno", 2, 3.14, "cuatro", 5.5),)
    for s in samples:
        new_new_format(s)
        new_format(s)
        old_format(s)
        print("-----")

if __name__ == '__main__':
    main()

Output:

new_new_format: uno dos tres cuatro cinco
0.0000170280 seconds
new_format: uno dos tres cuatro cinco
0.0000046750 seconds
old_format: uno dos tres cuatro cinco
0.0000034820 seconds
-----
new_new_format: 1 2 3 4 5
0.0000043980 seconds
new_format: 1 2 3 4 5
0.0000062590 seconds
old_format: 1 2 3 4 5
0.0000041730 seconds
-----
new_new_format: 1.1 2.1 3.1 4.1 5.1
0.0000092650 seconds
new_format: 1.1 2.1 3.1 4.1 5.1
0.0000055340 seconds
old_format: 1.1 2.1 3.1 4.1 5.1
0.0000052130 seconds
-----
new_new_format: uno 2 3.14 cuatro 5.5
0.0000053380 seconds
new_format: uno 2 3.14 cuatro 5.5
0.0000047570 seconds
old_format: uno 2 3.14 cuatro 5.5
0.0000045320 seconds
-----
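The figures above time a single call that also includes print(), so they are dominated by I/O and start-up noise. A possible alternative, sketched here under the assumption that only the formatting itself should be measured, repeats each call many times and keeps the best run. The repeat counts are arbitrary, and no timings are shown because they vary by machine.

```python
import timeit

def old_format(s):
    return "%s %s %s %s %s" % s

def new_format(s):
    return "{0} {1} {2} {3} {4}".format(*s)

def f_string(s):
    return f"{s[0]} {s[1]} {s[2]} {s[3]} {s[4]}"

sample = ("uno", 2, 3.14, "cuatro", 5.5)

results = {}
for fn in (old_format, new_format, f_string):
    # Best of 5 repeats of 20,000 calls each; the minimum is the least noisy figure.
    timings = timeit.repeat(lambda: fn(sample), number=20_000, repeat=5)
    results[fn.__name__] = min(timings)

for name, seconds in results.items():
    print(f"{name}: {seconds:.6f} s per 20,000 calls")
```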

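Related discussion

Strictly speaking we are getting really far away from the original topic, but why not:

When using the gettext module to provide e.g. a localized GUI, old-style and new-style strings are the only options; f-strings cannot be used there. In my opinion the new style is the best choice for this case. There is a question about this here.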
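The reason is that the translation lookup needs the unformatted template as its key, whereas an f-string is filled in before any lookup could happen. A minimal sketch, assuming Python 3: gettext.NullTranslations is used as a stand-in that returns every message unchanged, where a real application would load a message catalog instead.

```python
import gettext

# Stand-in translator: returns each message unchanged. A real program would
# load a catalog, e.g. via gettext.translation(...), and use its gettext method.
_ = gettext.NullTranslations().gettext

name = "adam"   # illustrative value

# The template is looked up first, then filled in.
new_style = _("Welcome, {name}!").format(name=name)
old_style = _("Welcome, %(name)s!") % {"name": name}

print(new_style)
print(old_style)
```

Named fields also let translators reorder the placeholders freely in either style.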