What's the difference between str.isdigit, isnumeric and isdecimal in python?

When I run these methods:

s.isdigit()
s.isnumeric()
s.isdecimal()

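for any value of s (a string, of course), I always get either all True or all False.
What is the difference between them? Can you give an example that returns two Trues and one False (or vice versa)?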


It's mostly about Unicode classifications. Here are some examples that show the discrepancies:

>>> def spam(s):
...     for attr in 'isnumeric', 'isdecimal', 'isdigit':
...         print(attr, getattr(s, attr)())
...        
>>> spam('½')
isnumeric True
isdecimal False
isdigit False
>>> spam('³')
isnumeric True
isdecimal False
isdigit True

The exact behavior is specified in the official documentation.

A script to find all of them:

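import sys
import unicodedata
from collections import defaultdict

# Map each (isnumeric, isdecimal, isdigit) pattern to the characters showing it.
d = defaultdict(list)
for i in range(sys.maxunicode + 1):
    s = chr(i)
    t = s.isnumeric(), s.isdecimal(), s.isdigit()
    # Keep only characters for which the three methods disagree.
    if len(set(t)) == 2:
        try:
            name = unicodedata.name(s)
        except ValueError:
            # Some code points have no name in the Unicode database.
            name = f'codepoint{i}'
        print(s, name)
        d[t].append(s)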


The Python documentation describes the differences between the three methods.

str.isdigit

Return true if all characters in the string are digits and there is at least one character, false otherwise. Digits include decimal characters and digits that need special handling, such as the compatibility superscript digits. This covers digits which cannot be used to form numbers in base 10, like the Kharosthi numbers. Formally, a digit is a character that has the property value Numeric_Type=Digit or Numeric_Type=Decimal.

str.isnumeric

Return true if all characters in the string are numeric characters, and there is at least one character, false otherwise. Numeric characters include digit characters, and all characters that have the Unicode numeric value property, e.g. U+2155, VULGAR FRACTION ONE FIFTH. Formally, numeric characters are those with the property value Numeric_Type=Digit, Numeric_Type=Decimal or Numeric_Type=Numeric.

str.isdecimal

Return true if all characters in the string are decimal characters and there is at least one character, false otherwise. Decimal characters are those that can be used to form numbers in base 10, e.g. U+0660, ARABIC-INDIC DIGIT ZERO. Formally a decimal character is a character in the Unicode General Category "Nd".
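To see the three Numeric_Type values directly, the standard `unicodedata` module exposes `decimal()`, `digit()` and `numeric()`, which return a character's value or a supplied default when the property is absent. A minimal sketch follows; `numeric_properties` is just a name chosen for illustration, and the claim that each str method is True exactly when the matching value exists is based on CPython, where both read the same Unicode database:

```python
import unicodedata

def numeric_properties(ch):
    """Return the (decimal, digit, numeric) values of one character, None if absent."""
    return (
        unicodedata.decimal(ch, None),  # only for Numeric_Type=Decimal (category Nd)
        unicodedata.digit(ch, None),    # for Numeric_Type=Decimal or Digit
        unicodedata.numeric(ch, None),  # for Decimal, Digit or Numeric
    )

# ASCII three, ARABIC-INDIC DIGIT THREE, SUPERSCRIPT THREE, VULGAR FRACTION ONE FIFTH
for ch in ['3', '\u0663', '\u00b3', '\u2155']:
    dec, dig, num = numeric_properties(ch)
    # In CPython, each str method is True exactly when the matching value exists.
    print(unicodedata.name(ch), dec, dig, num,
          ch.isdecimal(), ch.isdigit(), ch.isnumeric())
```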

As @Wim said, the main difference between the three methods is the way they handle specific Unicode characters.


By definition, isdecimal() ⊆ isdigit() ⊆ isnumeric(). That is, if a string is decimal, then it will also be digit and numeric.
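A quick way to check the subset chain empirically is to tally the triple of results for every code point. This sketch assumes CPython 3; the exact counts depend on the Unicode version bundled with your Python, so only the set of patterns that occur is checked:

```python
import sys
from collections import Counter

# Tally the (isdecimal, isdigit, isnumeric) triple for every code point.
patterns = Counter(
    (c.isdecimal(), c.isdigit(), c.isnumeric())
    for c in map(chr, range(sys.maxunicode + 1))
)

for pattern, count in sorted(patterns.items(), reverse=True):
    print(pattern, count)

# If isdecimal ⊆ isdigit ⊆ isnumeric holds, only these four triples can occur.
allowed = {
    (True, True, True),
    (False, True, True),
    (False, False, True),
    (False, False, False),
}
assert set(patterns) <= allowed
```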

Therefore, given a string s and testing it with those three methods, there will only be 4 kinds of results.

+-------------+-----------+-------------+----------------------------------+
| isdecimal() | isdigit() | isnumeric() |          Example                 |
+-------------+-----------+-------------+----------------------------------+
|    True     |    True   |    True     | "038", "٠٣٨", "０３８"           |
|    False    |    True   |    True     | "⁰³⁸", "🄀⒊⒏", "⓪③⑧"           |
|    False    |   False   |    True     | "↉⅛⅘", "ⅠⅢⅧ", "⑩⑬㊿", "壹貳參"   |
|    False    |   False   |    False    | "abc", "38.0", "-38"             |
+-------------+-----------+-------------+----------------------------------+
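The table can be turned into a small lookup. `classify` and the row labels below are hypothetical names for illustration. Because each method requires all characters to qualify, a string that mixes categories (for example "3½") lands in the row of its least-qualified character, so every string, including the empty one, maps to one of the four rows:

```python
# Hypothetical labels for the four rows of the table above.
ROWS = {
    (True, True, True): "decimal",
    (False, True, True): "digit but not decimal",
    (False, False, True): "numeric only",
    (False, False, False): "none",
}

def classify(s):
    """Return the table row that string s falls into."""
    return ROWS[(s.isdecimal(), s.isdigit(), s.isnumeric())]

examples = ["038", "\u0660\u0663\u0668", "\u2070\u00b3\u2078",
            "\u2160\u2162\u2167", "壹貳參", "3\u00bd", "-38", ""]
for example in examples:
    print(repr(example), classify(example))
```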

1. Some examples of characters with isdecimal() == True

(thus isdigit() == True and isnumeric() == True)

"0123456789"  DIGIT ZERO~NINE
"٠١٢٣٤٥٦٧٨٩"  ARABIC-INDIC DIGIT ZERO~NINE
"०१२३४५६७८९"  DEVANAGARI DIGIT ZERO~NINE
"০১২৩৪৫৬৭৮৯"  BENGALI DIGIT ZERO~NINE
"੦੧੨੩੪੫੬੭੮੯"  GURMUKHI DIGIT ZERO~NINE
"૦૧૨૩૪૫૬૭૮૯"  GUJARATI DIGIT ZERO~NINE
"୦୧୨୩୪୫୬୭୮୯"  ORIYA DIGIT ZERO~NINE
"௦௧௨௩௪௫௬௭௮௯"  TAMIL DIGIT ZERO~NINE
"౦౧౨౩౪౫౬౭౮౯"  TELUGU DIGIT ZERO~NINE
"೦೧೨೩೪೫೬೭೮೯"  KANNADA DIGIT ZERO~NINE
"൦൧൨൩൪൫൬൭൮൯"  MALAYALAM DIGIT ZERO~NINE
"๐๑๒๓๔๕๖๗๘๙"  THAI DIGIT ZERO~NINE
"໐໑໒໓໔໕໖໗໘໙"  LAO DIGIT ZERO~NINE
"༠༡༢༣༤༥༦༧༨༩"  TIBETAN DIGIT ZERO~NINE
"၀၁၂၃၄၅၆၇၈၉"  MYANMAR DIGIT ZERO~NINE
"០១២៣៤៥៦៧៨៩"  KHMER DIGIT ZERO~NINE
"０１２３４５６７８９"  FULLWIDTH DIGIT ZERO~NINE
"𝟎𝟏𝟐𝟑𝟒𝟓𝟔𝟕𝟖𝟗"  MATHEMATICAL BOLD DIGIT ZERO~NINE
"𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡"  MATHEMATICAL DOUBLE-STRUCK DIGIT ZERO~NINE
"𝟢𝟣𝟤𝟥𝟦𝟧𝟨𝟩𝟪𝟫"  MATHEMATICAL SANS-SERIF DIGIT ZERO~NINE
"𝟬𝟭𝟮𝟯𝟰𝟱𝟲𝟳𝟴𝟵"  MATHEMATICAL SANS-SERIF BOLD DIGIT ZERO~NINE
"𝟶𝟷𝟸𝟹𝟺𝟻𝟼𝟽𝟾𝟿"  MATHEMATICAL MONOSPACE DIGIT ZERO~NINE
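The list above can be regenerated rather than typed. This sketch relies on the Unicode stability guarantee that decimal digits are encoded in contiguous runs from 0 to 9, so it only looks for characters whose decimal value is 0 and takes the next ten code points. `decimal_runs` is a name chosen for illustration, and the number of runs varies with the Unicode version:

```python
import sys
import unicodedata

def decimal_runs():
    """Yield (name_of_zero, ten_digit_string) for each run of decimal digits."""
    for i in range(sys.maxunicode + 1):
        ch = chr(i)
        # Decimal digits come in contiguous runs 0..9, so finding every
        # character whose decimal value is 0 locates the start of each run.
        if unicodedata.decimal(ch, None) == 0:
            yield unicodedata.name(ch), ''.join(chr(i + k) for k in range(10))

for name, digits in decimal_runs():
    print(digits, name)
```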

2. Some examples of characters with isdecimal() == False and isdigit() == True

(thus isnumeric() == True)

"⁰¹²³⁴⁵⁶⁷⁸⁹"  SUPERSCRIPT ZERO~NINE
"₀₁₂₃₄₅₆₇₈₉"  SUBSCRIPT ZERO~NINE
"🄀⒈⒉⒊⒋⒌⒍⒎⒏⒐"  DIGIT ZERO~NINE FULL STOP
"🄁🄂🄃🄄🄅🄆🄇🄈🄉🄊"  DIGIT ZERO~NINE COMMA
"⓪①②③④⑤⑥⑦⑧⑨"  CIRCLED DIGIT ZERO~NINE
"⓿❶❷❸❹❺❻❼❽❾"  NEGATIVE CIRCLED DIGIT ZERO~NINE
"⑴⑵⑶⑷⑸⑹⑺⑻⑼"  PARENTHESIZED DIGIT ONE~NINE
"➀➁➂➃➄➅➆➇➈"  DINGBAT CIRCLED SANS-SERIF DIGIT ONE~NINE
"⓵⓶⓷⓸⓹⓺⓻⓼⓽"  DOUBLE CIRCLED DIGIT ONE~NINE
"➊➋➌➍➎➏➐➑➒"  DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT ONE~NINE
"፩፪፫፬፭፮፯፰፱"  ETHIOPIC DIGIT ONE~NINE
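A sketch that lists these digit-but-not-decimal characters together with the value `unicodedata.digit()` reports for them. It also shows one practical consequence: `int()` accepts decimal digits from any script but rejects these characters:

```python
import sys
import unicodedata

# Characters that are digits (Numeric_Type=Digit) but not decimal digits.
digit_only = [
    ch for ch in map(chr, range(sys.maxunicode + 1))
    if ch.isdigit() and not ch.isdecimal()
]

for ch in digit_only[:12]:
    # unicodedata.digit() still reports the single-digit value they stand for.
    print(ch, unicodedata.digit(ch), unicodedata.name(ch))

print(len(digit_only), "characters in total")

try:
    int("\u00b3")  # SUPERSCRIPT THREE is a digit, but not a decimal digit
except ValueError as exc:
    print("int() rejects it:", exc)
```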

3. Some examples of characters with isdecimal() == False, isdigit() == False and isnumeric() == True

"¼½¾⅐⅑⅒⅓⅔⅕⅖⅗⅘⅙⅚⅛⅜⅝⅞⅟↉"  VULGAR FRACTION
"৴৵৶৷৸৹"  BENGALI CURRENCY NUMERATOR
"௰௱௲"  TAMIL NUMBER TEN, ONE HUNDRED, ONE THOUSAND
"౸౹౺౻౼౽౾"  TELUGU FRACTION DIGIT
"൰൱൲൳൴൵"  MALAYALAM NUMBER, MALAYALAM FRACTION
"༪༫༬༭༮༯༰༱༲༳"  TIBETAN DIGIT HALF ZERO~NINE
"፲፳፴፵፶፷፸፹፺፻፼"  ETHIOPIC NUMBER TEN~NINETY, HUNDRED, TEN THOUSAND
"៰៱៲៳៴៵៶៷៸៹"  KHMER SYMBOL LEK ATTAK
"ⅠⅡⅢⅣⅤⅥⅦⅧⅨⅩⅪⅫⅬⅭⅮⅯ"  ROMAN NUMERAL
"ⅰⅱⅲⅳⅴⅵⅶⅷⅸⅹⅺⅻⅼⅽⅾⅿ"  SMALL ROMAN NUMERAL
"ↀↁↂↅↆↇↈ"  ROMAN NUMERAL
"⑩⑪⑫⑬⑭⑮⑯⑰⑱⑲⑳㉑㉒㉓㉔㉕㉖㉗㉘㉙㉚㉛㉜㉝㉞㉟㊱㊲㊳㊴㊵㊶㊷㊸㊹㊺㊻㊼㊽㊾㊿"  CIRCLED NUMBER TEN~FIFTY
"㉈㉉㉊㉋㉌㉍㉎㉏"  CIRCLED NUMBER TEN~EIGHTY ON BLACK SQUARE
"⑽⑾⑿⒀⒁⒂⒃⒄⒅⒆⒇"  PARENTHESIZED NUMBER TEN~TWENTY
"⒑⒒⒓⒔⒕⒖⒗⒘⒙⒚⒛"  NUMBER TEN~TWENTY FULL STOP
"⓫⓬⓭⓮⓯⓰⓱⓲⓳⓴"  NEGATIVE CIRCLED NUMBER ELEVEN~TWENTY
"⓾❿➉➓"  various styles of CIRCLED NUMBER TEN
"🄌"  DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT ZERO
"〇"  IDEOGRAPHIC NUMBER ZERO
"〡〢〣〤〥〦〧〨〩〸〹〺"  HANGZHOU NUMERAL ONE~TEN, TWENTY, THIRTY
"㆒㆓㆔㆕"  IDEOGRAPHIC ANNOTATION ONE~FOUR MARK
"㈠㈡㈢㈣㈤㈥㈦㈧㈨㈩"  PARENTHESIZED IDEOGRAPH ONE~TEN
"㊀㊁㊂㊃㊄㊅㊆㊇㊈㊉"  CIRCLED IDEOGRAPH ONE~TEN
"一二三四五六七八九十壹貳參肆伍陸柒捌玖拾零百千萬億兆弐貮贰??漆什?陌阡佰仟万亿幺兩?亖卄卅卌廾廿"  CJK UNIFIED IDEOGRAPH
"參拾兩零六陸什"  CJK COMPATIBILITY IDEOGRAPH
"????????????????????????????????????"  AEGEAN NUMBER ONE~NINE, TEN~NINETY
"????????????????????????????????????"  AEGEAN NUMBER ONE~NINE HUNDRED, ONE~NINE THOUSAND
"????????????????"  AEGEAN NUMBER TEN~NINETY THOUSAND
"??????????????????????????????????????????????????????????????????????????????????????????????????????"  GREEK ACROPHONIC ATTIC
"??????????????????"  COUNTING ROD UNIT DIGIT ONE~NINE
"??????????????????"  COUNTING ROD TENS DIGIT ONE~NINE
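For the numeric-only characters, `unicodedata.numeric()` returns a float, which is how values such as fractions or numbers above 9 are represented. A sketch using a handful of characters from the list above (the selection is arbitrary):

```python
import unicodedata

# ½, ⅕, ROMAN NUMERAL TWELVE, CIRCLED NUMBER TEN, IDEOGRAPHIC NUMBER ZERO, 壹, 万
for ch in "\u00bd\u2155\u216b\u2469\u3007\u58f9\u4e07":
    # Every one of these is numeric but not a digit.
    assert ch.isnumeric() and not ch.isdigit()
    # numeric() returns a float, which can be a fraction or larger than 9.
    print(ch, unicodedata.numeric(ch), unicodedata.name(ch, "(unnamed)"))
```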

For a negative number such as a = "-10", all three methods return False:

a = "-10"
a.isdecimal(), a.isdigit(), a.isnumeric()
(False, False, False)
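isdecimal() accepts only the digits 0 to 9 (in any script), with no minus sign.
isdigit() accepts the same digits, plus other forms of single digits 0 to 9 such as superscripts (the exponent in 2⁵, i.e. 2 to the power of 5) and circled digits.
isnumeric() is broader: besides 0 to 9 in any form, it also accepts numerals for tens, hundreds and thousands in any language. For example, Roman ten is Ⅹ (U+2169 ROMAN NUMERAL TEN), which is valid for isnumeric(); the Latin letter X is not.
But none of the three is True for:
negative numbers, e.g. -10
and floating-point numbers, e.g. 10.1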
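Since none of the three methods accepts a sign or a decimal point, a common alternative is to just attempt the conversion. Below is a sketch with a hypothetical helper `parse_number`. Note that `float()` also accepts strings such as "inf", "nan" and "1e5", so this is broader than "digits only"; pick whichever behavior your input validation actually needs:

```python
def parse_number(s):
    """Hypothetical helper: return an int or float parsed from s, or None."""
    for convert in (int, float):
        try:
            return convert(s)
        except ValueError:
            pass
    return None

# int() and float() accept decimal digits from any script (e.g. Arabic-Indic),
# but not superscripts, Roman numerals or other non-decimal numerics.
for s in ["-10", "10.1", "\u0663\u0668", "\u00b3", "\u2166", "abc"]:
    print(repr(s), parse_number(s))
```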