JavaScript: In what JS engines, specifically, are toLowerCase & toUpperCase locale-sensitive?

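In the code of some libraries (for example AngularJS; the link points to a specific line in the code), I can see custom case-conversion functions being used instead of the standard ones. That is reasonable if you assume that the standard functions do not work as expected in browsers with a Turkish locale: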

console.log("SCRIPT".toLowerCase()); // "scrıpt"
console.log("script".toUpperCase()); // "SCRİPT"

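But is this really the case, or was it ever? Do browsers really behave this way? If so, which of them do? What about node.js? Other JS engines?

The existence of the toLocaleLowerCase and toLocaleUpperCase methods implies that toLowerCase and toUpperCase are locale-invariant, doesn't it?

Specifically, for which browsers does the Angular team keep this code: if ('i' !== 'I'.toLowerCase())...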
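For context, a custom lowercase function of the kind mentioned here might look roughly like the sketch below. It is an illustration only, not a copy of AngularJS's code, and the name `asciiLowercase` is hypothetical. It maps only A-Z and leaves every other character untouched, so its result cannot depend on any locale.

```javascript
// Hypothetical ASCII-only fallback: maps A-Z to a-z and nothing else.
function asciiLowercase(s) {
  return s.replace(/[A-Z]/g, function (ch) {
    // Setting bit 5 (value 32) turns an ASCII uppercase letter into lowercase.
    return String.fromCharCode(ch.charCodeAt(0) | 32);
  });
}

// Guard in the style of the check quoted above: fall back to the custom
// function only if the built-in is not locale-invariant.
var lowercase = ('i' !== 'I'.toLowerCase())
  ? asciiLowercase
  : function (s) { return s.toLowerCase(); };

console.log(lowercase('SCRIPT')); // "script"
```

Because the fallback never touches non-ASCII characters, it is only suitable for identifiers such as tag or attribute names, not for general text.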

If your browser (device) uses a Turkish or Azerbaijani locale, please run this snippet and write to me if you find that the problem really exists.

if ('i' !== 'I'.toLowerCase()) {
  document.write('Ooops! toLowerCase is locale-sensitive in your browser. ' +
    'Please write your user-agent in the comments to this question: ' +
    navigator.userAgent);
} else {
  document.write('toLowerCase isn\'t locale-sensitive in your browser. ' +
    'Everything works as expected!');
}
<html lang="tr">
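Since the snippet above relies on document.write, it cannot answer the node.js part of the question. A minimal sketch of the same check that runs in both browsers and Node might look like this; the helper name `checkCaseInvariance` and the shape of its result are assumptions made for illustration.

```javascript
// Hypothetical helper: reports whether the built-in case conversions
// treat the ASCII letters i/I in a locale-invariant way.
function checkCaseInvariance() {
  return {
    lowerOk: 'I'.toLowerCase() === 'i',
    upperOk: 'i'.toUpperCase() === 'I'
  };
}

var result = checkCaseInvariance();

// Describe the environment without assuming a browser.
var env = (typeof navigator !== 'undefined' && navigator.userAgent)
  || (typeof process !== 'undefined' && 'node ' + process.version)
  || 'unknown engine';

console.log(env, result);
```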


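Any JS implementation that follows the ECMA-262 5.1 standard must implement String.prototype.toLocaleLowerCase and String.prototype.toLocaleUpperCase.

According to the standard, toLocaleLowerCase is supposed to convert the string to lowercase according to locale-specific mappings.

toLowerCase, by contrast, converts the string to lowercase as defined by the Unicode mappings.

For most languages, toLocaleLowerCase and toLowerCase give the same result. For a few languages, such as Turkish, the case mappings do not follow the regular Unicode mappings, so toLowerCase and toLocaleLowerCase give different results.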
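The difference described above can be observed without a Turkish system by passing a locale tag explicitly. Note the assumption: the locale argument is not part of ECMA-262 5.1. It comes from the later ECMA-402 internationalization API, so this sketch presumes an engine with that support (current browsers and standard Node.js builds).

```javascript
// Locale-invariant conversions: defined purely by the Unicode mappings.
var lower = 'I'.toLowerCase();
var upper = 'i'.toUpperCase();

// Locale-sensitive conversions with an explicit Turkish locale tag
// (assumes ECMA-402 support; ES5.1 alone takes no argument here).
var trLower = 'I'.toLocaleLowerCase('tr');
var trUpper = 'i'.toLocaleUpperCase('tr');

// Show code points rather than glyphs, since dotted and dotless
// letters are easy to confuse in many fonts.
function hex(s) {
  return 'U+' + ('0000' + s.charCodeAt(0).toString(16).toUpperCase()).slice(-4);
}

console.log(hex(lower), hex(upper));     // U+0069 U+0049
console.log(hex(trLower), hex(trUpper)); // U+0131 U+0130 where 'tr' is supported
```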

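The library or framework you use (jQuery, Angular, Node, etc.) makes no difference. What matters is the JS implementation on which you run those libraries.

For all practical purposes, it is accurate to conclude that Node, Angular, or any other JS library or framework behaves exactly the same when handling strings (as long as it runs on a JS engine that implements ECMA-262 edition 3 or later). That said, I'm sure many frameworks extend the String object to add more functionality, but the basic properties and functions defined by ECMA-262 5.1 are always present and behave exactly the same.

For more information: http://www.ecma-international.org/ecma-262/5.1/sec-15.5.4.17

As for browsers, all modern browsers implement the ECMA-262 5.1 standard in their JS engines. I'm not sure about Node, but from my limited exposure to it, I believe it also uses a JS engine implemented to the ECMA-262 5.1 standard.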


Note: please be aware that I could not test this!

According to the ECMAScript specification:

String.prototype.toLowerCase ( )

[...]

For the purposes of this operation, the 16-bit code units of the
Strings are treated as code points in the Unicode Basic Multilingual
Plane. Surrogate code points are directly transferred from S to L
without any mapping.

The result must be derived according to the case mappings in the
Unicode character database (this explicitly includes not only the
UnicodeData.txt file, but also the SpecialCasings.txt file that
accompanies it in Unicode 2.1.8 and later).

[...]

String.prototype.toLocaleLowerCase ( )

This function works exactly the same as toLowerCase except that its
result is intended to yield the correct result for the host
environment’s current locale, rather than a locale-independent result.
There will only be a difference in the few cases (such as Turkish)
where the rules for that language conflict with the regular Unicode
case mappings.

[...]

According to the Unicode Character Database file SpecialCasing.txt:

[...]

Format

The entries in this file are in the following machine-readable format:

<code>; <lower>; <title>; <upper>; (<condition_list>;)? # <comment>

Unconditional mappings

[…]

Preserve canonical equivalence for I with dot. Turkic is handled
below.

0130; 0069 0307; 0130; 0130; # LATIN CAPITAL LETTER I WITH DOT ABOVE

[…]

Language-Sensitive Mappings
These are characters whose full case mappings depend on language and perhaps also
context (which characters come before or after). For more information
see the header of this file and the Unicode Standard.

Lithuanian

Lithuanian retains the dot in a lowercase i when followed by accents.

Remove DOT ABOVE after "i" with upper or titlecase

0307; 0307; ; ; lt After_Soft_Dotted; # COMBINING DOT ABOVE

Introduce an explicit dot above when lowercasing capital I's and J's
whenever there are more accents above.
(of the accents used in Lithuanian: grave, acute, tilde above, and ogonek)

0049; 0069 0307; 0049; 0049; lt More_Above; # LATIN CAPITAL LETTER I

004A; 006A 0307; 004A; 004A; lt More_Above; # LATIN CAPITAL LETTER J

012E; 012F 0307; 012E; 012E; lt More_Above; # LATIN CAPITAL LETTER I WITH OGONEK

00CC; 0069 0307 0300; 00CC; 00CC; lt; # LATIN CAPITAL LETTER I WITH GRAVE

00CD; 0069 0307 0301; 00CD; 00CD; lt; # LATIN CAPITAL LETTER I WITH ACUTE

0128; 0069 0307 0303; 0128; 0128; lt; # LATIN CAPITAL LETTER I WITH TILDE

Turkish and Azeri

I and i-dotless; I-dot and i are case pairs in Turkish and Azeri
The following rules handle those cases.

0130; 0069; 0130; 0130; tr; # LATIN CAPITAL LETTER I WITH DOT ABOVE

0130; 0069; 0130; 0130; az; # LATIN CAPITAL LETTER I WITH DOT ABOVE

When lowercasing, remove dot_above in the sequence I + dot_above, which will turn into i.
This matches the behavior of the canonically equivalent I-dot_above

0307; ; 0307; 0307; tr After_I; # COMBINING DOT ABOVE

0307; ; 0307; 0307; az After_I; # COMBINING DOT ABOVE

When lowercasing, unless an I is before a dot_above, it turns into a dotless i.

0049; 0131; 0049; 0049; tr Not_Before_Dot; # LATIN CAPITAL LETTER I

0049; 0131; 0049; 0049; az Not_Before_Dot; # LATIN CAPITAL LETTER I

When uppercasing, i turns into a dotted capital I

0069; 0069; 0130; 0130; tr; # LATIN SMALL LETTER I

0069; 0069; 0130; 0130; az; # LATIN SMALL LETTER I

Note: the following case is already in the UnicodeData.txt file.

0131; 0131; 0049; 0049; tr; # LATIN SMALL LETTER DOTLESS I

EOF
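The entries quoted above can be compared against what an engine actually returns. The following is a rough sketch: the helper `toHex` is hypothetical, and the calls that pass 'tr' assume an engine with ECMA-402 support, which is outside the ES5.1 text quoted earlier.

```javascript
// Hypothetical helper: formats a string as space-separated hex code units,
// in the same notation SpecialCasing.txt uses.
function toHex(s) {
  return s.split('').map(function (ch) {
    return ('0000' + ch.charCodeAt(0).toString(16).toUpperCase()).slice(-4);
  }).join(' ');
}

// Unconditional mapping: 0130 lowercases to 0069 0307 in every locale.
var unconditional = toHex('\u0130'.toLowerCase());

// Turkish rule: 0130 lowercases to plain 0069.
var trDottedLower = toHex('\u0130'.toLocaleLowerCase('tr'));

// Turkish rule: 0049 lowercases to 0131 when not before a dot above.
var trCapitalLower = toHex('I'.toLocaleLowerCase('tr'));

// Turkish rule: 0069 uppercases to 0130.
var trSmallUpper = toHex('i'.toLocaleUpperCase('tr'));

console.log(unconditional, '|', trDottedLower, '|', trCapitalLower, '|', trSmallUpper);
```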


Also, according to JavaScript for Absolute Beginners (by Terry McNavage):

>"I".toLowerCase() //"i"
>"i".toUpperCase() //"I"
>"I".toLocaleLowerCase() //"<dotless-i>"
>"i".toLocaleUpperCase() //"<dotted-I>"

Note: toLocaleLowerCase() and toLocaleUpperCase() convert case based on your OS settings. You'd have to change those settings to Turkish for the previous sample to work. Or just take my word for it!

According to bobince's comment on the question "Convert JavaScript String to be all lower case?":

Accept-Language and navigator.language are two completely separate
settings. Accept-Language reflects the user's chosen preferences for
what languages they want to receive in web pages (and this setting is
unfortunately inaccessible to JS). navigator.language merely reflects
which localisation of the web browser was installed, and should
generally not be used for anything. Both of these values are unrelated
to the system locale, which is the bit that decides what
toLocaleLowerCase() will do; that's an OS-level setting out of scope
of the browser's prefs.

So setting lang="tr-TR" on the html element does not reflect a real test case, because reproducing the SpecialCasing example requires an OS-level setting.
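As an alternative to changing OS settings, engines that implement ECMA-402 accept an explicit locale tag, which makes the Turkish case reproducible on any machine. This is a sketch under that assumption; ECMA-402 is not covered by the sources quoted here, and how an engine derives its default locale from OS or browser settings is engine-specific.

```javascript
// The locale the engine resolves by default (ECMA-402 API; an assumption,
// not something the ES5.1 text defines).
var defaultLocale = new Intl.DateTimeFormat().resolvedOptions().locale;

// With no argument, toLocaleLowerCase() follows the environment's locale,
// so this result may differ from machine to machine.
var envResult = 'I'.toLocaleLowerCase();

// With an explicit tag, the result no longer depends on the environment.
var trResult = 'I'.toLocaleLowerCase('tr-TR');

console.log('default locale:', defaultLocale);
console.log('environment result is plain i:', envResult === 'i');
console.log('explicit tr-TR result is dotless i:', trResult === '\u0131');
```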

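I think that only the lowercase dotted i and the uppercase dotless I would be locale-specific when using toLowerCase() or toUpperCase().

According to those credible/official sources, I think you are right: 'i' !== 'I'.toLowerCase() would always evaluate to false.

But, as I said, I could not test it here.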