What's the difference between Unicode and UTF-8?

Is it true that Unicode = UTF-16?

Many people say that Unicode is a standard, not an encoding, yet most editors actually support saving as "Unicode" encoding.


As Rasmus puts it in his article "The difference between UTF-8 and Unicode" (link fixed):

If asked the question, "What is the difference between UTF-8 and
Unicode?", would you confidently reply with a short and precise
answer? In these days of internationalization all developers should be
able to do that. I suspect many of us do not understand these concepts
as well as we should. If you feel you belong to this group, you should
read this ultra short introduction to character sets and encodings.

Actually, comparing UTF-8 and Unicode is like comparing apples and
oranges:

UTF-8 is an encoding - Unicode is a character
set

A character set is a list of characters with unique numbers (these
numbers are sometimes referred to as "code points"). For example, in
the Unicode character set, the number for A is 41 in hexadecimal
(65 in decimal).

An encoding on the other hand, is an algorithm that translates a
list of numbers to binary so it can be stored on disk. For example
UTF-8 would translate the number sequence 1, 2, 3, 4 like this:

00000001 00000010 00000011 00000100

Our data is now translated into binary and can now be saved to
disk.

All together now

Say an application reads the following from the disk:

01101000 01100101 01101100 01101100 01101111

The app knows this data represents a Unicode string encoded with
UTF-8 and must show this as text to the user. The first step is to
convert the binary data to numbers. The app uses the UTF-8 algorithm
to decode the data. In this case, the decoder returns this:

104 101 108 108 111

Since the app knows this is a Unicode string, it can assume each
number represents a character. We use the Unicode character set to
translate each number to a corresponding character. The resulting
string is "hello".

Conclusion

So when somebody asks you "What is the difference between UTF-8 and
Unicode?", you can now confidently answer short and precise:

UTF-8 (Unicode Transformation Format) and Unicode cannot be compared. UTF-8 is an encoding
used to translate numbers into binary data. Unicode is a character set
used to translate characters into numbers.
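
The round trip described in the quoted article can be sketched in a few lines of Python. This is only an illustration, not code from the article: `ord`, `str.encode` and `bytes.decode` are Python built-ins, and the variable names are chosen here for the example.

```python
# Illustrative sketch: Unicode maps characters to numbers (code points),
# UTF-8 maps those numbers to bytes and back again.
text = "hello"

# Character set step: characters -> code points.
code_points = [ord(ch) for ch in text]

# Encoding step: code points -> bytes, using UTF-8.
encoded = text.encode("utf-8")
bits = " ".join(f"{byte:08b}" for byte in encoded)

# Decoding step: bytes -> text again.
decoded = encoded.decode("utf-8")

print(code_points)  # [104, 101, 108, 108, 111]
print(bits)         # 01101000 01100101 01101100 01101100 01101111
print(decoded)      # hello
```

For plain ASCII text like this, each code point happens to fit in one byte, which is why the numbers and the bytes look identical.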


most editors support save as ‘Unicode’ encoding actually.

This is an unfortunate misnaming perpetrated by Windows.

Because Windows uses UTF-16LE encoding internally as the memory storage format for Unicode strings, it considers this to be the natural encoding of Unicode text. In the Windows world, there are ANSI strings (the system code page on the current machine, subject to total unportability) and there are Unicode strings (stored internally as UTF-16LE).

This was all devised in the early days of Unicode, before we realised that UCS-2 wasn't enough, and before UTF-8 was invented. This is why Windows's support for UTF-8 is all-round poor.

This misguided naming scheme became part of the user interface. A text editor that uses Windows's encoding support to offer a range of encodings will automatically and inappropriately describe UTF-16LE as "Unicode", and UTF-16BE, if provided, as "Unicode big endian".
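
To make the naming concrete, the sketch below builds the bytes such files would plausibly contain: a byte order mark (BOM) followed by UTF-16 data. It uses Python's standard `codecs` module; that the editor writes a BOM at the start of the file is an assumption made here for illustration.

```python
import codecs

text = "hi"

# What Windows labels "Unicode": UTF-16, little-endian (BOM assumed).
unicode_le = codecs.BOM_UTF16_LE + text.encode("utf-16-le")
# What Windows labels "Unicode big endian": UTF-16, big-endian.
unicode_be = codecs.BOM_UTF16_BE + text.encode("utf-16-be")

print(unicode_le.hex(" "))  # ff fe 68 00 69 00
print(unicode_be.hex(" "))  # fe ff 00 68 00 69

# The generic "utf-16" codec reads the BOM to pick the byte order.
from_le = unicode_le.decode("utf-16")
from_be = unicode_be.decode("utf-16")
print(from_le, from_be)     # hi hi
```

Both files hold the same Unicode text; only the byte order of each 16-bit unit differs.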

(Other editors that handle encodings themselves, like Notepad++, don't have this problem.)

If it makes you feel any better, "ANSI" strings aren't based on any ANSI standard, either.


It's not that simple.

UTF-16 is a 16-bit, variable-width encoding. Simply calling something "Unicode" is ambiguous, since "Unicode" refers to an entire set of standards for character encoding. Unicode is not an encoding!
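
"Variable-width" here means that some characters take one 16-bit unit and others take two (a surrogate pair). A minimal Python sketch, where the helper `utf16_units` and the sample characters are chosen purely for illustration:

```python
# A character inside the Basic Multilingual Plane: one 16-bit unit.
bmp_char = "\u03b1"         # U+03B1, GREEK SMALL LETTER ALPHA
# A character outside it: needs a surrogate pair (two 16-bit units).
astral_char = "\U0001F600"  # U+1F600

def utf16_units(s: str) -> int:
    """Number of 16-bit code units in the UTF-16 encoding of s."""
    return len(s.encode("utf-16-le")) // 2

print(utf16_units(bmp_char))     # 1
print(utf16_units(astral_char))  # 2

# The two units of the surrogate pair, in big-endian byte order.
pair_hex = astral_char.encode("utf-16-be").hex(" ")
print(pair_hex)                  # d8 3d de 00
```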

http://en.wikipedia.org/wiki/Unicode#Unicode_Transformation_Format_and_Universal_Character_Set

And of course, the obligatory Joel on Software link: "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)".


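There's a lot of misunderstanding on display here. Unicode isn't an encoding, but the Unicode standard is devoted primarily to encoding anyway.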

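ISO 10646 is the international character set you (probably) care about. It defines a mapping between a set of named characters (e.g., "Latin Capital Letter A" or "Greek Small Letter Alpha") and a set of code points (a number assigned to each one, for example 41 hexadecimal and 3B1 hexadecimal for those two respectively; for Unicode code points, the standard notation is U+0041 and U+03B1).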
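
The mapping between named characters and code points can be inspected with Python's standard `unicodedata` module. This is just a quick way to look values up, not part of the answer; the list name `rows` is chosen for the example.

```python
import unicodedata

# Each character has a code point (a number) and an official name.
rows = [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in ("A", "a", "\u03b1")]

for code_point, name in rows:
    print(code_point, name)
# U+0041 LATIN CAPITAL LETTER A
# U+0061 LATIN SMALL LETTER A
# U+03B1 GREEK SMALL LETTER ALPHA
```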

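At one time, Unicode defined its own character set, more or less as a competitor to ISO 10646. That was a 16-bit character set, but it was not UTF-16; it was known as UCS-2. It included a rather controversial technique for keeping the number of necessary characters to a minimum (Han unification: basically treating Chinese, Japanese and Korean characters that look much alike as the same character).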

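Since then, the Unicode Consortium has tacitly admitted that this wasn't going to work, and now concentrates primarily on ways to encode the ISO 10646 character set. The primary methods are UTF-8, UTF-16 and UCS-4 (also known as UTF-32). Those (except for UTF-8) also have LE (little-endian) and BE (big-endian) variants.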
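
As a rough illustration of how these encodings differ, the following Python sketch encodes one code point, U+03B1, with each of them. The codec names are the ones Python's standard library uses; the choice of character is arbitrary.

```python
ch = "\u03b1"  # U+03B1, the same code point in every case

encodings = ["utf-8", "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be"]

# Same character, different byte sequences depending on the encoding.
encoded = {name: ch.encode(name).hex(" ") for name in encodings}

for name, hex_bytes in encoded.items():
    print(f"{name:10} {hex_bytes}")
# utf-8      ce b1
# utf-16-le  b1 03
# utf-16-be  03 b1
# utf-32-le  b1 03 00 00
# utf-32-be  00 00 03 b1
```

UTF-8 has no LE/BE variants because it is defined as a sequence of single bytes, so there is no multi-byte unit whose order could be swapped.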

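By itself, "Unicode" could refer to almost any of the above (though we can probably rule out the ones that are listed explicitly alongside it, such as UTF-8). Unqualified use of "Unicode" probably happens most often on Windows, where it will almost certainly refer to UTF-16. Early versions of Windows NT adopted Unicode when UCS-2 was current. After UCS-2 was declared obsolete (around Win2k, if memory serves), they switched to UTF-16, which is the most similar to UCS-2 (in fact, it is identical for characters in the "Basic Multilingual Plane", which covers a lot, including all the characters for most Western European languages).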


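UTF-16 and UTF-8 are both encodings of Unicode. They are both Unicode; one is not more Unicode than the other.

Don't let an unfortunate historical artifact from Microsoft confuse you.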


The development of Unicode was aimed
at creating a new standard for mapping
the characters in a great majority of
languages that are being used today,
along with other characters that are
not that essential but might be
necessary for creating the text. UTF-8
is only one of the many ways that you
can encode the files because there are
many ways you can encode the
characters inside a file into Unicode.

Source:

http://www.differencebetween.net/technology/difference-between-unicode-and-utf-8/


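In addition to Trufa's comment, Unicode explicitly isn't UTF-16. When Unicode was first being looked into, it was speculated that a 16-bit integer might be enough to store any code point, but in practice that turned out not to be the case. However, UTF-16 is another valid encoding of Unicode, alongside the 8-bit and 32-bit variants, and I believe it is the encoding that Microsoft uses in memory at runtime on the NT-derived operating systems.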


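Let's start by remembering that data is stored as bytes. Unicode is a character set in which characters are mapped to code points (unique integers), and we need something to translate those code points into bytes. That's where UTF-8 comes in, as a so-called encoding. Simple!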
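
A small Python sketch of that translation, showing that UTF-8 uses between one and four bytes depending on the code point. The sample characters and the name `table` are picked here only for illustration.

```python
# Sample characters, written as escapes: A, e-acute, euro sign, an emoji.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

# For each character: its code point and its UTF-8 bytes in hex.
table = [(f"U+{ord(ch):04X}", ch.encode("utf-8").hex(" ")) for ch in samples]

for code_point, utf8_hex in table:
    print(code_point, "->", utf8_hex)
# U+0041 -> 41
# U+00E9 -> c3 a9
# U+20AC -> e2 82 ac
# U+1F600 -> f0 9f 98 80
```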


That's weird. Unicode is a standard, not an encoding. As it is possible to specify the endianness, I guess it's effectively UTF-16, or maybe UTF-32.

Where does this menu come from?