What is the difference between utf8mb4 and utf8 charsets in MySQL?

I already know about the ASCII, UTF-8, UTF-16 and UTF-32 encodings, but I'm curious how the utf8mb4 encoding differs from the other encoding types defined in MySQL Server.

Are there any particular benefits of, or recommendations for, using utf8mb4 instead of utf8?


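UTF-8 is a variable-length encoding. In the case of UTF-8, this means that storing one code point requires one to four bytes. However, MySQL's encoding "utf8" (alias "utf8mb3") only stores a maximum of three bytes per code point.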

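So the character set "utf8"/"utf8mb3" cannot store all Unicode code points: it only supports the range 0x0000 to 0xFFFF, which is called the "Basic Multilingual Plane" (BMP). See also Comparison of Unicode encodings.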
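To make the three-byte limit concrete, here is a minimal Python sketch that measures how many bytes UTF-8 needs for a given code point and checks that against utf8mb3's three-byte maximum. Python is used only for illustration, the helper names `utf8_length` and `fits_utf8mb3` are hypothetical, and the sketch does not model what MySQL itself does when it receives a character it cannot store.

```python
def utf8_length(ch: str) -> int:
    """Number of bytes standard UTF-8 uses for a single code point."""
    return len(ch.encode("utf-8"))


def fits_utf8mb3(ch: str) -> bool:
    """Hypothetical check: utf8mb3 allows at most 3 bytes per character.

    For valid (non-surrogate) code points this is equivalent to ord(ch) <= 0xFFFF,
    i.e. the character lies in the Basic Multilingual Plane.
    """
    return utf8_length(ch) <= 3


# Latin letter, accented letter, CJK ideograph, emoji
for ch in ["A", "\u00e9", "\u4e2d", "\U0001F600"]:
    print(f"U+{ord(ch):04X}: {utf8_length(ch)} byte(s), fits utf8mb3: {fits_utf8mb3(ch)}")

# Output:
# U+0041: 1 byte(s), fits utf8mb3: True
# U+00E9: 2 byte(s), fits utf8mb3: True
# U+4E2D: 3 byte(s), fits utf8mb3: True
# U+1F600: 4 byte(s), fits utf8mb3: False
```

Only the emoji, which lies outside the BMP, needs the fourth byte that utf8mb3 does not provide.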

This is what the MySQL documentation (the same page for an earlier version) has to say:

The character set named utf8[/utf8mb3] uses a maximum of three bytes per character and contains only BMP characters. As of MySQL 5.5.3, the utf8mb4 character set uses a maximum of four bytes per character [and] supports supplemental characters:

  • For a BMP character, utf8[/utf8mb3] and utf8mb4 have identical storage characteristics: same code values, same encoding, same length.

  • For a supplementary character, utf8[/utf8mb3] cannot store the character at all, while utf8mb4 requires four bytes to store it. Since utf8[/utf8mb3] cannot store the character at all, you do not have any supplementary characters in utf8[/utf8mb3] columns and you need not worry about converting characters or losing data when upgrading utf8[/utf8mb3] data from older versions of MySQL.

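So if you want your columns to be able to store characters outside the BMP, such as emoji (and you usually do), use "utf8mb4". See also What are the most common non-BMP Unicode characters in actual use?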
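Whether a given piece of text actually needs utf8mb4 comes down to whether it contains any code point above U+FFFF. The following is a minimal Python sketch of that check, for example to audit data before deciding on a column's character set; the function names `supplementary_chars` and `needs_utf8mb4` are hypothetical and not part of any MySQL client library.

```python
from typing import List


def supplementary_chars(text: str) -> List[str]:
    """Return the characters of text that lie outside the BMP (code point > U+FFFF)."""
    # Python 3 strings iterate by code point, so an emoji is a single element here.
    return [ch for ch in text if ord(ch) > 0xFFFF]


def needs_utf8mb4(text: str) -> bool:
    """True if text cannot be stored in a utf8/utf8mb3 column without loss."""
    return any(ord(ch) > 0xFFFF for ch in text)


sample = "Hello \U0001F600 \u4e16\u754c"  # Latin text, an emoji, and two CJK characters
print(needs_utf8mb4(sample))  # True
print([f"U+{ord(c):X}" for c in supplementary_chars(sample)])  # ['U+1F600']
```

The CJK characters in the sample are inside the BMP, so utf8mb3 could hold them; only the emoji forces the switch to utf8mb4.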


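The utf8mb4 character set is very useful, because nowadays we need to support not only the characters of written languages but also symbols, newly introduced emoji, and so on.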

Mathias Bynens's article "How to support full Unicode in MySQL databases" is a good read that can also shed some light on this.


From the MySQL 8.0 Reference Manual:

  • utf8mb4: A UTF-8 encoding of the Unicode character set using one to
    four bytes per character.

  • utf8mb3: A UTF-8 encoding of the Unicode character set using one to
    three bytes per character.

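In MySQL, utf8 is currently an alias for utf8mb3, which is deprecated and will be removed in a future MySQL release. At that point, utf8 is expected to become a reference to utf8mb4.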

So, regardless of what the utf8 alias currently points to, you can avoid the ambiguity by explicitly specifying utf8mb4 yourself.