Encoding troubles with python, mysql and utf8mb4
I get the following warnings when trying to save a simple dataframe to MySQL:
C:...\anaconda3\lib\site-packages\pymysql\cursors.py:170: Warning: (1366,"Incorrect string value: '\x92\xE9t\xE9)' for column 'VARIABLE_VALUE' at row 518")
result = self._query(query)
and
C:...anaconda3\lib\site-packages\pymysql\cursors.py:170: Warning: (3719,"'utf8' is currently an alias for the character set UTF8MB3, but will be an alias for UTF8MB4 in a future release. Please consider using UTF8MB4 in order to be unambiguous.")
result = self._query(query)
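Environment info: I am using MySQL 8 and Python 3.6 (pymysql 0.9.2, sqlalchemy 1.2.1).
I have read posts similar to the one linked below, but none of them seems to offer a way to avoid these warnings.
- MySQL "incorrect string value" error when saving unicode string in Django -> suggests using UTF8
N.B.: the collation of the table in MySQL does not seem to be set to the one I specified.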
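To check which collation the tables actually ended up with, something like the sketch below could be run. It assumes a DB-API cursor that returns plain tuples (the default for pymysql); the helper name `table_collations` and the constant `COLLATION_QUERY` are made up for illustration and are not part of my `Connection` class.

```python
from typing import Any, List, Tuple

# information_schema.TABLES exposes the effective collation of each table.
COLLATION_QUERY = (
    "SELECT TABLE_NAME, TABLE_COLLATION "
    "FROM information_schema.TABLES "
    "WHERE TABLE_SCHEMA = %s"
)


def table_collations(cursor: Any, db_name: str) -> List[Tuple[str, str]]:
    """Return (table_name, collation) pairs for every table in db_name.

    `cursor` is assumed to be a DB-API cursor using the %s paramstyle
    and returning rows as tuples.
    """
    cursor.execute(COLLATION_QUERY, (db_name,))
    return [(row[0], row[1]) for row in cursor.fetchall()]
```

With pymysql this would be called as `table_collations(pymysql.connect(...).cursor(), "raw_data")`; comparing the result with the `collate` argument passed to `create_db` shows whether the setting was applied to the table or only to the database default.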
Executable code:

    import DataEngine.db.Connection as connection
    import random
    import pandas as pd

    if __name__ == "__main__":
        conn = connection.Connection(host="host_name", port="3306", user="username", password="password")
        conn.create_db("raw_data")
        conn.establish("raw_data")
        l1 = []
        for i in range(10):
            l_nested = []
            for j in range(10):
                l_nested.append(random.randint(0, 100))
            l1.append(l_nested)
        df = pd.DataFrame(l1)
        conn.save(df, "random_df")
        df2 = conn.retrieve("random_df")
        print(df2)
So the dataframe persisted in the database is:

       index   0   1   2   3   4   5   6   7   8   9
    0      0  11  57  75  45  81  70  91  66  93  96
    1      1  51  43   3  64   2   6  93   5  49  40
    2      2  35  80  76  11  23  87  19  32  13  98
    3      3  82  10  69  40  34  66  42  24  82  59
    4      4  49  74  39  61  14  63  94  92  82  85
    5      5  50  47  90  75  48  77  17  43   5  29
    6      6  70  40  78  60  29  48  52  48  39  36
    7      7  21  87  41  53  95   3  31  67  50  30
    8      8  72  79  73  82  20  15  51  14  38  42
    9      9  68  71  11  17  48  68  17  42  83  95
My Connection class:

    import sqlalchemy
    import pymysql
    import pandas as pd


    class Connection:
        def __init__(self: object, host: str, port: str, user: str, password: str):
            self.host = host
            self.port = port
            self.user = user
            self.password = password
            self.conn = None

        def create_db(self: object, db_name: str, charset: str = "utf8mb4",
                      collate: str = "utf8mb4_unicode_ci", drop_if_exists: bool = True):
            # Plain pymysql connection, used only to (re)create the database.
            c = pymysql.connect(host=self.host, user=self.user, password=self.password)
            if drop_if_exists:
                c.cursor().execute("DROP DATABASE IF EXISTS " + db_name)
            c.cursor().execute("CREATE DATABASE " + db_name + " CHARACTER SET=" + charset + " COLLATE=" + collate)
            c.close()
            print("Database %s created with a %s charset" % (db_name, charset))

        def establish(self: object, db_name: str, charset: str = "utf8mb4"):
            # SQLAlchemy engine; the charset is passed in the URL query string.
            self.conn = sqlalchemy.create_engine(
                "mysql+pymysql://" + self.user + ":" + self.password + "@" + self.host + ":" + self.port + "/" + db_name +
                "?charset=" + charset)
            print("Connection with database : %s has been established as %s at %s." % (db_name, self.user, self.host))
            print("Charset : %s" % charset)

        def retrieve(self, table):
            df = pd.read_sql_table(table, self.conn)
            return df

        def save(self: object, df: "Pandas.DataFrame", table: str, if_exists: str = "replace", chunksize: int = 10000):
            df.to_sql(name=table, con=self.conn, if_exists=if_exists, chunksize=chunksize)
Some elements that might help:

Well, hex 92 and E9 are not valid utf8mb4 (UTF-8). Assuming you are using ...
Find out where that text is coming from, then decide whether it is valid ...
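To see why MySQL rejects those bytes, here is a minimal Python sketch that checks the exact byte string from the warning. The helper `is_valid_utf8` is made up for illustration, and the cp1252 decoding is only a guess at one plausible source encoding; nothing in the question confirms it.

```python
# The bytes quoted in warning 1366: '\x92\xE9t\xE9'
raw = b"\x92\xe9t\xe9"


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# 0x92 is a continuation byte with no lead byte before it, and 0xE9 starts
# a 3-byte sequence but is followed by 't', so UTF-8 decoding fails.
valid = is_valid_utf8(raw)

# ASSUMPTION: if the text was really Windows-1252, these bytes are readable.
as_cp1252 = raw.decode("cp1252")

# Re-encoding that text as UTF-8 gives bytes a utf8mb4 column can store.
fixed = as_cp1252.encode("utf-8")
```

If the cp1252 reading looks right for the data in question (here it yields a right single quotation mark followed by "été"), the fix is to decode the text with the correct source encoding before it reaches MySQL, rather than to change the column charset.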