Setting column types while reading CSV with pandas

I am trying to read a CSV file into a pandas DataFrame using the following:

import pandas as pd

dp = pd.read_csv('products.csv', header=0,
                 dtype={'name': str, 'review': str,
                        'rating': int, 'word_count': dict},
                 engine='c')
print dp.shape
for col in dp.columns:
    print 'column', col, ':', type(col[0])
print type(dp['rating'][0])
dp.head(3)

This is the output:

(183531, 4)
column name : <type 'str'>
column review : <type 'str'>
column rating : <type 'str'>
column word_count : <type 'str'>
<type 'numpy.int64'>

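I can understand that pandas might find it hard to convert the string representation of a dictionary into a dictionary. But how can the content of the "rating" column be both str and numpy.int64?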

By the way, tweaks such as not specifying the engine or the header do not change anything.

Thanks and regards

Use:

dp.info()

to see the data types of the columns. dp.columns refers to the column header names, which are strings.
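
As a minimal sketch of that check (written in Python 3 syntax, unlike the Python 2 code in the question), the snippet below reads a small in-memory CSV and inspects the column dtypes. The contents of csv_text are a hypothetical stand-in for products.csv, and 'int64' is spelled out explicitly as an assumption so the integer width does not depend on the platform.

```python
import io
import pandas as pd

# Hypothetical stand-in for products.csv with the same four column names.
csv_text = """name,review,rating,word_count
Widget,Works well,5,"{'works': 1, 'well': 1}"
Gadget,Broke fast,1,"{'broke': 1, 'fast': 1}"
"""

dp = pd.read_csv(
    io.StringIO(csv_text),
    dtype={'name': str, 'review': str, 'rating': 'int64'},
)

# dtypes reports the type of the data in each column,
# not the type of the column labels.
print(dp.dtypes)

# info() prints the same dtypes plus non-null counts and memory usage.
dp.info()
```

Here dp.dtypes is a Series indexed by column name, so dp.dtypes['rating'] gives the dtype of a single column.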

Related discussion

Just do:

for col in dp.columns:
    print 'column', col, ':', col[0]

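You will see that this prints the first letter of each column name, which is a string. Note that here you are iterating over the column names, not over each Series.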

What you want is to check the type of each column's data inside the loop, so do this instead:

for col in dp.columns:
    print 'column', col, ':', type(dp[col][0])

...just as you did for the rating column!
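
The difference between the two loops can be seen side by side in the sketch below, written in Python 3 syntax. It also touches on the dictionary column from the question: instead of dtype, it assumes the word_count cells hold Python-literal dict strings and parses them with ast.literal_eval through the converters parameter of read_csv. The sample data in csv_text is hypothetical, not taken from the original file.

```python
import ast
import io
import pandas as pd

# Hypothetical stand-in for products.csv with the same four column names.
csv_text = """name,review,rating,word_count
Widget,Works well,5,"{'works': 1, 'well': 1}"
Gadget,Broke fast,1,"{'broke': 1, 'fast': 1}"
"""

dp = pd.read_csv(
    io.StringIO(csv_text),
    dtype={'name': str, 'review': str, 'rating': 'int64'},
    # converters applies a function to each raw cell string of the column;
    # ast.literal_eval turns the dict's string form back into a dict.
    converters={'word_count': ast.literal_eval},
)

for col in dp.columns:
    # col is the column label (a str), so col[0] is its first character.
    label_type = type(col[0]).__name__
    # dp[col] is the Series of data; .iloc[0] is its first value.
    value_type = type(dp[col].iloc[0]).__name__
    print('column', col, ': label char is', label_type, ', first value is', value_type)
```

Using .iloc[0] rather than [0] selects the first row by position, which still works if the index does not start at 0.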