Automate comparing the values of two CSV files: if a value matches, read the second CSV into a DataFrame

Tags: csv, dataframe, pandas, python

I have imported an Excel file into a DataFrame. It looks like this:

[image not preserved]

I then wrote each tx_id group to its own CSV file:

for i, g in dframe.groupby('tx_id'):
    g.to_csv('{}.csv'.format(i.split('/')[0]), index=False)
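As a minimal sketch of what that loop does, the example below runs it on a small made-up DataFrame and writes into a temporary directory. The sample values, the extra column, and the names `out_dir` and `written` are assumptions for illustration; the `split('/')` call in the question only suggests that tx_id values may contain a slash.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data; only the column name tx_id comes from the question.
dframe = pd.DataFrame({
    "tx_id": ["abc/1", "abc/1", "def/2"],
    "rule_id": ["r1", "r2", "r1"],
})

out_dir = tempfile.mkdtemp()  # stand-in for the working directory
written = []

for tx_id, group in dframe.groupby("tx_id"):
    # Keep only the part of tx_id before the first '/', as in the question.
    name = "{}.csv".format(tx_id.split("/")[0])
    group.to_csv(os.path.join(out_dir, name), index=False)
    written.append(name)

print(sorted(written))
```

Note that if two different tx_id values share the same text before the slash, the later group overwrites the earlier file.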

Then I created a separate DataFrame, dframe1, containing only tx_id, and removed the duplicates with the following code:

dframe1 = dframe1.drop_duplicates()

Now my DataFrame looks like this:

[image not preserved]

I then read one of the CSV files:

df = pd.read_csv('ae229a81-bb33-4cf1-ba2f-360fffb0d94b.csv')

This gives me a result like this:

[image not preserved]

I then count the request_id values per rule_id:

df1 = df.groupby('rule_id')['request_id'].value_counts().unstack().fillna(0)
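To make the groupby / value_counts / unstack chain concrete, here is a small sketch on made-up data. The column names come from the question; the values and the names `sample` and `counts` are assumptions.

```python
import pandas as pd

# Hypothetical sample data; values are invented for illustration.
sample = pd.DataFrame({
    "rule_id": ["r1", "r1", "r1", "r2"],
    "request_id": ["a", "a", "b", "a"],
})

counts = (
    sample.groupby("rule_id")["request_id"]
    .value_counts()   # count each request_id within each rule_id
    .unstack()        # pivot request_id values into columns
    .fillna(0)        # combinations that never occur become 0
)
print(counts)
```

Each row is a rule_id, each column a request_id, and each cell is how often that pair occurs. `pd.crosstab(sample["rule_id"], sample["request_id"])` should give the same counts, if a single call is preferred.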

The final result looks like this:

[image not preserved]

import pandas as pd

# Read the CSV file for each unique tx_id into a list of DataFrames.
dfs = []
for tx in dframe1['tx_id']:
    dfs.append(pd.read_csv('%s.csv' % tx))

This only works when it is run from the same directory as the CSV files. Otherwise:

import os
import pandas as pd

dfs = []

# Build the full path to each CSV so the script can run from any directory.
for tx in dframe1['tx_id']:
    dfs.append(pd.read_csv(os.path.join('/path/to/csv/', '%s.csv' % tx)))
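If some tx_id values may not have a matching CSV file, one option is to check that the file exists before reading it. The sketch below is an illustration under assumptions: the temporary directory stands in for '/path/to/csv/', and the names `tx_ids` and `missing` and the sample data are hypothetical.

```python
import os
import tempfile

import pandas as pd

# Hypothetical setup: a temporary directory with a single CSV file in it.
csv_dir = tempfile.mkdtemp()
pd.DataFrame({"rule_id": ["r1"], "request_id": ["a"]}).to_csv(
    os.path.join(csv_dir, "tx1.csv"), index=False
)

tx_ids = ["tx1", "tx2"]  # 'tx2' has no file on disk

dfs = []
missing = []
for tx in tx_ids:
    path = os.path.join(csv_dir, "%s.csv" % tx)
    if os.path.exists(path):
        # A file matches this tx_id, so read it.
        dfs.append(pd.read_csv(path))
    else:
        # No matching file; remember the id instead of raising an error.
        missing.append(tx)

print(len(dfs), missing)
```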

Edit:

If you want to apply some function instead of appending the DataFrame directly:

for tx in dframe1['tx_id']:
    df = pd.read_csv(os.path.join('/path/to/csv/', '%s.csv' % tx))
    dfs.append(df.groupby('rule_id')['request_id'].value_counts().unstack().fillna(0))

Now dfs holds all of the value_counts() results. You can refer to them by index.

If you want to look them up by file name, use a dict:

# Map each tx_id to its table of counts.
df_dict = {}
for tx in dframe1['tx_id']:
    df = pd.read_csv(os.path.join('/path/to/csv/', '%s.csv' % tx))
    df_dict[tx] = df.groupby('rule_id')['request_id'].value_counts().unstack().fillna(0)
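Once the dict is built, a single result can be looked up by its tx_id, and the results can optionally be stacked into one table with pd.concat. The sketch below uses in-memory frames instead of CSV files so that it runs on its own; the helper `count_requests`, the name `frames`, and the sample values are assumptions for illustration.

```python
import pandas as pd

def count_requests(df):
    """Count request_id values per rule_id, as in the loop above."""
    return df.groupby("rule_id")["request_id"].value_counts().unstack().fillna(0)

# Hypothetical stand-ins for the per-tx_id CSV contents.
frames = {
    "tx1": pd.DataFrame({"rule_id": ["r1", "r1"], "request_id": ["a", "b"]}),
    "tx2": pd.DataFrame({"rule_id": ["r2"], "request_id": ["a"]}),
}

df_dict = {tx: count_requests(df) for tx, df in frames.items()}

# Look up one result by its tx_id.
one = df_dict["tx1"]

# Stack all results; the dict keys become the outer index level.
# Columns missing from a given file come out as NaN, so fill them with 0.
combined = pd.concat(df_dict).fillna(0)
print(combined)
```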