How can I convert JSON to CSV?
I have a JSON file that I want to convert to a CSV file. How can I do this with Python?
I tried:
```python
import json
import csv

f = open('data.json')
data = json.load(f)
f.close()

f = open('data.csv')
csv_file = csv.writer(f)
for item in data:
    f.writerow(item)

f.close()
```
However, it did not work. I am using Django, and the error I received is:
```
'file' object has no attribute 'writerow'
```
So I then tried the following:
```python
import json
import csv

f = open('data.json')
data = json.load(f)
f.close()

f = open('data.csv')
csv_file = csv.writer(f)
for item in data:
    csv_file.writerow(item)

f.close()
```
Then I get the error:
```
sequence expected
```
Sample JSON file:
```json
[
    {
        "pk": 22,
        "model": "auth.permission",
        "fields": {
            "codename": "add_logentry",
            "name": "Can add log entry",
            "content_type": 8
        }
    },
    {
        "pk": 23,
        "model": "auth.permission",
        "fields": {
            "codename": "change_logentry",
            "name": "Can change log entry",
            "content_type": 8
        }
    },
    {
        "pk": 24,
        "model": "auth.permission",
        "fields": {
            "codename": "delete_logentry",
            "name": "Can delete log entry",
            "content_type": 8
        }
    },
    {
        "pk": 4,
        "model": "auth.permission",
        "fields": {
            "codename": "add_group",
            "name": "Can add group",
            "content_type": 2
        }
    },
    {
        "pk": 10,
        "model": "auth.permission",
        "fields": {
            "codename": "add_message",
            "name": "Can add message",
            "content_type": 4
        }
    }
]
```
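I am not sure whether this question has already been solved, but let me paste what I did for reference.
First, your JSON has nested objects, so it normally cannot be converted to CSV directly.
You need to change it to something like this: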
```
{
    "pk": 22,
    "model": "auth.permission",
    "codename": "add_logentry",
    "content_type": 8,
    "name": "Can add log entry"
},
......]
```
Here is my code to generate the CSV from that:
```python
import csv
import json

x = """[
    {
        "pk": 22,
        "model": "auth.permission",
        "fields": {
            "codename": "add_logentry",
            "name": "Can add log entry",
            "content_type": 8
        }
    },
    {
        "pk": 23,
        "model": "auth.permission",
        "fields": {
            "codename": "change_logentry",
            "name": "Can change log entry",
            "content_type": 8
        }
    },
    {
        "pk": 24,
        "model": "auth.permission",
        "fields": {
            "codename": "delete_logentry",
            "name": "Can delete log entry",
            "content_type": 8
        }
    }
]"""

rows = json.loads(x)

# Python 3: open in text mode with newline="" (the original used "wb+", which only works on Python 2)
with open("test.csv", "w", newline="") as out:
    f = csv.writer(out)

    # Write CSV header. If you don't need that, remove this line.
    f.writerow(["pk", "model", "codename", "name", "content_type"])

    for row in rows:
        f.writerow([row["pk"],
                    row["model"],
                    row["fields"]["codename"],
                    row["fields"]["name"],
                    row["fields"]["content_type"]])
```
You will get this output:
```
pk,model,codename,name,content_type
22,auth.permission,add_logentry,Can add log entry,8
23,auth.permission,change_logentry,Can change log entry,8
24,auth.permission,delete_logentry,Can delete log entry,8
```
I am assuming that your JSON file will decode into a list of dictionaries. First we need a function which will flatten the JSON objects:
```python
def flattenjson( b, delim ):
    val = {}
    for i in b.keys():
        if isinstance( b[i], dict ):
            get = flattenjson( b[i], delim )
            for j in get.keys():
                val[ i + delim + j ] = get[j]
        else:
            val[i] = b[i]

    return val
```
The result of running this snippet on your JSON object:
```python
flattenjson( {
    "pk": 22,
    "model": "auth.permission",
    "fields": {
        "codename": "add_message",
        "name": "Can add message",
        "content_type": 8
    }
}, "__" )
```
is
```python
{
    "pk": 22,
    "model": "auth.permission",
    "fields__codename": "add_message",
    "fields__name": "Can add message",
    "fields__content_type": 8
}
```
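After applying this function to each dict in the input array of JSON objects (on Python 3, wrap the `map` in `list` so the result can be iterated more than once):
```python
input = list( map( lambda x: flattenjson( x, "__" ), input ) )
```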
and finding the relevant column names:
```python
columns = [ x for row in input for x in row.keys() ]
columns = list( set( columns ) )
```
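it's not hard to run this through the csv module:
```python
# Python 3: text mode with newline='' so the csv module controls line endings
# (the original used 'wb', which only works on Python 2)
with open( fname, 'w', newline='' ) as out_file:
    csv_w = csv.writer( out_file )
    csv_w.writerow( columns )

    for i_r in input:
        csv_w.writerow( [ i_r.get( x, "" ) for x in columns ] )
```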
I hope this helps!
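For Python 3, the steps in this answer can be combined into one short sketch. This is only an illustration under stated assumptions, not the answer's own code: the helper names `flatten` and `rows_to_csv` are made up here, the column names are sorted just to get a stable header, and the output goes to an in-memory buffer instead of a file.

```python
import csv
import io


def flatten(obj, delim="__"):
    """Flatten nested dicts into one level, joining keys with delim."""
    flat = {}
    for key, value in obj.items():
        if isinstance(value, dict):
            for sub_key, sub_value in flatten(value, delim).items():
                flat[key + delim + sub_key] = sub_value
        else:
            flat[key] = value
    return flat


def rows_to_csv(records, delim="__"):
    """Return CSV text for a list of (possibly nested) dicts."""
    rows = [flatten(r, delim) for r in records]  # a real list, so it can be iterated twice
    columns = sorted({name for row in rows for name in row})  # sorted for a stable header
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Sample data: the first two records from the question
records = [
    {"pk": 22, "model": "auth.permission",
     "fields": {"codename": "add_logentry", "name": "Can add log entry", "content_type": 8}},
    {"pk": 23, "model": "auth.permission",
     "fields": {"codename": "change_logentry", "name": "Can change log entry", "content_type": 8}},
]
csv_text = rows_to_csv(records)
print(csv_text)
```

Using `csv.DictWriter` with `restval=""` plays the same role as `i_r.get( x, "" )` above: a row that lacks a column gets an empty cell.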
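Use
```python
pandas.read_json()
```
to convert a JSON string to a pandas object (either a Series or a DataFrame). Then, assuming the result is stored in a variable (named `df` in the call below), use
```python
df.to_csv()
```
which can either return a string or write directly to a CSV file.
Given the verbosity of the previous answers, we should all thank pandas for the shortcut.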
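JSON can represent a wide variety of data structures: a JS "object" is roughly like a Python dict (with string keys), a JS "array" is roughly like a Python list, and you can nest them as long as the final "leaf" elements are numbers or strings.
CSV can essentially represent only a 2-D table, optionally with a first row of "headers", i.e. "column names", which can make the table interpretable as a list of dicts instead of the normal interpretation, a list of lists (again, "leaf" elements can be numbers or strings).
So, in the general case, you can't translate an arbitrary JSON structure to CSV. In a few special cases you can (an array of arrays with no further nesting; an array of objects which all have exactly the same keys). Which special case, if any, applies to your problem? The details of the solution depend on which special case you have. Given the astonishing fact that you don't even mention which one applies, I suspect you may not have considered the constraint, that neither usable case in fact applies, and that your problem is impossible to solve. But please do clarify!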
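To make the two special cases concrete, here is a minimal sketch of a check that decides which one, if either, a decoded JSON value falls into. The function name `csv_shape` and the labels `"rows"` and `"records"` are invented for this example, and treating `null` and booleans as leaf values is an extra assumption beyond the "numbers or strings" mentioned above.

```python
import json


def csv_shape(data):
    """Classify decoded JSON as 'rows', 'records', or None (not directly tabular)."""
    def is_leaf(v):
        # Assumption: None and bool are accepted as leaves in addition to numbers and strings
        return v is None or isinstance(v, (str, int, float, bool))

    if not isinstance(data, list) or not data:
        return None
    # Case 1: array of arrays with no further nesting
    if all(isinstance(r, list) and all(is_leaf(v) for v in r) for r in data):
        return "rows"
    # Case 2: array of objects that all have exactly the same keys and only leaf values
    if all(isinstance(r, dict) for r in data):
        keys = set(data[0])
        if all(set(r) == keys and all(is_leaf(v) for v in r.values()) for r in data):
            return "records"
    return None


shape_a = csv_shape(json.loads('[[1, "a"], [2, "b"]]'))
shape_b = csv_shape(json.loads('[{"pk": 1, "m": "x"}, {"pk": 2, "m": "y"}]'))
shape_c = csv_shape(json.loads('[{"pk": 1, "fields": {"name": "n"}}]'))
print(shape_a, shape_b, shape_c)
```

The third call mirrors the question's data: the nested `fields` object means neither special case applies until the records are flattened.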
A generic solution which translates any JSON list of flat objects to CSV.
Pass the input.json file as the first argument on the command line.
```python
import csv, json, sys

input = open(sys.argv[1])
data = json.load(input)
input.close()

output = csv.writer(sys.stdout)

output.writerow(data[0].keys())  # header row

for row in data:
    output.writerow(row.values())
```
Assuming your JSON data is in a file (the code below reads `data.json`):
```python
import json
import csv

with open("data.json") as file:
    data = json.load(file)

with open("data.csv", "w") as file:
    csv_file = csv.writer(file)
    for item in data:
        fields = list(item['fields'].values())
        csv_file.writerow([item['pk'], item['model']] + fields)
```
```python
import csv
import json

def read_json(filename):
    return json.loads(open(filename).read())

def write_csv(data, filename):
    with open(filename, 'w+') as outf:
        writer = csv.DictWriter(outf, data[0].keys())
        writer.writeheader()
        for row in data:
            writer.writerow(row)

# implement
write_csv(read_json('test.json'), 'output.csv')
```
Note that this assumes that all of your JSON objects have the same fields.
Here is the reference which may help you.
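If the objects do not all share the same fields, one possible workaround is to collect the union of keys first and let `csv.DictWriter` fill the gaps. The following is a hedged sketch, not part of the answer above; the function name `write_csv_any_fields` and the sample rows are invented for illustration.

```python
import csv
import io


def write_csv_any_fields(data, out):
    """Write a list of flat dicts to out, tolerating rows with different keys."""
    fieldnames = []
    for row in data:
        for key in row:
            if key not in fieldnames:  # keep first-seen column order
                fieldnames.append(key)
    # restval fills in the cell for any key a row does not have
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(data)


buf = io.StringIO()
write_csv_any_fields([{"pk": 1, "model": "a"}, {"pk": 2, "extra": "x"}], buf)
print(buf.getvalue())
```

Without the union step, `DictWriter` built from `data[0].keys()` raises `ValueError` as soon as a later row contains a key that is not in the header.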
I was having trouble with Dan's proposed solution, but this worked for me:
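```python
import json
import csv

f = open('test.json')
data = json.load(f)
f.close()

# Python 3: text mode with newline='' (the original used 'wb+', which only works on Python 2)
with open('test.csv', 'w', newline='') as out:
    f = csv.writer(out)
    for item in data:
        # list(...) is needed on Python 3, where .values() is a view, not a list
        f.writerow([item['pk'], item['model']] + list(item['fields'].values()))
```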
Where "test.json" contained the following:
```json
[
    {"pk": 22, "model": "auth.permission", "fields": {"codename": "add_logentry", "name": "Can add log entry", "content_type": 8}},
    {"pk": 23, "model": "auth.permission", "fields": {"codename": "change_logentry", "name": "Can change log entry", "content_type": 8}},
    {"pk": 24, "model": "auth.permission", "fields": {"codename": "delete_logentry", "name": "Can delete log entry", "content_type": 8}}
]
```
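As mentioned in the previous answers, the difficulty in converting JSON to CSV is that a JSON file can contain nested dictionaries and is therefore a multidimensional data structure, whereas CSV is a 2-D data structure. However, a good way to turn a multidimensional structure into CSV is to have multiple CSVs that tie together with primary keys.
In your example, the first CSV output has the columns "pk", "model", "fields". The values for "pk" and "model" are easy to get, but because the "fields" column contains a dictionary, it should be its own CSV. Because "codename" appears to be the primary key, you can use it as the value for "fields" to complete the first CSV. The second CSV contains the dictionary from the "fields" column, with codename as the primary key that can be used to tie the two CSVs together.
Here is a solution for your JSON file which converts the nested dictionaries to two CSVs.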
```python
import csv
import json

# Ported to Python 3: the original opened the file with 'wb' and called .encode('utf8')
# on every value, which only works on Python 2.
def readAndWrite(inputFileName, primaryKey=""):
    input = open(inputFileName + ".json")
    data = json.load(input)
    input.close()

    header = set()

    if primaryKey != "":
        outputFileName = inputFileName + "-" + primaryKey
        if inputFileName == "data":
            for i in data:
                for j in i["fields"].keys():
                    if j not in header:
                        header.add(j)
    else:
        outputFileName = inputFileName
        for i in data:
            for j in i.keys():
                if j not in header:
                    header.add(j)

    with open(outputFileName + ".csv", 'w', newline='') as output_file:
        fieldnames = list(header)
        writer = csv.DictWriter(output_file, fieldnames, delimiter=',', quotechar='"')
        writer.writeheader()
        for x in data:
            row_value = {}
            if primaryKey == "":
                for y in x.keys():
                    yValue = x.get(y)
                    if type(yValue) == int or type(yValue) == bool or type(yValue) == float or type(yValue) == list:
                        row_value[y] = str(yValue)
                    elif type(yValue) != dict:
                        row_value[y] = yValue
                    else:
                        if inputFileName == "data":
                            # store only the primary key here; the dict goes to the second CSV
                            row_value[y] = yValue["codename"]
                            readAndWrite(inputFileName, primaryKey="codename")
                writer.writerow(row_value)
            elif primaryKey == "codename":
                for y in x["fields"].keys():
                    yValue = x["fields"].get(y)
                    if type(yValue) == int or type(yValue) == bool or type(yValue) == float or type(yValue) == list:
                        row_value[y] = str(yValue)
                    elif type(yValue) != dict:
                        row_value[y] = yValue
                writer.writerow(row_value)

readAndWrite("data")
```
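The same two-table idea can be sketched more compactly. This is only an illustration under assumptions: `split_tables` and `to_csv_text` are hypothetical helper names, the nested column is assumed to be `fields`, the linking key is assumed to be `codename`, and the CSV text is built in memory instead of being written to files.

```python
import csv
import io


def split_tables(data, nested="fields", key="codename"):
    """Split records into a main table and a child table linked by `key`."""
    main_rows, child_rows = [], []
    for record in data:
        child = record[nested]
        main = {k: v for k, v in record.items() if k != nested}
        main[nested] = child[key]  # keep only the linking key in the main table
        main_rows.append(main)
        child_rows.append(child)
    return main_rows, child_rows


def to_csv_text(rows):
    """Render a list of flat dicts (all with the same keys) as CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


data = [{"pk": 22, "model": "auth.permission",
         "fields": {"codename": "add_logentry", "name": "Can add log entry", "content_type": 8}}]
main_rows, child_rows = split_tables(data)
main_csv = to_csv_text(main_rows)
child_csv = to_csv_text(child_rows)
print(main_csv)
print(child_csv)
```

Each row of the main table can then be joined back to its child row by matching the `fields` column against `codename`.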
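I know it has been a long time since this question was asked, but I thought I might add to everyone else's answers and share a blog post that I think explains the solution in a very concise way.
Here is the link
Open a file for writing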
```python
employ_data = open('/tmp/EmployData.csv', 'w')
```
Create the csv writer object
```python
csvwriter = csv.writer(employ_data)
count = 0
for emp in emp_data:
    if count == 0:
        header = emp.keys()
        csvwriter.writerow(header)
        count += 1
    csvwriter.writerow(emp.values())
```
Make sure to close the file in order to save the contents
```python
employ_data.close()
```
This works relatively well.
It flattens the JSON and writes it to a CSV file.
Nested elements are managed :)
That's for Python 3
```python
import json

o = json.loads('your json string')  # placeholder: put your JSON text here. Be careful, o must be a list, each of its objects will make a line of the csv.

def flatten(o, k='/'):
    global l, c_line
    if isinstance(o, dict):
        for key, value in o.items():
            flatten(value, k + '/' + key)
    elif isinstance(o, list):
        for ov in o:
            flatten(ov, '')
    elif isinstance(o, str):
        # strip line breaks and the ';' delimiter from string values
        o = o.replace('\r', ' ').replace('\n', ' ').replace(';', ',')
        if not k in l:
            l[k] = {}
        l[k][c_line] = o

def render_csv(l):
    # header row first, then one line per input object
    # (the original looped over range(100) and lost the first data row)
    for k in l:
        print('%s;' % k, end='')
    print()
    for i in range(c_line):
        for k in l:
            print('%s;' % l[k].get(i, ''), end='')
        print()

def json_to_csv(object_list):
    global l, c_line
    l = {}
    c_line = 0
    for ov in object_list:  # Assumes json is a list of objects
        flatten(ov)
        c_line += 1
    render_csv(l)

json_to_csv(o)
```
Enjoy.
My simple way to solve this:
Create a new Python file, such as: json_to_csv.py
Add this code:
```python
import csv, json, sys

# check that both the input file and the output file were passed
if len(sys.argv) >= 3:
    fileInput = sys.argv[1]
    fileOutput = sys.argv[2]

    # if you are not using utf-8 files, change or remove the encoding argument
    # (the original called sys.setdefaultencoding("UTF-8"), which does not exist on Python 3)
    inputFile = open(fileInput, encoding="utf-8")
    data = json.load(inputFile)
    inputFile.close()

    outputFile = open(fileOutput, 'w', encoding="utf-8", newline='')
    output = csv.writer(outputFile)

    output.writerow(data[0].keys())  # header row

    for row in data:
        output.writerow(row.values())

    outputFile.close()
```
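After adding this code, save the file and run it in the terminal:
python json_to_csv.py input.txt output.csv
I hope this helps you.
See you!
This is not a very smart way to do it, but I had the same problem and this worked for me: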
```python
import csv
import json

f = open('data.json')
data = json.load(f)
f.close()

new_data = []

for i in data:
    flat = {}
    names = i.keys()
    for n in names:
        try:
            if len(i[n].keys()) > 0:
                for ii in i[n].keys():
                    flat[n+"_"+ii] = i[n][ii]
        except AttributeError:
            # not a dict: keep the value as it is
            flat[n] = i[n]
    new_data.append(flat)

# filename: path of the CSV file to write (set it yourself);
# the mode must be "w", not "r", because the file is written to
f = open(filename, "w")
writer = csv.DictWriter(f, new_data[0].keys())
writer.writeheader()
for row in new_data:
    writer.writerow(row)
f.close()
```
```python
import json, csv

data = None
write_header = True
item_keys = []

try:
    with open('kk.json') as json_file:
        json_data = json_file.read()

    data = json.loads(json_data)
except Exception as e:
    print(e)

with open('bar.csv', 'at') as csv_file:
    writer = csv.writer(csv_file)  # , quoting=csv.QUOTE_MINIMAL)

    for item in data:
        item_values = []
        for key in item:
            if write_header:
                item_keys.append(key)

            # Python 3 strings are written as they are; the original's
            # .encode('utf-8') would put b'...' into the file
            item_values.append(item.get(key, ''))

        if write_header:
            writer.writerow(item_keys)
            write_header = False

        writer.writerow(item_values)
```
Try this
```python
import csv, json, sys

input = open(sys.argv[1])
data = json.load(input)
input.close()

output = csv.writer(sys.stdout)

output.writerow(data[0].keys())  # header row

for item in data:
    output.writerow(item.values())
```
This code works for any given JSON file (as long as it is a list of flat objects)
```python
# -*- coding: utf-8 -*-
"""
Created on Mon Jun 17 20:35:35 2019
author: Ram
"""

import json
import csv

with open("file1.json") as file:
    data = json.load(file)

# create the csv writer object
pt_data1 = open('pt_data1.csv', 'w')
csvwriter = csv.writer(pt_data1)

count = 0

for pt in data:
    if count == 0:
        header = pt.keys()
        csvwriter.writerow(header)
        count += 1
    csvwriter.writerow(pt.values())

pt_data1.close()
```
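Alec's answer is great, but it doesn't work in the case where there are multiple levels of nesting. Here's a modified version that supports multiple levels of nesting. It also makes the header names a bit nicer if the nested object already specifies its own key (e.g. Firebase Analytics / BigTable / BigQuery data):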
```python
"""Converts JSON with nested fields into a flattened CSV file.
"""

import sys
import json
import csv
import os

# third-party packages
import jsonlines
from orderedset import OrderedSet

# from https://stackoverflow.com/a/28246154/473201
def flattenjson( b, prefix='', delim='/', val=None ):
    if val is None:
        val = {}

    if isinstance( b, dict ):
        for j in b.keys():
            flattenjson(b[j], prefix + delim + j, delim, val)
    elif isinstance( b, list ):
        get = b
        for j in range(len(get)):
            key = str(j)

            # If the nested data contains its own key, use that as the header instead.
            if isinstance( get[j], dict ):
                if 'key' in get[j]:
                    key = get[j]['key']

            flattenjson(get[j], prefix + delim + key, delim, val)
    else:
        val[prefix] = b

    return val

def main(argv):
    if len(argv) < 2:
        # the original raised an undefined name, Error
        raise ValueError('Please specify a JSON file to parse')

    filename = argv[1]
    allRows = []
    fieldnames = OrderedSet()
    with jsonlines.open(filename) as reader:
        for obj in reader:
            #print obj
            flattened = flattenjson(obj)
            #print 'keys: %s' % flattened.keys()
            fieldnames.update(flattened.keys())
            allRows.append(flattened)

    outfilename = filename + '.csv'
    with open(outfilename, 'w') as file:
        csvwriter = csv.DictWriter(file, fieldnames=fieldnames)
        csvwriter.writeheader()
        for obj in allRows:
            csvwriter.writerow(obj)

if __name__ == '__main__':
    main(sys.argv)
```
Modified Alec McGail's answer to support JSON with lists inside
```python
def flattenjson(self, mp, delim="|"):
    ret = []
    if isinstance(mp, dict):
        for k in mp.keys():
            items = self.flattenjson(mp[k], delim)
            for item in items:
                # str() so that non-string leaf values (numbers, None) can be joined
                ret.append(k + delim + str(item))
    elif isinstance(mp, list):
        for k in mp:
            items = self.flattenjson(k, delim)
            for item in items:
                ret.append(item)
    else:
        ret.append(mp)

    return ret
```
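Thanks!
I might be late to the party, but I think I have dealt with a similar problem. I had a JSON file which looked like this
I only wanted to extract a few keys/values from these JSON files. So, I wrote the following code to extract them.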
```python
"""json_to_csv.py
This script reads n numbers of json files present in a folder and then extract certain data from each file and write in a csv file.
The folder contains the python script i.e. json_to_csv.py, output.csv and another folder descriptions containing all the json files.
"""

import os
import json
import csv


def get_list_of_json_files():
    """Returns the list of filenames of all the Json files present in the folder
    Parameter
    ---------
    directory : str
        'descriptions' in this case
    Returns
    -------
    list_of_files: list
        List of the filenames of all the json files
    """

    list_of_files = os.listdir('descriptions')  # creates list of all the files in the folder

    return list_of_files


def create_list_from_json(jsonfile):
    """Returns a list of the extracted items from json file in the same order we need it.
    Parameter
    _________
    jsonfile : json
        The json file containing the data
    Returns
    -------
    one_sample_list : list
        The list of the extracted items needed for the final csv
    """

    with open(jsonfile) as f:
        data = json.load(f)

    data_list = []  # create an empty list

    # append the items to the list in the same order.
    data_list.append(data['_id'])
    data_list.append(data['_modelType'])
    data_list.append(data['creator']['_id'])
    data_list.append(data['creator']['name'])
    data_list.append(data['dataset']['_accessLevel'])
    data_list.append(data['dataset']['_id'])
    data_list.append(data['dataset']['description'])
    data_list.append(data['dataset']['name'])
    data_list.append(data['meta']['acquisition']['image_type'])
    data_list.append(data['meta']['acquisition']['pixelsX'])
    data_list.append(data['meta']['acquisition']['pixelsY'])
    data_list.append(data['meta']['clinical']['age_approx'])
    data_list.append(data['meta']['clinical']['benign_malignant'])
    data_list.append(data['meta']['clinical']['diagnosis'])
    data_list.append(data['meta']['clinical']['diagnosis_confirm_type'])
    data_list.append(data['meta']['clinical']['melanocytic'])
    data_list.append(data['meta']['clinical']['sex'])
    data_list.append(data['meta']['unstructured']['diagnosis'])
    # In few json files, the race was not there so using KeyError exception to add '' at the place
    try:
        data_list.append(data['meta']['unstructured']['race'])
    except KeyError:
        data_list.append("")  # will add an empty string in case race is not there.
    data_list.append(data['name'])

    return data_list


def write_csv():
    """Creates the desired csv file
    Parameters
    __________
    list_of_files : file
        The list created by get_list_of_json_files() method
    result.csv : csv
        The csv file containing the header only
    Returns
    _______
    result.csv : csv
        The desired csv file
    """

    list_of_files = get_list_of_json_files()
    for file in list_of_files:
        row = create_list_from_json(f'descriptions/{file}')  # create the row to be added to csv for each file (json-file)
        with open('output.csv', 'a') as c:
            writer = csv.writer(c)
            writer.writerow(row)
        c.close()


if __name__ == '__main__':
    write_csv()
```
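I hope this will help. For details on how this code works, you can check here
Unfortunately I do not have enough reputation to make a small contribution to the amazing @Alec McGail answer.
I was using Python 3 and I needed to convert the map to a list, following the @Alexis R comment.
Additionally, I found that the csv writer was adding an extra CR to the file (there was an empty line after each line of data in the CSV file). The solution was very easy, following the @Jason R. Coombs answer to this thread:
CSV in Python adding an extra carriage return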
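You simply need to add the `lineterminator='\n'` parameter to the csv.writer. It will be:
```python
csv_w = csv.writer( out_file, lineterminator='\n' )
```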
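The effect of the line terminator can be checked in memory. This is a small sketch with made-up sample rows; it writes to `io.StringIO` buffers so the raw characters are visible.

```python
import csv
import io

rows = [["pk", "model"], [22, "auth.permission"]]

# Default dialect: every row ends with "\r\n"
default_buf = io.StringIO()
csv.writer(default_buf).writerows(rows)

# With lineterminator="\n": every row ends with a bare line feed
lf_buf = io.StringIO()
csv.writer(lf_buf, lineterminator="\n").writerows(rows)

print(repr(default_buf.getvalue()))
print(repr(lf_buf.getvalue()))
```

When writing to a real file on Python 3, the csv module documentation recommends opening it with `newline=''`, which also avoids the doubled line endings on Windows.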
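Surprisingly, I found that none of the answers posted here so far correctly deal with all possible scenarios (e.g., nested dicts, nested lists, None values, etc.).
This solution should work across all scenarios: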
```python
def flatten_json(json):
    def process_value(keys, value, flattened):
        if isinstance(value, dict):
            for key in value.keys():
                process_value(keys + [key], value[key], flattened)
        elif isinstance(value, list):
            for idx, v in enumerate(value):
                process_value(keys + [str(idx)], v, flattened)
        else:
            flattened['__'.join(keys)] = value

    flattened = {}
    for key in json.keys():
        process_value([key], json[key], flattened)
    return flattened
```
You can use this code to convert a JSON file to a CSV file.
After reading the file, I convert the object to a pandas DataFrame and then save it to a CSV file.
```python
import os
import pandas as pd
import json
import numpy as np

data = []
os.chdir('D:\\Your_directory\\folder')
with open('file_name.json', encoding="utf8") as data_file:
    for line in data_file:
        data.append(json.loads(line))

dataframe = pd.DataFrame(data)
## Saving the dataframe to a csv file
dataframe.to_csv("filename.csv", encoding='utf-8', index=False)
```
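Since the data appears to be in a dictionary format, you should probably use csv.DictWriter() to output the lines with the appropriate header information. This should make the conversion somewhat easier to handle. The fieldnames parameter then sets the column order, and writing the first line as the header allows the file to be read and processed later by csv.DictReader().
For example, Mike Repass used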
```python
output = csv.writer(sys.stdout)

output.writerow(data[0].keys())  # header row

for row in data:
    output.writerow(row.values())
```
However, just change the initial setup to
```python
output = csv.DictWriter(filesetting, fieldnames=data[0].keys())
```
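Note that since the order of elements in a dictionary is not defined, you might have to create the fieldnames entries explicitly. Once you do that, writerow will work, and the writes then work as originally shown.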
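As a minimal sketch of that advice, the following writes with an explicit `fieldnames` list and reads the result back with `csv.DictReader`. The sample records and the in-memory buffer are assumptions made for the example. (Since Python 3.7 a dict keeps its insertion order, but an explicit list still protects against rows whose keys arrive in a different order.)

```python
import csv
import io

data = [
    {"pk": 22, "model": "auth.permission"},
    {"model": "auth.permission", "pk": 23},  # same keys, different order
]
fieldnames = ["pk", "model"]  # explicit column order

buf = io.StringIO()
output = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
output.writeheader()  # first line holds the column names
for row in data:
    output.writerow(row)

# Read the rows back; DictReader takes the column names from the header line
buf.seek(0)
read_back = [dict(r) for r in csv.DictReader(buf)]
print(buf.getvalue())
print(read_back)
```

Note that values come back from `DictReader` as strings, so `pk` is `"22"` rather than `22` after the round trip.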