Read Excel File in Python

I have an Excel file:

Arm_id  DSPName  DSPCode  HubCode  PinCode  PPTL
1       JaVAS    01       AGR      282001   1,2
2       JaVAS    01       AGR      282002   3,4
3       JaVAS    01       AGR      282003   5,6

I want to save a string in the form Arm_id,DSPCode,Pincode. This format is configurable, i.e. it might change to DSPCode,Arm_id,Pincode. I keep the format in a list:

FORMAT = ['Arm_id', 'DSPName', 'Pincode']

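Given that FORMAT is configurable, how do I read the content of specific columns by name?

This is what I have tried. Currently I am able to read all the content in the file: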

from xlrd import open_workbook

wb = open_workbook('sample.xls')
for s in wb.sheets():
    # print('Sheet:', s.name)
    values = []
    for row in range(s.nrows):
        col_value = []
        for col in range(s.ncols):
            value = s.cell(row, col).value
            # numeric cells come back as floats; turn them into integer strings
            try:
                value = str(int(value))
            except ValueError:
                pass
            col_value.append(value)
        values.append(col_value)
    print(values)

My output is:

[[u'Arm_id', u'DSPName', u'DSPCode', u'HubCode', u'PinCode', u'PPTL'], ['1', u'JaVAS', '1', u'AGR', '282001', u'1,2'], ['2', u'JaVAS', '1', u'AGR', '282002', u'3,4'], ['3', u'JaVAS', '1', u'AGR', '282003', u'5,6']]

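Then I loop over values[0], look for the FORMAT entries in it, and get the indexes of Arm_id, DSPName and Pincode in values[0]. From the next loop on, I know the indexes of all the FORMAT fields and therefore which values I need.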
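For reference, the index lookup just described might look roughly like the sketch below. It uses a hard-coded stand-in for the values list instead of reading the workbook, and it assumes the FORMAT names are spelled exactly as in the header row (so 'PinCode', not 'Pincode'); otherwise header.index() raises a ValueError.

```python
# Hypothetical stand-in for the `values` list printed above (header row first).
values = [
    ['Arm_id', 'DSPName', 'DSPCode', 'HubCode', 'PinCode', 'PPTL'],
    ['1', 'JaVAS', '1', 'AGR', '282001', '1,2'],
    ['2', 'JaVAS', '1', 'AGR', '282002', '3,4'],
]
FORMAT = ['Arm_id', 'DSPName', 'PinCode']  # assumed to match the header spelling

header = values[0]
# Position of each configured column, in FORMAT order.
indexes = [header.index(name) for name in FORMAT]
# One comma-separated line per data row, skipping the header.
lines = [','.join(row[i] for i in indexes) for row in values[1:]]
print('\n'.join(lines))
```

Reordering FORMAT reorders the output columns, since the indexes are collected in FORMAT order.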

But this is a poor solution.

How do I get the values of specific columns by name in an Excel file?

A somewhat late answer, but with pandas it is possible to get a column of an Excel file directly:

import pandas

# pandas needs the xlrd package installed to read .xls files
df = pandas.read_excel('sample.xls')
# print the column names
print(df.columns)
# get the values for a given column
values = df['Arm_id'].values
# get a data frame with selected columns
# (names must match the header exactly, hence 'PinCode')
FORMAT = ['Arm_id', 'DSPName', 'PinCode']
df_selected = df[FORMAT]

Here is one approach:

from xlrd import open_workbook

class Arm(object):
    def __init__(self, id, dsp_name, dsp_code, hub_code, pin_code, pptl):
        self.id = id
        self.dsp_name = dsp_name
        self.dsp_code = dsp_code
        self.hub_code = hub_code
        self.pin_code = pin_code
        self.pptl = pptl

    def __str__(self):
        return ("Arm object:\n"
                " Arm_id = {0}\n"
                " DSPName = {1}\n"
                " DSPCode = {2}\n"
                " HubCode = {3}\n"
                " PinCode = {4}\n"
                " PPTL = {5}"
                .format(self.id, self.dsp_name, self.dsp_code,
                        self.hub_code, self.pin_code, self.pptl))

wb = open_workbook('sample.xls')
for sheet in wb.sheets():
    number_of_rows = sheet.nrows
    number_of_columns = sheet.ncols

    items = []

    # start at 1 to skip the header row
    for row in range(1, number_of_rows):
        values = []
        for col in range(number_of_columns):
            value = sheet.cell(row, col).value
            try:
                value = str(int(value))
            except ValueError:
                pass
            finally:
                values.append(value)
        item = Arm(*values)
        items.append(item)

for item in items:
    print(item)
    print("Accessing one single value (eg. DSPName): {0}".format(item.dsp_name))
    print()

You don't have to use a custom class; you can simply use a dict(). With a class, however, you can access all values via dot notation, as shown above.
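The dict() variant could look something like the sketch below. The header and rows lists are hypothetical stand-ins for what the loops above would collect from the sheet; the point is only that zip pairs each header name with the value in the same column.

```python
# Hypothetical stand-ins for the header row and data rows read from the sheet.
header = ['Arm_id', 'DSPName', 'DSPCode', 'HubCode', 'PinCode', 'PPTL']
rows = [
    ['1', 'JaVAS', '1', 'AGR', '282001', '1,2'],
    ['2', 'JaVAS', '1', 'AGR', '282002', '3,4'],
]

# One dict per row, keyed by column name.
items = [dict(zip(header, row)) for row in rows]

for item in items:
    # Values are looked up by header name instead of by attribute.
    print(item['DSPName'], item['PinCode'])
```

The trade-off is item['DSPName'] instead of item.dsp_name, but no class has to be kept in sync with the column layout.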

Here is the output of the class-based script above:

Arm object:
Arm_id = 1
DSPName = JaVAS
DSPCode = 1
HubCode = AGR
PinCode = 282001
PPTL = 1
Accessing one single value (eg. DSPName): JaVAS

Arm object:
Arm_id = 2
DSPName = JaVAS
DSPCode = 1
HubCode = AGR
PinCode = 282002
PPTL = 3
Accessing one single value (eg. DSPName): JaVAS

Arm object:
Arm_id = 3
DSPName = JaVAS
DSPCode = 1
HubCode = AGR
PinCode = 282003
PPTL = 5
Accessing one single value (eg. DSPName): JaVAS

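So the key parts are to grab the header (col_names = s.row(0)) and, when iterating over the rows, to skip the first row, which isn't needed: for row in range(1, s.nrows) does this by starting the range at 1 rather than the implicit 0. You then use zip to step through each row, pairing every value with its column header as the "name".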

from xlrd import open_workbook

wb = open_workbook('Book2.xls')
values = []
for s in wb.sheets():
    # print('Sheet:', s.name)
    col_names = s.row(0)  # header cells
    for row in range(1, s.nrows):
        col_value = []
        for name, col in zip(col_names, range(s.ncols)):
            value = s.cell(row, col).value
            try:
                value = str(int(value))
            except ValueError:
                pass
            # keep each value together with its column name
            col_value.append((name.value, value))
        values.append(col_value)
print(values)

With pandas we can read Excel files easily.

import pandas as pd

DataF = pd.read_excel("Test.xlsx", sheet_name='Sheet1')

print("Column headings:")
print(DataF.columns)

Tested on: https://repl.it
Reference: https://pythonspot.com/read-excel-with-pandas/

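The approach I took reads the header information from the first row to determine the indexes of the columns of interest.

You mentioned in the question that you also want the values output to a string. I dynamically build a format string for the output from the FORMAT column list. Rows are appended to the values string, separated by a newline character.

The output column order is determined by the order of the column names in the FORMAT list.

In my code below, the case of the column names in the FORMAT list matters. In the question above you have 'Pincode' in your FORMAT list but 'PinCode' in your Excel file. That would not work below; it needs to be 'PinCode'.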

from xlrd import open_workbook
wb = open_workbook('sample.xls')

FORMAT = ['Arm_id', 'DSPName', 'PinCode']
values = ""

for s in wb.sheets():
    headerRow = s.row(0)
    # indexes of the FORMAT columns, in FORMAT order
    columnIndex = [x for y in FORMAT for x in range(len(headerRow)) if y == headerRow[x].value]
    # one "%s" per selected column, comma-separated, ending in a newline
    formatString = ("%s," * len(columnIndex))[0:-1] + "\n"

    for row in range(1, s.nrows):
        currentRow = s.row(row)
        currentRowValues = [currentRow[x].value for x in columnIndex]
        values += formatString % tuple(currentRowValues)

print(values)

For the sample input given above, this code outputs:

>>> 1.0,JaVAS,282001.0
2.0,JaVAS,282002.0
3.0,JaVAS,282003.0
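The trailing .0 appears because xlrd returns numeric cells as floats. If whole numbers should print without it, one option is to convert each value before joining. A minimal sketch, where format_cell is a hypothetical helper and rows stands in for the values picked out of the sheet:

```python
def format_cell(value):
    """Render whole-number floats without '.0'; leave everything else as is."""
    if isinstance(value, float) and value.is_integer():
        return str(int(value))
    return str(value)

# Hypothetical stand-in for the selected cell values of each row.
rows = [
    [1.0, 'JaVAS', 282001.0],
    [2.0, 'JaVAS', 282002.0],
]

# Join the cells of each row with commas and the rows with newlines.
values = '\n'.join(','.join(format_cell(v) for v in row) for row in rows)
print(values)
```

Note that this still cannot bring back formatting that lives only in Excel, such as the leading zero of a DSPCode displayed as 01.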

As I am a Python noob, props go to: this answer, this answer, this question, this question and this answer.

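Although I almost always just use pandas for this, my current little tool is being packaged into an executable, and including pandas is overkill. So I created a version of poida's solution that produces a list of named tuples. His code with this change looks like this: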

from xlrd import open_workbook
from collections import namedtuple
from pprint import pprint

wb = open_workbook('sample.xls')

FORMAT = ['Arm_id', 'DSPName', 'PinCode']
OneRow = namedtuple('OneRow', ' '.join(FORMAT))
all_rows = []

for s in wb.sheets():
    headerRow = s.row(0)
    columnIndex = [x for y in FORMAT for x in range(len(headerRow)) if y == headerRow[x].value]

    for row in range(1, s.nrows):
        currentRow = s.row(row)
        currentRowValues = [currentRow[x].value for x in columnIndex]
        all_rows.append(OneRow(*currentRowValues))

pprint(all_rows)
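Once the rows are named tuples, each field can be read by its column name. A small sketch of how the result might be used, with all_rows filled by hand instead of from the workbook (the float values mirror what xlrd returns for numeric cells):

```python
from collections import namedtuple

FORMAT = ['Arm_id', 'DSPName', 'PinCode']
OneRow = namedtuple('OneRow', FORMAT)

# Hypothetical stand-in for the list built from the sheet.
all_rows = [
    OneRow(1.0, 'JaVAS', 282001.0),
    OneRow(2.0, 'JaVAS', 282002.0),
]

# Fields are available as attributes named after the columns.
pin_codes = [int(row.PinCode) for row in all_rows]
print(pin_codes)

# _asdict() gives a name-to-value mapping when a dict is more convenient.
first = dict(all_rows[0]._asdict())
print(first)
```

This only works when the names in FORMAT are valid Python identifiers; a header containing spaces would need renaming first.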