Python: Altering the format of a list of strings

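I have to analyze earthquake data, and before I can start the analysis I have to change the format in which the data is listed. I have to change the format from: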

14km WSW of Willow, Alaska$2.4
4km NNW of The Geysers, California$0.9
13km ESE of Coalinga, California$2.1
...

to:

["2.4, 14km WSW of Willow, Alaska","0.9, 4km NNW of The Geysers, California",
"2.1, 13km ESE of Coalinga, California", ...]

The code I have for the original format (URL omitted) is:

def fileToList(url):
    alist = []
    source = urllib2.urlopen(url)
    for line in source:
        items = line.strip()
        alist.append(items)
    return alist

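I am trying to create the variables Magnity and EarthQuakeloc to rearrange the format of alist, but I don't know where to start. I am very new to coding. Any suggestions would be great, thanks.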


It seems you are just trying to reorder how each string is formatted, so if you have the initial data in a multi-line string like this:

earthquake_data = """14km WSW of Willow, Alaska$2.4
4km NNW of The Geysers, California$0.9
13km ESE of Coalinga, California$2.1"""

then you can split it on the newline character to get a list of strings:

lines = earthquake_data.split('\n')
# ['14km WSW of Willow, Alaska$2.4', '4km NNW of The Geysers, California$0.9', '13km ESE of Coalinga, California$2.1']

For each item in that list, split it on the "$" sign, which leaves you with a list of lists like this:

split_lines = [l.split('$') for l in lines]
# [['14km WSW of Willow, Alaska', '2.4'], ['4km NNW of The Geysers, California', '0.9'], ['13km ESE of Coalinga, California', '2.1']]

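You can then use the str.join() string method on each item in a list comprehension to rejoin each of these lists into a single string, with the magnitude first: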

# join with ", " (comma plus space) so the result matches the requested format
reformatted_data = [", ".join([l[1], l[0]]) for l in split_lines]
# ['2.4, 14km WSW of Willow, Alaska', '0.9, 4km NNW of The Geysers, California', '2.1, 13km ESE of Coalinga, California']

Here it is all wrapped up in a function:

def reformatStrings(data):
    # one string per line of the input
    lines = data.split("\n")
    # [location, magnitude] pairs
    split_lines = [l.split('$') for l in lines]
    # magnitude first, then location
    reformatted_data = [", ".join([l[1], l[0]]) for l in split_lines]
    return reformatted_data


print(reformatStrings(earthquake_data))


Suppose your source variable contains the following lines:

14km WSW of Willow, Alaska$2.4
4km NNW of The Geysers, California$0.9
13km ESE of Coalinga, California$2.1

In the simplest case, the str.split and str.join functions are enough:

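import urllib2  # Python 2 module, as in the question

def fileToList(url):
    # read() returns the whole response as one string, which can then be split into lines
    source = urllib2.urlopen(url).read()

    return [', '.join(l.split('$')[::-1]) for l in source.split('\n') if l.strip()]

print(fileToList(url))  # url is the data address that the question omits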

The output:

['2.4, 14km WSW of Willow, Alaska', '0.9, 4km NNW of The Geysers, California', '2.1, 13km ESE of Coalinga, California']
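
On Python 3, `urllib2` no longer exists and `urllib.request.urlopen` yields bytes rather than text, so each line has to be decoded before it is split. Below is a minimal sketch of the same reordering under that assumption. The function name `reformat_stream` and the UTF-8 default are my own choices, not something given in the question, and `io.BytesIO` only stands in for the real response object.

```python
import io


def reformat_stream(source, encoding="utf-8"):
    """Turn 'location$magnitude' lines from a binary stream into 'magnitude, location' strings."""
    result = []
    for raw in source:
        line = raw.decode(encoding).strip()
        # rpartition splits on the last '$', so a '$' inside the location text is preserved
        location, sep, magnitude = line.rpartition("$")
        if not sep:
            continue  # skip blank lines and lines without a '$'
        result.append(", ".join([magnitude, location]))
    return result


# io.BytesIO stands in for the object returned by urllib.request.urlopen(url)
sample = io.BytesIO(
    b"14km WSW of Willow, Alaska$2.4\n"
    b"\n"
    b"4km NNW of The Geysers, California$0.9\n"
)
print(reformat_stream(sample))
```

Because the function only iterates over its argument, it should work equally well with a file opened in binary mode.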


Hint:

>>> a = "14km WSW of Willow, Alaska$2.4"
>>> a = a.split("$")   # split the string on `$`
>>> a
['14km WSW of Willow, Alaska', '2.4']
>>> a = a[::-1]        # reverse the list
>>> a
['2.4', '14km WSW of Willow, Alaska']
>>> ",".join(a)        # join on `,`
'2.4,14km WSW of Willow, Alaska'

As a one-liner:

>>> ",".join(a.split("$")[::-1])
'2.4,14km WSW of Willow, Alaska'

A Pythonic way to get your expected output:

>>> myString = """14km WSW of Willow, Alaska$2.4
... 4km NNW of The Geysers, California$0.9
... 13km ESE of Coalinga, California$2.1"""

>>> # list() makes the result a list on Python 3 as well, where map() returns an iterator
>>> list(map(lambda x: ",".join(x.split("$")[::-1]), myString.strip().split("\n")))
['2.4,14km WSW of Willow, Alaska', '0.9,4km NNW of The Geysers, California', '2.1,13km ESE of Coalinga, California']


If you are concerned about the formatting, I would use a collections.namedtuple as an intermediate value:

from collections import namedtuple

Data = namedtuple('Data', ['position', 'magnitude'])

mystr = """14km WSW of Willow, Alaska$2.4
4km NNW of The Geysers, California$0.9
13km ESE of Coalinga, California$2.1"""


list_of_data = []
for line in mystr.split('\n'):   # equivalent to your "for line in source"
    list_of_data.append(Data(*line.split('$')))

This gives you the following:

>>> list_of_data
[Data(position='14km WSW of Willow, Alaska', magnitude='2.4'),
 Data(position='4km NNW of The Geysers, California', magnitude='0.9'),
 Data(position='13km ESE of Coalinga, California', magnitude='2.1')]

which is easy to manipulate:

>>> ['{x.magnitude}, {x.position}'.format(x=x) for x in list_of_data]
['2.4, 14km WSW of Willow, Alaska',
 '0.9, 4km NNW of The Geysers, California',
 '2.1, 13km ESE of Coalinga, California']

or to sort by magnitude:

>>> sorted(list_of_data, key=lambda x: x.magnitude)
[Data(position='4km NNW of The Geysers, California', magnitude='0.9'),
 Data(position='13km ESE of Coalinga, California', magnitude='2.1'),
 Data(position='14km WSW of Willow, Alaska', magnitude='2.4')]
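
One caveat: `magnitude` is stored as a string here, so `sorted` compares text, and a value such as '10.1' would sort before '2.4'. Below is a small sketch of a numeric sort that converts with `float` in the key. The row with magnitude 10.1 is made up purely to show the difference and does not come from the data.

```python
from collections import namedtuple

Data = namedtuple('Data', ['position', 'magnitude'])

list_of_data = [
    Data('14km WSW of Willow, Alaska', '2.4'),
    Data('4km NNW of The Geysers, California', '0.9'),
    Data('hypothetical location', '10.1'),  # made-up row, not from the data set
]

# String comparison puts '10.1' before '2.4'; converting to float sorts numerically
by_text = sorted(list_of_data, key=lambda x: x.magnitude)
by_number = sorted(list_of_data, key=lambda x: float(x.magnitude))

print([d.magnitude for d in by_text])    # ['0.9', '10.1', '2.4']
print([d.magnitude for d in by_number])  # ['0.9', '2.4', '10.1']
```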

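Finally, if the data set is large, using a regex might make more sense. But parsing the data with str.split and keeping it in namedtuples is not complicated, so that is the approach I used.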
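
For reference, a regex version could look roughly like the following. It is only a sketch: the pattern assumes every line has the form `location$magnitude` with a plain decimal magnitude, which fits the three sample lines but has not been checked against the full data set. The name `LINE_RE` is my own.

```python
import re

# Everything up to the last '$' is the location; the number after it is the magnitude
LINE_RE = re.compile(r'^(?P<position>.+)\$(?P<magnitude>-?\d+(?:\.\d+)?)\s*$')

mystr = """14km WSW of Willow, Alaska$2.4
4km NNW of The Geysers, California$0.9
13km ESE of Coalinga, California$2.1"""

reformatted = []
for line in mystr.split('\n'):
    match = LINE_RE.match(line)
    if match:  # lines that do not fit the pattern are skipped
        reformatted.append('{}, {}'.format(match.group('magnitude'), match.group('position')))

print(reformatted)
```

Unlike a bare `split('$')`, the pattern also rejects lines whose magnitude is not a number, which may or may not be what you want for malformed rows.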