Remove duplicates from a list and check whether IPs from one list are in another list
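I have two CSV files.
The first one looks like this:
The second contains a list of IPs:

```
139.15.250.196
139.15.5.176
```

I want to check whether a given IP is present in the first file. This works (if my code is broken, please correct it or give me a hint), but the problem is that the first file contains many duplicate values, e.g. 10.0.0.1 may appear x times, and I can't find a way to remove the duplicates. Can you help me or point me in the right direction?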
```python
import csv

filename = 'ip2.csv'
with open(filename) as f:
    reader = csv.reader(f)
    ip = []
    for row in reader:
        ip.append(row[0])

filename = 'bonk_https.csv'
with open(filename) as f:
    reader = csv.reader(f)
    ip_ext = []
    for row in reader:
        ip_ext.append(row[0])

for a in ip:
    if a in ip_ext:
        print(a)
```
You can use a `set`:
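```python
with open(filename) as f:
    reader = csv.reader(f)  # the reader has to be created for this file
    ip_ext = []
    for row in reader:
        ip_ext.append(row[0])

# Build the lookup set once instead of on every loop iteration.
# A set is not strictly required here unless ip_ext also has duplicates,
# but it makes each membership test fast.
ip_ext_set = set(ip_ext)
for a in set(ip):  # set(ip) drops the duplicates from the first list
    if a in ip_ext_set:
        print(a)
```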
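The same idea can be written more compactly with a set intersection. This is only a sketch: `common_ips` is a hypothetical helper name, and the sample lists stand in for the first column of the two CSV files.

```python
def common_ips(ip, ip_ext):
    """Return the IPs present in both lists, each listed once, sorted as strings."""
    return sorted(set(ip) & set(ip_ext))


# Hypothetical sample data standing in for the two CSV columns.
ip = ["10.0.0.1", "139.15.5.176", "10.0.0.1", "139.15.250.196"]
ip_ext = ["139.15.250.196", "139.15.5.176", "192.168.0.1"]

for a in common_ips(ip, ip_ext):
    print(a)
```

Note that sets are unordered, so the original file order is lost; the `sorted` call only gives a stable, string-ordered output.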
Alternatively, break/continue once an entry has been found. This may help you.
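One possible reading of the break/continue suggestion, sketched under the assumption that the goal is to report each matching IP once and keep the order of the first file: remember what has already been handled and `continue` past repeats. `first_matches` is a hypothetical name, not something from the question.

```python
def first_matches(ip, ip_ext):
    """Return IPs from `ip` that occur in `ip_ext`, in first-seen order, without repeats."""
    lookup = set(ip_ext)  # fast membership tests against the second list
    seen = set()          # IPs from the first list that were already handled
    matches = []
    for a in ip:
        if a in seen:
            continue  # duplicate entry, skip it
        seen.add(a)
        if a in lookup:
            matches.append(a)
    return matches


# Hypothetical sample data.
print(first_matches(
    ["10.0.0.1", "139.15.5.176", "10.0.0.1", "139.15.5.176", "139.15.250.196"],
    ["139.15.250.196", "139.15.5.176"],
))
```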
I suggest that you normalize all the IPs:
```python
with open(...) as f:
    # a set comprehension of _normalized_ IPs; int() strips excess leading zeros
    my_ips = {'.'.join('%d' % int(n) for n in t)
              for t in [x.split(',')[0].split('.') for x in f]}
```
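Next, you check each normalized IP from the second file against the IPs contained in the normalized set. Note that, unlike in the other answers, there is a single loop here, and checking whether an item is a member of a set is a highly optimized operation.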
```python
with open(...) as f:
    for line in f:
        ip = '.'.join('%d' % int(n) for n in line.split('.'))
        if ip in my_ips:
            ...
        else:
            ...
```
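For reference, here is a self-contained sketch of the two steps together. The helper names `normalize_ip` and `load_ips` are hypothetical, the file contents are made up, and `io.StringIO` stands in for the real files; it also assumes the IP is the first comma-separated field on each line of the first file.

```python
import io


def normalize_ip(text):
    """Rewrite each dotted field as a plain integer, e.g. '010.000.0.01' -> '10.0.0.1'."""
    return '.'.join(str(int(n)) for n in text.strip().split('.'))


def load_ips(lines):
    """Build a set of normalized IPs from the first comma-separated field of each line."""
    return {normalize_ip(line.split(',')[0]) for line in lines if line.strip()}


# Hypothetical file contents; io.StringIO stands in for the real files.
first = io.StringIO("010.000.000.001,a\n10.0.0.1,b\n139.15.5.176,c\n")
second = io.StringIO("139.15.250.196\n139.015.005.176\n")

my_ips = load_ips(first)  # duplicates collapse because this is a set
for line in second:
    ip = normalize_ip(line)
    print(ip, "found" if ip in my_ips else "missing")
```

Because `my_ips` is a set, the two spellings of 10.0.0.1 in the first file end up as a single entry, which also takes care of the duplicate problem from the question.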