Python: using a proxy with scrapy-splash

I am trying to use a proxy (ProxyMesh) with scrapy-splash. Here is the relevant code:

PROXY = """splash:on_request(function(request)
    request:set_proxy{
        host = http://us-ny.proxymesh.com,
        port = 31280,
        username = username,
        password = secretpass,
    }
    return splash:html()
end)"""

and in start_requests:

def start_requests(self):
    for url in self.start_urls:
        print(url)
        yield SplashRequest(url, self.parse,
                            endpoint='execute',
                            args={'wait': 5,
                                  'lua_source': PROXY,
                                  'js_source': 'document.body'})

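But this does not seem to work: self.parse is never called. If I change the endpoint to 'render.html', self.parse is reached, but when I inspect the headers (response.headers) I can see that the request did not go through the proxy. I confirmed this by setting http://checkip.dyndns.org/ as the start URL: the parsed response showed my old IP address.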
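For reference, the IP check described above could be scripted roughly as follows. This is only a sketch: `extract_ip` and `proxy_in_use` are hypothetical helper names, and the code assumes the page body contains the caller's IPv4 address somewhere in its text. The exact layout of the checkip.dyndns.org response is not taken from the thread.

```python
import re

# Loose IPv4 matcher; good enough to pull an address out of a small HTML page.
IP_PATTERN = re.compile(r'\b(?:\d{1,3}\.){3}\d{1,3}\b')


def extract_ip(body):
    """Return the first IPv4-looking string in `body`, or None if there is none."""
    match = IP_PATTERN.search(body)
    return match.group(0) if match else None


def proxy_in_use(body, own_ip):
    """Return True when the IP reported by the page differs from `own_ip`."""
    reported = extract_ip(body)
    return reported is not None and reported != own_ip
```

Inside the spider's parse callback, one could call `proxy_in_use(response.text, own_ip)` (assuming a Scrapy version where text responses expose `.text`) and log a warning when it returns False.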

What am I doing wrong?

You should add a 'proxy' argument to the SplashRequest object's args.

def start_requests(self):
    for url in self.start_urls:
        print(url)
        yield SplashRequest(url, self.parse,
                            endpoint='execute',
                            args={'wait': 5,
                                  'lua_source': PROXY,
                                  'js_source': 'document.body',
                                  'proxy': 'http://proxy_ip:proxy_port'})
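Separately, the Lua script in the question is probably part of the reason self.parse was never reached: a script sent to the 'execute' endpoint has to define a main function, and Lua string values such as the host and credentials need quotes. Below is a minimal sketch of a script that keeps the on_request / set_proxy approach. It assumes the proxy host is given as a bare hostname without the "http://" prefix. The `build_splash_args` helper and the `proxy_*` argument names are made up for illustration and are not part of scrapy-splash.

```python
# Lua script for Splash's `execute` endpoint. Proxy settings are read from
# splash.args, so they can be supplied per request from Python.
LUA_PROXY_SCRIPT = """
function main(splash)
    -- Route every outgoing request through the proxy.
    splash:on_request(function(request)
        request:set_proxy{
            host = splash.args.proxy_host,
            port = splash.args.proxy_port,
            username = splash.args.proxy_username,
            password = splash.args.proxy_password,
        }
    end)
    assert(splash:go(splash.args.url))
    assert(splash:wait(splash.args.wait))
    return splash:html()
end
"""


def build_splash_args(host, port, username, password, wait=5):
    """Build the `args` dict for a SplashRequest (hypothetical helper)."""
    return {
        'lua_source': LUA_PROXY_SCRIPT,
        'wait': wait,
        'proxy_host': host,          # bare hostname, no scheme (assumption)
        'proxy_port': port,
        'proxy_username': username,
        'proxy_password': password,
    }
```

In start_requests this would be used as `SplashRequest(url, self.parse, endpoint='execute', args=build_splash_args('us-ny.proxymesh.com', 31280, 'username', 'secretpass'))`. Passing the credentials through args rather than embedding them in the Lua source keeps the script reusable across proxies.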