How can I scroll a web page using Selenium WebDriver in Python?
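I am currently using Selenium WebDriver to parse a Facebook user's friends page and extract all the IDs from the AJAX script. However, I need to scroll down to get all of the friends. How can I scroll down in Selenium? I am using Python.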
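You can use

    driver.execute_script("window.scrollTo(0, Y)")

where Y is the vertical position in pixels to scroll to (for example 1080, the height of a full HD monitor). Replace Y with an actual number before running the call. (Thanks to @lukeis)

You can also use

    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

to scroll to the bottom of the page.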
To scroll a page that keeps loading content as you go, such as social network pages, Facebook, etc. (thanks to @Cuong Tran):

    import time

    SCROLL_PAUSE_TIME = 0.5

    # Get the current scroll height
    last_height = driver.execute_script("return document.body.scrollHeight")

    while True:
        # Scroll down to the bottom
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

        # Wait for the page to load more content
        time.sleep(SCROLL_PAUSE_TIME)

        # Compare the new scroll height with the previous one
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break
        last_height = new_height
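The loop above only exits once the height stops changing, which can take a very long time on a feed that never really ends. A minimal sketch of the same idea wrapped in a function with an upper bound on the number of rounds follows; the name `scroll_to_bottom` and the `max_rounds` parameter are assumptions made for illustration and are not part of Selenium.

```python
import time


def scroll_to_bottom(driver, pause=0.5, max_rounds=50):
    """Scroll until the page height stops growing or max_rounds is reached.

    `driver` is any object exposing Selenium's execute_script() method.
    Returns the number of rounds in which the page actually grew.
    """
    get_height = "return document.body.scrollHeight"
    last_height = driver.execute_script(get_height)
    grew = 0
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the AJAX content time to load
        new_height = driver.execute_script(get_height)
        if new_height == last_height:
            break  # nothing new was loaded, so we are at the bottom
        last_height = new_height
        grew += 1
    return grew
```

Calling `scroll_to_bottom(driver, pause=1.0, max_rounds=20)` would then scroll at most 20 times, which may be enough when only the first few screens of results are needed.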
If you want to scroll down to the bottom of an infinite page (for example linkedin.com), the same height-comparison loop shown in the previous answer works unchanged.

Reference: https://stackoverflow.com/a/28928684/1316860
The same approach applies here. In Python, you can use

    driver.execute_script("window.scrollTo(0, Y)")

(Y is the vertical position you want to scroll to.)
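    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys

    # `browser` is the WebDriver instance.
    # Selenium 4 removed find_element_by_tag_name; use find_element(By.TAG_NAME, ...).
    html = browser.find_element(By.TAG_NAME, 'html')
    html.send_keys(Keys.END)

Tested, and it works.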
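    from selenium.webdriver.common.by import By

    element = driver.find_element(By.XPATH, "xpath of the li you are trying to access")

    # Reading this property scrolls the element into view as a side effect
    element.location_once_scrolled_into_view

This helped when I was trying to access an "li" that was not visible.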
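Another way to bring an off-screen element into view is to call the DOM method `scrollIntoView` on it through `execute_script`, which hands any extra Python arguments to the script as `arguments[0]`, `arguments[1]`, and so on. The sketch below assumes a helper named `scroll_into_view`; that name is made up for this example.

```python
def scroll_into_view(driver, element, align_to_top=True):
    """Ask the browser to scroll `element` into the visible area.

    `element` is a WebElement returned by driver.find_element(...).
    `align_to_top` is forwarded to the DOM scrollIntoView() call:
    True aligns the element with the top of the viewport, False with the bottom.
    """
    driver.execute_script(
        "arguments[0].scrollIntoView(arguments[1]);", element, align_to_top
    )
    return element
```

Returning the element makes it easy to chain, for example `scroll_into_view(driver, element).click()`.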
This is how you scroll down a web page:

    driver.execute_script("window.scrollTo(0, 1000);")
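For my purposes, I wanted to scroll down further while keeping track of the window's current position. My solution was similar and used window.scrollY:

    driver.execute_script("window.scrollTo(0, window.scrollY + 200)")

This scrolls to the current vertical scroll position plus 200 pixels.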
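The browser also offers `window.scrollBy`, which scrolls relative to the current position, so the offset does not have to be added by hand. A small sketch, assuming a hypothetical helper `scroll_by` that passes the offsets to the script as arguments and reports the resulting position:

```python
def scroll_by(driver, dy, dx=0):
    """Scroll the window by (dx, dy) pixels relative to where it is now.

    Returns the vertical scroll position reported by the browser afterwards.
    """
    driver.execute_script("window.scrollBy(arguments[0], arguments[1]);", dx, dy)
    return driver.execute_script("return window.scrollY;")
```

Calling `scroll_by(driver, 200)` repeatedly steps down the page 200 pixels at a time; the returned value stops increasing once the bottom is reached.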
On YouTube, the floating elements make the scroll height of the body come back as "0", so instead of "return document.body.scrollHeight" try "return document.documentElement.scrollHeight".

Adjust the scroll pause time to your internet speed; otherwise the loop will run only once and then break.

    import time

    SCROLL_PAUSE_TIME = 1

    # Get the scroll height.
    # "return document.body.scrollHeight" doesn't work here
    # because of the floating web elements on YouTube.
    last_height = driver.execute_script("return document.documentElement.scrollHeight")

    while True:
        # Scroll down to the bottom
        driver.execute_script("window.scrollTo(0, document.documentElement.scrollHeight);")

        # Wait for the page to load more content
        time.sleep(SCROLL_PAUSE_TIME)

        # Compare the new scroll height with the previous one
        new_height = driver.execute_script("return document.documentElement.scrollHeight")
        if new_height == last_height:
            print("break")
            break
        last_height = new_height
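Since some pages report their height on `document.body` and others on `document.documentElement`, one option is to ask for the larger of the two and use that everywhere. This is only a sketch under that assumption; `PAGE_HEIGHT_JS` and `page_height` are names invented for the example.

```python
# JavaScript that returns the larger of the two candidate heights.
PAGE_HEIGHT_JS = (
    "return Math.max("
    "document.body ? document.body.scrollHeight : 0, "
    "document.documentElement.scrollHeight);"
)


def page_height(driver):
    """Return the page height, whichever root element happens to report it."""
    return driver.execute_script(PAGE_HEIGHT_JS)
```

The scroll loops in this thread can then compare `page_height(driver)` before and after each scroll instead of hard-coding one of the two expressions.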
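The easiest way I found to solve this problem was to select a label and then send it the Page Down key:

    # `label` is a WebElement; Keys comes from selenium.webdriver.common.keys
    label.send_keys(Keys.PAGE_DOWN)

Hope it works!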
None of these answers worked for me, at least not for scrolling down a Facebook search results page, but after a lot of testing I found this solution:

    from selenium.webdriver.common.by import By

    while driver.find_element(By.TAG_NAME, 'div'):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        # Text of the first <div> on the page
        divs = driver.find_element(By.TAG_NAME, 'div').text
        if 'End of Results' in divs:
            print('end')
            break
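The same stop-on-marker idea can be sketched without depending on a particular `div`: read the text of the whole page through JavaScript and stop once the marker appears. This swaps the element lookup above for `document.body.innerText`, and the function name `scroll_until_text` and its `max_rounds` cap are assumptions for illustration.

```python
import time


def scroll_until_text(driver, sentinel, pause=0.5, max_rounds=100):
    """Scroll to the bottom repeatedly until `sentinel` shows up in the page text.

    Returns True if the text was found, False if max_rounds ran out first.
    """
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # let the next batch of results render
        page_text = driver.execute_script("return document.body.innerText;")
        if sentinel in (page_text or ""):
            return True
    return False
```

For the Facebook search page described above, the call would be `scroll_until_text(driver, 'End of Results')`.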
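I was looking for a way to scroll through a dynamic web page and stop automatically once the end of the page is reached, and found this thread.

@Cuong Tran's post, with one major modification, was the answer I was looking for. I thought others might find the modification useful (it has a noticeable effect on how the code works), hence this post.

The modification is to move the statement that captures the last page height inside the loop, so that each check compares against the height recorded just before that scroll.

So, the code below continuously scrolls down a dynamic web page (.scrollTo()), stopping only when the page height stays the same for one iteration.

(There is a second modification: the break statement sits inside an additional condition that retries once in case the page "sticks". That condition can be removed.)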
    import time

    SCROLL_PAUSE_TIME = 0.5

    while True:
        # Get the scroll height.
        # This is the difference: moving this *inside* the loop
        # means that it checks whether scrollTo is still scrolling.
        last_height = driver.execute_script("return document.body.scrollHeight")

        # Scroll down to the bottom
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

        # Wait for the page to load
        time.sleep(SCROLL_PAUSE_TIME)

        # Compare the new scroll height with the previous one
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:

            # Try again (can be removed)
            driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

            # Wait for the page to load
            time.sleep(SCROLL_PAUSE_TIME)

            # Compare the new scroll height with the previous one
            new_height = driver.execute_script("return document.body.scrollHeight")

            # If the page height is still the same, we are done
            if new_height == last_height:
                break
            # Otherwise, move on to the next iteration
            else:
                last_height = new_height
                continue
This code scrolls to the bottom without making you wait a fixed time on every pass. It keeps scrolling and stops at the bottom (or when it times out).

    import time
    from selenium import webdriver

    # Selenium 4.6+ locates a matching driver automatically; older versions
    # need the path to chromedriver passed in explicitly.
    driver = webdriver.Chrome()
    driver.get('https://example.com')

    pre_scroll_height = driver.execute_script('return document.body.scrollHeight;')
    # run_time counts the seconds spent without the page growing
    run_time, max_run_time = 0, 1
    while True:
        iteration_start = time.time()
        # Scroll the page; the factor of 100 allows for a more 'aggressive' scroll
        driver.execute_script('window.scrollTo(0, 100*document.body.scrollHeight);')

        post_scroll_height = driver.execute_script('return document.body.scrollHeight;')

        scrolled = post_scroll_height != pre_scroll_height
        timed_out = run_time >= max_run_time

        if scrolled:
            # The page grew: reset the timer and remember the new height
            run_time = 0
            pre_scroll_height = post_scroll_height
        elif not timed_out:
            run_time += time.time() - iteration_start
        else:
            break

    # Closing the driver is optional
    driver.close()

This is much faster than waiting 0.5 to 3 seconds on every pass for a response that may take only 0.1 seconds.
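The polling idea can also be isolated into a small helper that waits only as long as it takes for the height to change, up to a deadline. This is a sketch rather than the author's code: `wait_for_growth`, `timeout`, and `poll` are hypothetical names, and it uses `time.monotonic()` so the deadline is unaffected by system clock changes.

```python
import time


def wait_for_growth(driver, old_height, timeout=1.0, poll=0.05):
    """Poll the page height until it differs from old_height.

    Returns the new height, or None if `timeout` seconds pass without a change.
    """
    deadline = time.monotonic() + timeout
    while True:
        height = driver.execute_script("return document.body.scrollHeight;")
        if height != old_height:
            return height
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)  # short pause between checks
```

A scroll loop would call `scrollTo`, then `wait_for_growth(driver, last_height)`, and stop as soon as it returns `None`.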
For pages that load more content as you scroll, for example Medium, Quora, etc.:

    import time

    SCROLL_PAUSE_TIME = 0.5  # seconds; adjust to your connection speed

    last_height = driver.execute_script("return document.body.scrollHeight")
    while True:
        # Scroll to 1000 pixels above the bottom of the page
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight-1000);")
        # Wait for the page to load. implicitly_wait() only sets the timeout
        # for element lookups and does not pause, so sleep explicitly.
        time.sleep(SCROLL_PAUSE_TIME)
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break
        last_height = new_height
    driver.quit()