XPath partial match on tr id with Python and Selenium
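How can I use the correct XPath to extract the tr elements whose id starts with "review_"?
I managed to get the elements, but I am having no luck getting the IDs, because they are only partial matches.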
    <table class="admin">
      <thead>"snip"</thead>
      <tbody>
        <tr id="review_984669" class="">
          <td>weird_wild_and_wonderful_mammals</td>
          <td>1</td>
          <td><input type="checkbox" name="book_review[approved]" id="approved" value="1" class="attribute_toggle"></td>
          <td><input type="checkbox" name="book_review[rejected]" id="rejected" value="1" class="attribute_toggle"></td>
          <td>February 27, 2019 03:56</td>
          <td>Show</td>
          <td>
            <span class="rest-in-place" data-attribute="review" data-object="book_review" data-url="/admin/new_book_reviews/984669">
              bad
            </span>
          </td>
        </tr>
        <tr id="review_984670" class="striped">
I am using Selenium with Chrome to extract the only table on the page.
    Table_Selenium_Elements = driver.find_element_by_xpath('//*[@id="admin"]/table')
Then I use the following to get the data from each row.
    for Pri_Key, element in enumerate(Table_Selenium_Elements.find_elements_by_xpath('.//tr')):
        # Create an empty secondary dict for each new Pri Key
        sec = {}
        # Secondary dictionary needs a Key. Keys are items in column_headers list
        for counter, Sec_Key in enumerate(column_headers):
            # Secondary dictionary needs Values for each key.
            # Values are individual items in each sub-list of column_data list
            # Slice the sub list with the counter to get each item
            sec[Sec_Key] = element.get_attribute('innerHTML')[counter]
        pri[Pri_Key] = sec
This only gives me the data inside each td, i.e.
"weird_wild_and_wonderful_mammals", "1"
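But I actually need the tr id = review_xxx as well, and I don't know how to get it.
The ID number changes, so it probably needs an XPath 'contains' expression or an XPath 'begins_with' expression.
As I'm a noob, I think I have captured the review_ ID but am not extracting it correctly in the for loop.
Can someone tell me the correct XPath to extract the parent tr and its child tds?
...then I will adjust the for loop.
Thanks
Sam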
    driver.find_element_by_class_name('striped')

or

    # If it is the last row in the table.
    driver.find_elements_by_css_selector('tbody tr')[-1]

or

    # If it is surely the 2nd row in the table.
    driver.find_elements_by_css_selector('tbody tr')[1]
Based on your HTML, you can get all the rows with any of the following selectors:

    admin_table_rows = driver.find_elements_by_css_selector(".admin tbody > tr")
    admin_table_rows = driver.find_elements_by_css_selector(".admin tr[id^='review_']")
    admin_table_rows = driver.find_elements_by_xpath("//table[@class='admin']//tr[starts-with(@id,'review_')]")
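The question mentions both a "contains" and a "begins_with" match. In XPath 1.0 the prefix test is spelled `starts-with`, and `contains` matches the text anywhere in the attribute. As a rough sketch, a small helper can build either expression for any prefix. The name `rows_xpath` and its parameters are made up for illustration and are not part of Selenium:

```python
def rows_xpath(prefix, table_class="admin", mode="starts-with"):
    """Build an XPath for <tr> rows whose id partially matches `prefix`.

    mode="starts-with" matches ids that begin with the prefix;
    mode="contains" matches ids that contain it anywhere.
    (Hypothetical helper, for illustration only.)
    """
    if mode not in ("starts-with", "contains"):
        raise ValueError("mode must be 'starts-with' or 'contains'")
    if "'" in prefix or "'" in table_class:
        # Quoting is kept deliberately simple in this sketch.
        raise ValueError("single quotes are not supported in this sketch")
    return (
        f"//table[@class='{table_class}']"
        f"//tr[{mode}(@id, '{prefix}')]"
    )


print(rows_xpath("review_"))
print(rows_xpath("review_", mode="contains"))
```

The resulting string can be passed to the same XPath lookup used above. For IDs like review_984669 the `starts-with` form is the tighter match, since `contains` would also pick up an id such as old_review_1.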
Here is an example of how to scrape the data, including the row ID:
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    # driver is assumed to be an existing WebDriver with the admin page already loaded
    wait = WebDriverWait(driver, 10)
    admin_table_rows = wait.until(EC.visibility_of_all_elements_located(
        (By.CSS_SELECTOR, ".admin tr[id^='review_']")))

    for row in admin_table_rows:
        # "review_984669" -> "984669"
        row_id = row.get_attribute("id").replace("review_", "")
        label = row.find_element(By.CSS_SELECTOR, "td:nth-child(1)").text
        num = row.find_element(By.CSS_SELECTOR, "td:nth-child(2)").text
        # In the posted HTML the date is the 5th cell (cells 3 and 4 hold the checkboxes)
        date = row.find_element(By.CSS_SELECTOR, "td:nth-child(5)").text
        # Assumes the row contains a link (e.g. in the "Show" cell); the posted snippet does not show one
        href = row.find_element(By.CSS_SELECTOR, "a").get_attribute("href")
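The question's loop builds a dictionary of dictionaries keyed by row position. One possible way to key it by the review ID instead is sketched below, kept free of Selenium so it can be run on its own. The function name `build_records` and the header names are assumptions for illustration; the sample values are copied from the first row of the posted table:

```python
def build_records(rows, column_headers, id_prefix="review_"):
    """Map each row's numeric id to a {header: cell_text} dict.

    `rows` is an iterable of (tr_id, cell_texts) pairs.
    (Hypothetical helper, for illustration only.)
    """
    records = {}
    for tr_id, cell_texts in rows:
        if not tr_id or not tr_id.startswith(id_prefix):
            continue  # skip header rows or rows without a review id
        key = tr_id[len(id_prefix):]  # "review_984669" -> "984669"
        # zip stops at the shorter sequence, so surplus cells are ignored
        records[key] = dict(zip(column_headers, cell_texts))
    return records


# Sample data taken from the first row of the table in the question.
sample_rows = [
    ("review_984669",
     ["weird_wild_and_wonderful_mammals", "1", "", "",
      "February 27, 2019 03:56", "Show", "bad"]),
]
# Header names are made up here; the real <thead> was snipped in the question.
headers = ["title", "count", "approved", "rejected", "date", "show", "review"]

records = build_records(sample_rows, headers)
print(records["984669"]["date"])
```

With Selenium, each pair could be produced inside the row loop as `(row.get_attribute("id"), [td.text for td in row.find_elements(By.TAG_NAME, "td")])`, so the page is queried once per row and the dictionary logic stays easy to test separately.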
Are you just asking for an XPath to locate the table element itself?
In your example, the XPath that looks for the table uses

    [@id="admin"]
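'admin' is the class, not the ID. Does it work if you just switch it to [@class="admin"]? Because the class sits on the table element itself, the full call would look like this:

    Table_Selenium_Elements = driver.find_element_by_xpath('//table[@class="admin"]')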