XPath Query Won't Return Results
I'm trying to get some results back from an XPath query, but it won't select the elements correctly. I'm using the following code:

    public function getTrustPilotReviews($amount)
    {
        $trustPilotUrl = 'https://www.trustpilot.co.uk/review/purplegriffon.com';

        $html5 = new HTML5;
        $document = $html5->loadHtml(file_get_contents($trustPilotUrl));
        $document->validateOnParse = true;

        $xpath = new DOMXpath($document);
        $reviewsDomNodeList = $xpath->query('//div[@id="reviews-container"]//div[@itemprop="review"]');

        $reviews = new Collection;

        foreach ($reviewsDomNodeList as $key => $reviewDomElement) {
            $xpath = new DOMXpath($reviewDomElement->ownerDocument);

            if ((int) $xpath->query('//*[@itemprop="ratingValue"]')->item($key)->getAttribute('content') >= 4) {
                $review = [
                    'title' => 'Test',
                    'author' => $xpath->query('//*[@itemprop="author"]')->item($key)->nodeValue,
                    'date' => $xpath->query('//*[@class="ndate"]')->item($key)->nodeValue,
                    'rating' => $xpath->query('//*[@itemprop="ratingValue"]')->item($key)->nodeValue,
                    'body' => $xpath->query('//*[@itemprop="reviewBody"]')->item($key)->nodeValue,
                ];

                $reviews->add((object) $review);
            }
        }

        return $reviews->take($amount);
    }
This expression returns nothing:

    //div[@id="reviews-container"]//div[@itemprop="review"]

But if I change it to:

    //*[@id="reviews-container"]//*[@itemprop="review"]

it partially works, but it doesn't return the correct results.
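You appear to be using the HTML5-PHP library. If so, you need to use namespaces. The library loads HTML5 into an XHTML document, so every element ends up in the XHTML namespace. You can verify this by saving the DOM document as XML. The output will look something like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <html xmlns="http://www.w3.org/1999/xhtml">
    ...
    </html>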
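In XPath 1.0 an element name without a prefix only matches elements that are in no namespace, which is why `//div` finds nothing while `//*` matches. So if you use XPath, you need to register a prefix for the XHTML namespace and use it on the element names.

    ...
    $xpath = new DOMXPath($document);
    $xpath->registerNamespace('x', 'http://www.w3.org/1999/xhtml');

    $reviewNodes = $xpath->evaluate(
        '//x:div[@id="reviews-container"]//x:div[@itemprop="review"]'
    );
    foreach ($reviewNodes as $reviewNode) {
        ...
    }
    ...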
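To see the effect of the namespace in isolation, here is a minimal sketch that does not need the HTML5-PHP library. It assumes a small hand-written XHTML fragment (hypothetical, not the real Trustpilot markup) loaded with the built-in `DOMDocument`, and compares the unprefixed and prefixed expressions.

```php
<?php
// Hypothetical XHTML fragment standing in for the parsed page.
$xml = <<<'XML'
<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <div id="reviews-container">
      <div itemprop="review">First</div>
      <div itemprop="review">Second</div>
    </div>
  </body>
</html>
XML;

$document = new DOMDocument();
$document->loadXML($xml);

$xpath = new DOMXPath($document);
$xpath->registerNamespace('x', 'http://www.w3.org/1999/xhtml');

// Unprefixed names only match elements in no namespace: nothing is found.
$unprefixed = $xpath->evaluate('//div[@itemprop="review"]');

// The registered prefix maps to the XHTML namespace: both reviews are found.
$prefixed = $xpath->evaluate(
    '//x:div[@id="reviews-container"]//x:div[@itemprop="review"]'
);

echo $unprefixed->length, "\n"; // 0
echo $prefixed->length, "\n";   // 2
```

The prefix `x` is arbitrary. Only the namespace URI passed to `registerNamespace()` has to match the one in the document.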
The loop contains a condition (rating of at least 4) that can instead be part of the outer XPath expression that fetches the reviews:

    $expression = '//x:div[@id="reviews-container"]
      //x:div[
        @itemprop="review" and
        (.//*[@itemprop="ratingValue"]/@content >= 4)
      ]';
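As a rough illustration of moving the rating check into the predicate, the sketch below runs that kind of expression against a hypothetical three-review fragment. The markup (a `meta` element carrying the rating in `content`, followed by the reviewer name as text) is an assumption made for the example, not the real page structure.

```php
<?php
// Hypothetical fragment: each review holds a rating and a name.
$xml = <<<'XML'
<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <div id="reviews-container">
      <div itemprop="review"><meta itemprop="ratingValue" content="5"/>Alice</div>
      <div itemprop="review"><meta itemprop="ratingValue" content="2"/>Bob</div>
      <div itemprop="review"><meta itemprop="ratingValue" content="4"/>Carol</div>
    </div>
  </body>
</html>
XML;

$document = new DOMDocument();
$document->loadXML($xml);

$xpath = new DOMXPath($document);
$xpath->registerNamespace('x', 'http://www.w3.org/1999/xhtml');

// The predicate keeps only reviews whose rating is at least 4.
// XPath converts the attribute value to a number for the comparison.
$expression = '//x:div[@id="reviews-container"]
  //x:div[
    @itemprop="review" and
    (.//*[@itemprop="ratingValue"]/@content >= 4)
  ]';

$kept = [];
foreach ($xpath->evaluate($expression) as $reviewNode) {
    $kept[] = trim($reviewNode->textContent);
}

print_r($kept); // Alice and Carol
```

With the filter in the expression, the loop body no longer needs its own `if`.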
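Inside the loop, don't use absolute expressions indexed with `item($key)`. Evaluate expressions relative to the review node instead, passing it as the context and letting XPath cast the result to a string:

    ...
    foreach ($reviewNodes as $reviewNode) {
        $review = [
            'title' => 'Test',
            'author' => $xpath->evaluate('string(.//*[@itemprop="author"])', $reviewNode),
            'date' => $xpath->evaluate('string(.//*[@class="ndate"])', $reviewNode),
            // The rating value is stored in the content attribute.
            'rating' => $xpath->evaluate('string(.//*[@itemprop="ratingValue"]/@content)', $reviewNode),
            'body' => $xpath->evaluate('string(.//*[@itemprop="reviewBody"])', $reviewNode)
        ];
        ...
    }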
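The difference between an absolute and a context-relative expression can be sketched with a tiny hypothetical document. Namespaces are left out here on purpose to keep the focus on the context node. Note that an expression starting with `//` searches from the document root even when a context node is passed.

```php
<?php
// Hypothetical two-review fragment without a namespace.
$xml = <<<'XML'
<div id="reviews-container">
  <div itemprop="review"><span itemprop="author">Alice</span></div>
  <div itemprop="review"><span itemprop="author">Bob</span></div>
</div>
XML;

$document = new DOMDocument();
$document->loadXML($xml);
$xpath = new DOMXPath($document);

$absolute = [];
$relative = [];
foreach ($xpath->evaluate('//div[@itemprop="review"]') as $reviewNode) {
    // Starts at the document root, so string() always sees the first author.
    $absolute[] = $xpath->evaluate('string(//*[@itemprop="author"])', $reviewNode);

    // The leading dot anchors the search at the current review node.
    $relative[] = $xpath->evaluate('string(.//*[@itemprop="author"])', $reviewNode);
}

print_r($absolute); // Alice, Alice
print_r($relative); // Alice, Bob
```

This is also why indexing a document-wide node list with the loop key can pair a review with another review's fields as soon as one review lacks an element.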
Thanks to Viper-7, biberu and salathe in the ##php IRC channel, I now have this working:

    public function getTrustPilotReviews($amount)
    {
        // Note: 'verify_peer' => false disables TLS certificate verification.
        $context = stream_context_create(array('ssl' => array('verify_peer' => false)));

        $url = 'https://www.trustpilot.co.uk/review/purplegriffon.com';
        $data = file_get_contents($url, false, $context);

        // Collect libxml parse warnings internally instead of emitting them.
        libxml_use_internal_errors(true);

        $doc = new \DOMDocument();
        $doc->loadHTML($data);

        $xpath = new DOMXPath($doc);

        $reviews = new Collection;

        foreach ($xpath->query('//div[@id="reviews-container"]/div[@itemprop="review"]') as $node) {
            // Every expression below is evaluated relative to the current review node.
            $rating = $xpath->query('.//*[@itemprop="ratingValue"]', $node)->item(0)->getAttribute('content');

            if ($rating >= 4) {
                $review = [
                    'title' => $xpath->evaluate('normalize-space(descendant::*[@itemprop="headline"]/a)', $node),
                    'author' => $xpath->evaluate('normalize-space(descendant::*[@itemprop="author"])', $node),
                    'date' => $xpath->evaluate('normalize-space(descendant::*[@class="ndate"])', $node),
                    'rating' => $xpath->evaluate('number(descendant::*[@itemprop="ratingValue"]/@content)', $node),
                    'body' => $xpath->evaluate('normalize-space(descendant::*[@itemprop="reviewBody"])', $node),
                ];

                $reviews->add((object) $review);
            }
        }

        return $reviews->take($amount);
    }