PHP parse HTML tags

Possible Duplicate:
How to parse and process HTML with PHP?

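I'm new to PHP. I have the text of a page's body tag in a string variable. I'd like to know whether it contains certain tags: given a tag name, say tag1, I want to get only that tag from the string. How can I do this simply in PHP?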

Thanks!!

You would be looking at something like this:

<?php
$content = "";
$doc = new DOMDocument();
// loadHTMLFile() parses HTML; load() expects well-formed XML
$doc->loadHTMLFile("example.html");
$items = $doc->getElementsByTagName('tag1');
if ($items->length > 0) // Only if tag1 items are found
{
    foreach ($items as $tag1)
    {
        // Do something with $tag1->nodeValue and save your modifications
        $content .= $tag1->nodeValue;
    }
}
else
{
    $content = $doc->saveHTML();
}
echo $content;
?>

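DOMDocument represents an entire HTML or XML document and serves as the root of the document tree. So you will be working with valid markup, and when you look up elements by tag name you will not pick up anything inside comments.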
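Since the question has the markup in a string rather than a file, here is a minimal sketch of the same idea using loadHTML(). The sample markup and the variable names $html and $found are assumptions made for illustration; the commented-out tag1 shows that comment contents are not returned.

```php
<?php
// Hypothetical input: body markup held in a string, as in the question.
$html = '<body><p>intro</p><tag1>first</tag1>'
      . '<!-- <tag1>hidden</tag1> --><tag1>second</tag1></body>';

// tag1 is not a standard HTML element, so libxml reports parse warnings;
// collect them internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);           // parse from a string rather than a file
libxml_clear_errors();

$found = [];
foreach ($doc->getElementsByTagName('tag1') as $node) {
    $found[] = $node->nodeValue; // text content of each tag1 element
}

print_r($found);
```

Only the two real tag1 elements end up in $found; the one inside the comment is a comment node, not an element.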

Another possibility is a regex.

$matches = null;
// 'abc' is a placeholder subject; pass the HTML string to search instead
$returnValue = preg_match_all('#<li.*?>(.*?)</li>#', 'abc', $matches);

$matches[0][x] contains the whole match, such as <li>list entry</li>, and $matches[1][x] contains only the inner HTML, such as list entry.
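To make the layout of $matches concrete, here is a small sketch that runs the same pattern over a sample list. The markup in $html and the name $count are assumptions for illustration only.

```php
<?php
// Hypothetical sample markup; the pattern is the one from the answer.
$html = '<ul><li class="a">one</li><li>two</li></ul>';

$matches = null;
$count = preg_match_all('#<li.*?>(.*?)</li>#', $html, $matches);

// $matches[0] holds the full matches, $matches[1] the first capture group.
print_r($matches[0]); // the two complete <li>...</li> elements
print_r($matches[1]); // their inner HTML: 'one' and 'two'
```

Two caveats with this pattern: without the s modifier the dot does not match newlines, so an li spanning several lines is missed, and <li.*?> also matches other tags whose names start with "li", such as <link>.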

The quick way:

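Find the index of tag1, then the index of /tag1, and cut the string between those two positions. Look up strpos and substr on php.net. Note that this may not work if the string is too long.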

$pos1 = strpos($bigString, '<tag1>');
$pos2 = strpos($bigString, '</tag1>');
// substr() takes a start offset and a length, not an end offset
$resultingString = substr($bigString, $pos1, $pos2 - $pos1 + strlen('</tag1>'));

You may need to add to or subtract a few units from $pos1 and $pos2 to get $resultingString exactly right. (And hope there are no comments containing tag1, sigh.)
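The offset arithmetic is easy to get wrong, so here is one way it could be wrapped up with the missing-tag cases handled. extractFirstTag is a hypothetical helper name, and the sketch assumes the opening tag carries no attributes.

```php
<?php
// Hypothetical helper: returns the first <$tag>...</$tag> span, or null.
function extractFirstTag(string $haystack, string $tag): ?string
{
    $open  = "<$tag>";
    $close = "</$tag>";

    $start = strpos($haystack, $open);
    if ($start === false) {
        return null; // no opening tag
    }

    // Look for the closing tag only after the opening one.
    $end = strpos($haystack, $close, $start + strlen($open));
    if ($end === false) {
        return null; // opening tag was never closed
    }

    // substr() takes (string, offset, length), so turn the end position
    // into a length that includes the closing tag itself.
    return substr($haystack, $start, $end - $start + strlen($close));
}

$bigString = '<p>before</p><tag1>hello</tag1><p>after</p>';
echo extractFirstTag($bigString, 'tag1'), "\n";
```

Comparing against false with === matters here, because strpos() returns 0 when the tag sits at the very start of the string.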

The right way:

Look up an HTML parser.
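As one possible reading of "use a parser", PHP's bundled DOM extension can return the markup of a single element, which is what the question asks for. This is a sketch under assumptions: the sample string in $html and the variable $markup are made up for illustration, and other parsers would work as well.

```php
<?php
// Hypothetical input string containing the tag we want to isolate.
$html = '<body><p>before</p><tag1>keep <b>this</b></tag1><p>after</p></body>';

// Suppress warnings about the non-standard tag1 element.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// item(0) is null when no tag1 element exists.
$node = $doc->getElementsByTagName('tag1')->item(0);

// Passing a node to saveHTML() serializes just that subtree.
$markup = $node !== null ? $doc->saveHTML($node) : '';
echo $markup, "\n";
```

Unlike the string-offset approach, this copes with attributes on the tag and with nested markup inside it.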
