Regular expressions: convert plain text URLs into HTML hyperlinks in PHP

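I have a simple commenting system where people can submit hyperlinks inside a plain-text field. When I display these records from the database back on the web page, what regular expression can I use in PHP to convert these links into HTML anchor links?

I do not want the algorithm to do this for any other kind of link (only http and https).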
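
For reference, here is a minimal sketch of what the question asks for: escape the comment for HTML output and link only http and https URLs. The function name linkify_http and the rel="nofollow" attribute are my own assumptions, not something taken from the answers below.

```php
<?php
/**
 * Hypothetical helper: escape a plain-text comment and link only http/https URLs.
 */
function linkify_http(string $text): string
{
    // Split into alternating non-URL / URL pieces; the captured URLs land at odd indexes.
    $parts = preg_split('~(https?://[^\s<>"\']+)~i', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    $out = '';
    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            // Ordinary text: escape it so user input cannot inject markup.
            $out .= htmlspecialchars($part, ENT_QUOTES, 'UTF-8');
            continue;
        }
        // Keep trailing sentence punctuation outside the link.
        $url = rtrim($part, '.,;:!?');
        $tail = substr($part, strlen($url));
        $safe = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
        $out .= '<a href="' . $safe . '" rel="nofollow">' . $safe . '</a>'
              . htmlspecialchars($tail, ENT_QUOTES, 'UTF-8');
    }
    return $out;
}

echo linkify_http('See http://example.com/a?b=1&c=2. <b>ftp://x.org</b> stays text.');
```

Splitting first and escaping each piece afterwards means the regex sees the raw text, while everything that reaches the page is escaped exactly once.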

Here is another solution. It catches all http / https / www addresses and converts them into clickable links.

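// Wrap every match in an anchor; note that a bare www. match is used as the href unchanged.
$url = '~(?:(https?)://([^\s<]+)|(www\.[^\s<]+?\.[^\s<]+))(?<![\.,:])~i';
$string = preg_replace($url, '<a href="$0">$0</a>', $string);
echo $string;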

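Alternatively, to catch only links that carry an explicit scheme (http, https, ftp or ftps, but not a bare www address), use the code below.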

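// Matches scheme://host.tld with an optional path; the TLD is limited to 2-3 letters.
$url = '/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/';
$string = preg_replace($url, '<a href="$0">$0</a>', $string);
echo $string;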

Edit:
The script below catches all URL types and converts them into clickable links.

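// The href is rebuilt from group 2 (the optional "s") and group 4 (the address without the scheme).
$url = '@(http)?(s)?(://)?(([a-zA-Z])([-\w]+\.)+([^\s\.]+[^\s]*)+[^,.\s])@';
$string = preg_replace($url, '<a href="http$2://$4">$0</a>', $string);
echo $string;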

New update: if the string is having its (s) stripped, use the code block below. Thanks to @AndrewEllis for pointing this out.

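// Same idea, but (s)? now sits inside the optional http group, so it can no longer eat a leading "s".
$url = '@(http(s)?)?(://)?(([a-zA-Z])([-\w]+\.)+([^\s\.]+[^\s]*)+[^,.\s])@';
$string = preg_replace($url, '<a href="http$2://$4">$0</a>', $string);
echo $string;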
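
To see why the position of (s)? matters, here is a small sketch that runs both patterns on a scheme-less domain starting with "s". It assumes the href is built from capture group 4, which is what the discussion about the truncated href implies; the variable names are mine.

```php
<?php
// Pattern from the earlier block: (s)? is a separate optional group after (http)?.
$old = '@(http)?(s)?(://)?(([a-zA-Z])([-\w]+\.)+([^\s\.]+[^\s]*)+[^,.\s])@';
// Updated pattern: (s)? can only match directly after "http".
$new = '@(http(s)?)?(://)?(([a-zA-Z])([-\w]+\.)+([^\s\.]+[^\s]*)+[^,.\s])@';

preg_match($old, 'superawesome.com', $a);
preg_match($new, 'superawesome.com', $b);

// Group 4 is the address part; with the old pattern the leading "s" lands in group 2 instead.
echo $a[4], PHP_EOL;
echo $b[4], PHP_EOL;
```

With the old pattern the stand-alone (s)? is free to consume the first letter of the domain, so group 4 starts one character late.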

Here is a very simple solution for a URL that is not displayed correctly.

$email = '[email protected]';
$string = $email;
echo $string;

This is a very simple fix, but you will have to adapt it to your own purposes.

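Related discussion

The edited version catches addresses with www (e.g. www.gotranscript.com). MkVal's answer does not. So +1 here.
The first code block you posted will try to turn www.gmail.com into an anchor, but it fails because there is no "http://" in front of it in the href="".
Yes, that is why I added the other 2 options; the 3rd code block catches all URL types.
The correct regex should be $url = '@(http(s)?)?(://)?(([a-zA-Z])([-\w]+\.)+([^\s\.]+[^\s]*)+[^,.\s])@';. Note the position of the first (s)?. The way you originally had it breaks domain names such as "superawesome.com" by turning them into "uperawesome.com".
@AndrewEllis, hi: thanks for your input, but I have tested this extensively and have not had it strip anything from any of the URLs I'm using with the 3rd code block. If there are any problems, people can use the version you added, so thank you for the suggestion.
@RuddernationDesigns Please take a look at this output. You will notice that the href="" value cuts off the "s": gist.github.com/ellisio/1c6da4c8b3fa401d9ac66da9a3a20ce2
Hi @AndrewEllis, I see it now, although for some reason it does not happen in the ones I'm using. I will add the new code block as you suggested anyway; thanks again.
@RuddernationDesigns I tried your third and your last updated answers. Two words joined by a dot become a link. For example, Thanks.okay is turned into a link.
@c.k, yes, I have noticed that. I may come up with a solution shortly, but thanks for pointing it out. :)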
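@RuddernationDesigns: Similar to the two words with a dot, I got an odd result: the text was "Please do not respond." followed by a line break, and the regex picked up "respond.".
@xtempore I've noticed that; I will look into changing it in a few days. :)
@RuddernationDesigns FYI: [email protected] is converted into [email protected] (i.e. the "email@" part is not included in the link).
@BillelHacaine Thanks for pointing that out. That is strange, but I never tested for it. Is it all of the scripts or only some?
@RuddernationDesigns The 1st and 2nd scripts do not detect the email at all. My previous comment was about the third script, but the last script detects it as well and returns strange HTML. You can run a simple test with: $string = "some text [email protected] some text"
@BillelHacaine Thanks, I will check it and update the script later.
@RuddernationDesigns, please post just 1 answer instead of brainstorming with yourself about which one is correct. Thanks.
@T.Todua There are multiple scripts because some PHP files need different scripts and some servers are set up differently. The requirements also differ: some people want only HTTP/S, some want WWW, some want FTP/S. Which one works depends on how the user's own script is set up, and I have put some text with each script explaining what it does.
@RuddernationDesigns And all of these variations can be put into one regex, e.g. (ftp|http(s|)...)
@RuddernationDesigns ? Please first make sure that I did it, and only then blame me for the downvote. I did not do that.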
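One problem: if my string is click here (http://myweb.com/file.txt), the solution retrieves http://myweb.com/file.txt) as the URL. Any idea how to leave out the final )?
@Si8 So you want to strip the ()? You can use something like this: $string = "Remove *()All Icons <<>> a?!@# to show only text st#$%ring."; $res = preg_replace("/[^a-zA-Z]/", "", $string); echo $res; This removes everything except letters, which I think is what you mean. If so, I hope it works for you; otherwise contact me here again or send me an email. My contact details are in my profile.
@Si8 You can also use something like this: $res = preg_replace("/[^a-zA-Z0-9\s]/", "", $string); It strips all the nasty characters and keeps the whitespace, if you need that. Regards, Rudder
I "hacked" it and it works, but thanks for the suggestion. I will try it; maybe it is faster.
That is fine. I hope it works, good luck :)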
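
Several points raised in this discussion (the missing "http://" for www addresses, two words joined by a dot, and the trailing ")") can be handled together with a callback. What follows is only a rough sketch under my own assumptions: the function name linkify_with_www is made up, and the input is assumed to have been escaped with htmlspecialchars() already.

```php
<?php
/**
 * Hypothetical helper: link http(s):// and www. addresses in already-escaped text.
 * Requiring a scheme or a www. prefix keeps "Thanks.okay" from becoming a link.
 */
function linkify_with_www(string $text): string
{
    $pattern = '~\b(?:https?://|www\.)[^\s<>"]+~i';
    return preg_replace_callback($pattern, function (array $m): string {
        $url = $m[0];
        $tail = '';
        // Move trailing punctuation, and a ")" without a matching "(" in the URL, out of the link.
        while ($url !== '') {
            $last = substr($url, -1);
            $unbalanced = $last === ')' && substr_count($url, '(') < substr_count($url, ')');
            if (strpos('.,;:!?', $last) === false && !$unbalanced) {
                break;
            }
            $tail = $last . $tail;
            $url = substr($url, 0, -1);
        }
        // A bare www. address needs a scheme, otherwise the browser treats it as a relative path.
        $href = stripos($url, 'www.') === 0 ? 'http://' . $url : $url;
        return '<a href="' . $href . '">' . $url . '</a>' . $tail;
    }, $text);
}

echo linkify_with_www('click here (http://myweb.com/file.txt) or www.gmail.com.');
```

Counting brackets rather than always stripping ")" keeps URLs such as http://en.wikipedia.org/wiki/PHP_(disambiguation) intact.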

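Well, Volomike's answer is much closer. To push it a bit further, here is what I did so that it ignores the trailing period at the end of a hyperlink. I also account for URI fragments.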

// Class method; drop "public static" to use it as a plain function.
public static function makeClickableLinks($s) {
    return preg_replace('@(https?://([-\w\.]+[-\w])+(:\d+)?(/([\w/_\.#-]*(\?\S+)?[^\.\s])?)?)@', '<a href="$1">$1</a>', $s);
}

Related discussion

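<?php
function makeClickableLinks($text)
{
    $text = html_entity_decode($text);
    // The leading space lets the www. pattern, which needs a preceding character, match at the start.
    $text = " " . $text;
    // eregi_replace() was removed in PHP 7; preg_replace() with the i flag replaces it here.
    $text = preg_replace('!(((f|ht)tp://)[-a-zA-Z0-9@:%_+.~#?&/=]+)!i',
        '<a href="\\1">\\1</a>', $text);
    $text = preg_replace('!(((f|ht)tps://)[-a-zA-Z0-9@:%_+.~#?&/=]+)!i',
        '<a href="\\1">\\1</a>', $text);
    // www. addresses get an explicit http:// in the href.
    $text = preg_replace('!([\s()\[{}])(www\.[-a-zA-Z0-9@:%_+.~#?&/=]+)!i',
        '\\1<a href="http://\\2">\\2</a>', $text);
    // Email addresses become mailto: links.
    $text = preg_replace('!([_.0-9a-z-]+@([0-9a-z][0-9a-z-]+\.)+[a-z]{2,3})!i',
        '<a href="mailto:\\1">\\1</a>', $text);
    return $text;
}

// Example Usage
echo makeClickableLinks("This is a test clickable link: http://www.websewak.com You can also try using an email address like [email protected]");
?>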

Related discussion

See http://zenverse.net/php-function-to-auto-convert-url-into-hyperlink/.

This is WordPress's solution:

function _make_url_clickable_cb($matches) {
    $ret = '';
    $url = $matches[2];

    if ( empty($url) )
        return $matches[0];
    // removed trailing [.,;:] from URL
    if ( in_array(substr($url, -1), array('.', ',', ';', ':')) === true ) {
        $ret = substr($url, -1);
        $url = substr($url, 0, strlen($url)-1);
    }
    return $matches[1] . "<a href=\"$url\" rel=\"nofollow\">$url</a>" . $ret;
}

function _make_web_ftp_clickable_cb($matches) {
    $ret = '';
    $dest = $matches[2];
    $dest = 'http://' . $dest;

    if ( empty($dest) )
        return $matches[0];
    // removed trailing [,;:] from URL
    if ( in_array(substr($dest, -1), array('.', ',', ';', ':')) === true ) {
        $ret = substr($dest, -1);
        $dest = substr($dest, 0, strlen($dest)-1);
    }
    return $matches[1] . "<a href=\"$dest\" rel=\"nofollow\">$dest</a>" . $ret;
}

function _make_email_clickable_cb($matches) {
    $email = $matches[2] . '@' . $matches[3];
    return $matches[1] . "<a href=\"mailto:$email\">$email</a>";
}

function make_clickable($ret) {
    $ret = ' ' . $ret;
    // in testing, using arrays here was found to be faster
    $ret = preg_replace_callback('#([\s>])([\w]+?://[\w\\x80-\\xff\#$%&~/.\-;:=,?@\[\]+]*)#is', '_make_url_clickable_cb', $ret);
    $ret = preg_replace_callback('#([\s>])((www|ftp)\.[\w\\x80-\\xff\#$%&~/.\-;:=,?@\[\]+]*)#is', '_make_web_ftp_clickable_cb', $ret);
    $ret = preg_replace_callback('#([\s>])([.0-9a-z_+-]+)@(([0-9a-z-]+\.)+[0-9a-z]{2,})#i', '_make_email_clickable_cb', $ret);

    // this one is not in an array because we need it to run last, for cleanup of accidental links within links
    $ret = preg_replace('#(<a( [^>]+?>|>))<a [^>]+?>([^>]+?)</a></a>#i', '$1$3</a>', $ret);
    $ret = trim($ret);
    return $ret;
}
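
The final cleanup step in make_clickable is the least obvious one, so here is a small sketch of what it does to a link that was accidentally wrapped twice. It assumes the cleanup pattern is the one from WordPress's make_clickable; the $nested sample is made up for illustration.

```php
<?php
// An anchor that ended up inside another anchor, e.g. after running a linkifier twice.
$nested = '<a href="http://a.com"><a href="http://a.com">http://a.com</a></a>';

// Keep the outer opening tag ($1) and the link text ($3); drop the inner tag and one closing tag.
$cleaned = preg_replace('#(<a( [^>]+?>|>))<a [^>]+?>([^>]+?)</a></a>#i', '$1$3</a>', $nested);

echo $cleaned;
```

The pattern only fires when two opening anchors are directly adjacent, so ordinary single links pass through unchanged.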

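The top-rated answer did not work for me; the following link was not replaced correctly:

http://www.fifa.com/worldcup/matches/round255951/match=300186487/index.html#nosticky

After some googling and some testing, this is what I came up with: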

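public static function replaceLinks($s) {
    // The hyphen sits at the end of the class so that "%-=" is not read as a character range.
    return preg_replace('@(https?://([-\w\.]+)+(:\d+)?(/([\w/_\.%=#-]*(\?\S+)?)?)?)@', '<a href="$1">$1</a>', $s);
}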

I am no regex expert; in fact it confuses me quite a bit :)

So please feel free to comment and improve this solution.

Here is my code to format all links inside a text, including emails and URLs with and without a protocol.

public function formatLinksInText($text)
{
    //Catch all links with protocol (the first pass reads $text, later passes read $formatText)
    $reg = '/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,}(\/\S*)?/';
    $formatText = preg_replace($reg, '<a href="$0">$0</a>', $text);

    //Catch all links without protocol
    $reg2 = '/(?<=\s|\A)([0-9a-zA-Z\-\.]+\.[a-zA-Z0-9\/]{2,})(?=\s|$|\,|\.)/';
    $formatText = preg_replace($reg2, '<a href="http://$0">$0</a>', $formatText);

    //Catch all emails
    $emailRegex = '/(\S+\@\S+\.\S+)/';
    $formatText = preg_replace($emailRegex, '<a href="mailto:$1">$1</a>', $formatText);
    $formatText = nl2br($formatText);
    return $formatText;
}

Please comment with any URLs that do not work. I will try to update the regex.
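
To find URLs that do not work, a small check like the following can help. It is only a sketch: the sample list and the helper name fully_matches are my own, and $reg is the "links with protocol" pattern from the function above.

```php
<?php
// Sample inputs, partly taken from this thread; extend the list with your own cases.
$samples = [
    'http://example.com',
    'https://www.example.com',
    'http://www.fifa.com/worldcup/matches/round255951/match=300186487/index.html#nosticky',
    'http://example.com:8080/x',
    'http://localhost:8080/x',
];
$reg = '/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,}(\/\S*)?/';

// True only when the pattern matches and the match covers the whole URL.
function fully_matches(string $pattern, string $url): bool
{
    return preg_match($pattern, $url, $m) === 1 && $m[0] === $url;
}

foreach ($samples as $url) {
    echo (fully_matches($reg, $url) ? 'ok   ' : 'FAIL ') . $url . PHP_EOL;
}
```

Comparing the match against the whole input also catches URLs that are only partially linked, such as one with a port number.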

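The answer from MkVal works, but when the text already contains an anchor link, it renders that text in a strange format.

Here is a solution that works in both cases: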

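// Single-quoted strings, so the double quotes inside the lookbehinds need no escaping.
$s = preg_replace (
    '/(?<!a href=")(?<!src=")((http|ftp)+(s)?:\/\/[^<>\s]+)/i',
    '<a href="\\0">\\0</a>',
    $s
);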
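
A lookbehind only protects the URL inside the attribute, not a URL used as the link text. As an alternative, here is a rough sketch that splits the HTML on existing anchors and tags and links only the text in between. The function name linkify_outside_anchors is my own, and the approach assumes reasonably well-formed HTML.

```php
<?php
/**
 * Hypothetical helper: add links only to text that is outside existing tags and anchors.
 */
function linkify_outside_anchors(string $html): string
{
    // Captured delimiters (whole <a>...</a> elements, then any other tag) land at odd indexes.
    $parts = preg_split('~(<a\b[^>]*>.*?</a>|<[^>]+>)~is', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            // Plain text between tags: safe to turn URLs into anchors.
            $parts[$i] = preg_replace('~https?://[^\s<>"]+~i', '<a href="$0">$0</a>', $part);
        }
    }
    return implode('', $parts);
}

echo linkify_outside_anchors('<a href="http://a.com">http://a.com</a> and http://b.com <img src="http://c.com/x.png">');
```

Because existing anchors and tags are passed through untouched, running the function a second time does not nest links.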

Related discussion

public static function makeClickableLinks($s) {
    return preg_replace('@(https?://([-\w\.]+)+(:\d+)?(/([\w/_\.-]*(\?\S+)?)?)?)@', '<a href="$1">$1</a>', $s);
}

Related discussion

I use a function derived from question2answer. It accepts plain text and even plain-text links inside HTML:

// $html holds the string
$htmlunlinkeds = array_reverse(preg_split('|<[Aa]\s+[^>]+>.*</[Aa]\s*>|', $html, -1, PREG_SPLIT_OFFSET_CAPTURE)); // start from end so we substitute correctly
foreach ($htmlunlinkeds as $htmlunlinked)
{ // and that we don't detect links inside HTML, e.g. <img src="http://...">
    $thishtmluntaggeds = array_reverse(preg_split('/<[^>]*>/', $htmlunlinked[0], -1, PREG_SPLIT_OFFSET_CAPTURE)); // again, start from end
    foreach ($thishtmluntaggeds as $thishtmluntagged)
    {
        $innerhtml = $thishtmluntagged[0];
        if(is_numeric(strpos($innerhtml, '://')))
        { // quick test first
            // qa_opt() is Question2Answer's option lookup; outside Q2A pass true or false here instead
            $newhtml = qa_html_convert_urls($innerhtml, qa_opt('links_in_new_window'));
            $html = substr_replace($html, $newhtml, $htmlunlinked[1]+$thishtmluntagged[1], strlen($innerhtml));
        }
    }
}
echo $html;

function qa_html_convert_urls($html, $newwindow = false)
/*
    Return $html with any URLs converted into links (with nofollow and in a new window if $newwindow).
    Closing parentheses/brackets are removed from the link if they don't have a matching opening one. This avoids creating
    incorrect URLs from (http://www.question2answer.org) but allow URLs such as http://www.wikipedia.org/Computers_(Software)
*/
{
    $uc = 'a-z\x{00a1}-\x{ffff}';
    $url_regex = '#\b((?:https?|ftp)://(?:[0-9'.$uc.'][0-9'.$uc.'-]*\.)+['.$uc.']{2,}(?::\d{2,5})?(?:/(?:[^\s<>]*[^\s<>\.])?)?)#iu';

    // get matches and their positions
    if (preg_match_all($url_regex, $html, $matches, PREG_OFFSET_CAPTURE)) {
        $brackets = array(
            ')' => '(',
            '}' => '{',
            ']' => '[',
        );

        // loop backwards so we substitute correctly
        for ($i = count($matches[1])-1; $i >= 0; $i--) {
            $match = $matches[1][$i];
            $text_url = $match[0];
            $removed = '';
            $lastch = substr($text_url, -1);

            // exclude bracket from link if no matching bracket
            while (array_key_exists($lastch, $brackets)) {
                $open_char = $brackets[$lastch];
                $num_open = substr_count($text_url, $open_char);
                $num_close = substr_count($text_url, $lastch);

                if ($num_close == $num_open + 1) {
                    $text_url = substr($text_url, 0, -1);
                    $removed = $lastch . $removed;
                    $lastch = substr($text_url, -1);
                }
                else
                    break;
            }

            $target = $newwindow ? ' target="_blank"' : '';
            $replace = '<a href="' . $text_url . '" rel="nofollow"' . $target . '>' . $text_url . '</a>' . $removed;
            $html = substr_replace($html, $replace, $match[1], strlen($match[0]));
        }
    }

    return $html;
}

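The code is a bit long because it accepts links containing brackets and other characters, but it may help.

I would advise against doing this much work on the fly. I prefer a simple editor interface such as the one used on Stack Overflow. It is called Markdown.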

Related discussion

$string = 'example.com
www.example.com
http://example.com
https://example.com
http://www.example.com
https://www.example.com';

preg_match_all('#(\w*://|www\.)[a-z0-9]+(-+[a-z0-9]+)*(\.[a-z0-9]+(-+[a-z0-9]+)*)+(/([^\s()<>;]+\w)?/?)?#i', $string, $matches, PREG_OFFSET_CAPTURE | PREG_SET_ORDER);
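// Work backwards so the earlier offsets stay valid while the string grows.
foreach (array_reverse($matches) as $match) {
    // Group 1 is either "scheme://" or "www."; only the latter needs http:// prepended.
    $a = '<a href="' . (strpos($match[1][0], '/') ? '' : 'http://') . $match[0][0] . '">' . $match[0][0] . '</a>';
    $string = substr_replace($string, $a, $match[0][1], strlen($match[0][0]));
}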

echo $string;

Result:

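example.com
<a href="http://www.example.com">www.example.com</a>
<a href="http://example.com">http://example.com</a>
<a href="https://example.com">https://example.com</a>
<a href="http://www.example.com">http://www.example.com</a>
<a href="https://www.example.com">https://www.example.com</a>

What I like about this solution is that it also converts www.example.com to http://www.example.com, because <a href="www.example.com"></a> does not work (without the http/https protocol it points to yourdomain.com/www.example.com).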

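Try this:

$s = preg_replace('/(?<!href="|">)(?<!src=")((http|ftp)+(s)?:\/\/[^<>\s]+)/is', '<a href="\\1" target="_blank">\\1</a>', $s);

It skips existing links (if we already have an href, it will not add an href inside the href). Otherwise it adds an a href with a blank target.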

If I understand correctly, what you want to do is turn normal text into http links. I think this can help you:

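<?php
// $con is assumed to be an open mysqli connection.
$list = mysqli_query($con, "SELECT * FROM list WHERE name = 'table content'");
while ($row2 = mysqli_fetch_array($list)) {
    // Escape the stored text before using it as both the href and the link text.
    $content = htmlspecialchars($row2['content'], ENT_QUOTES);
    echo '<a href="' . $content . '">' . $content . '</a>';
}
?>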