Created this to fetch the link and name of an anchor tag. I use this when cleaning an HTML email to text. Using regex for HTML is not recommended but for this purpose I see no issue with it. This is not designed to work for nested anchors.
A note to keep in mind:
I was primarily concerned with valid HTML so if attributes do no use ' or " to contain the values then this will need to be tweaked.
If you can edit this to work better, please let me know.
<?php
function replaceAnchorsWithText($data) {
$regex = '/(<a\s*'; $regex .= '(.*?)\s*'; $regex .= 'href=[\'"]+?\s*(?P<link>\S+)\s*[\'"]+?'; $regex .= '\s*(.*?)\s*>\s*'; $regex .= '(?P<name>\S+)'; $regex .= '\s*<\/a>)/i'; if (is_array($data)) {
$data = "{$data['name']}({$data['link']})";
}
return preg_replace_callback($regex, 'replaceAnchorsWithText', $data);
}
$input = 'Test 1: <a href="http: //php.net1">PHP.NET1</a>.<br />';
$input .= 'Test 2: <A name="test" HREF=\'HTTP: //PHP.NET2\' target="_blank">PHP.NET2</A>.<BR />';
$input .= 'Test 3: <a hRef=http: //php.net3>php.net3</a><br />';
$input .= 'This last line had nothing to do with any of this';
echo replaceAnchorsWithText($input).'<hr/>';
?>
Will output:
Test 1: PHP.NET1(http: //php.net1).
Test 2: PHP.NET2(HTTP: //PHP.NET2).
Test 3: php.net3 (is still an anchor)
This last line had nothing to do with any of this