3 years ago
After several hours of struggling to parse and modify partial HTML content, I arrived at this solution, which works for me and is relatively simple compared to the alternatives I found online.

It avoids the unwanted DOCTYPE, <html>, and <body> tags that loadHTML() adds, and it handles encoding issues.

<?php

// Assumption: content is utf-8 encoded
$content = "<h1>This is a heading</h1><p>This is a paragraph</p>";

// Wrap the content in a div and declare its encoding with a meta tag
$temp_dom = new DOMDocument();
$temp_dom->loadHTML("<meta http-equiv='Content-Type' content='text/html; charset=utf-8' /><div>$content</div>");

// loadHTML() adds a DOCTYPE plus <html> and <body> tags, so create a second DOMDocument and import only the node we want
$dom = new DOMDocument();
$first_div = $temp_dom->getElementsByTagName('div')[0];
$first_div_node = $dom->importNode($first_div, true);
$dom->appendChild($first_div_node);

// Modify the DOM as needed
$dom->getElementsByTagName('h1')[0]->setAttribute('class', 'happy');

// substr() strips the wrapping <div> (5 characters) and </div> (6 characters);
// echo $dom->saveHTML() directly if you don't mind the div and trailing whitespace
echo substr(trim($dom->saveHTML()), 5, -6);

// Outputs: <h1 class="happy">This is a heading</h1><p>This is a paragraph</p>
?>
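
As a variation on the above, here is a minimal sketch that avoids the hard-coded substr() offsets by serializing each child of the wrapper div with saveHTML($node). The helper name edit_html_fragment() and its callback parameter are hypothetical, made up for illustration; only the DOMDocument calls are real API. It assumes the fragment is UTF-8. Depending on the libxml version, the result may contain line breaks between block elements.

```php
<?php

// Hypothetical helper (not part of the DOM extension): parses a UTF-8 HTML
// fragment, lets a callback modify it, and returns the fragment without the
// DOCTYPE, <html>, <body>, or wrapper <div>.
function edit_html_fragment(string $fragment, callable $edit): string
{
    $dom = new DOMDocument();

    // Same trick as in the note: a meta tag declares the encoding and a
    // wrapper div gives us a single node to work from.
    $dom->loadHTML(
        "<meta http-equiv='Content-Type' content='text/html; charset=utf-8' />"
        . "<div>$fragment</div>"
    );

    // The wrapper is the first div in document order, even if the fragment
    // itself contains divs.
    $wrapper = $dom->getElementsByTagName('div')->item(0);

    // Let the caller change whatever it needs inside the wrapper.
    $edit($wrapper);

    // Serialize only the wrapper's children, one node at a time.
    $html = '';
    foreach ($wrapper->childNodes as $child) {
        $html .= $dom->saveHTML($child);
    }

    return $html;
}

echo edit_html_fragment(
    "<h1>This is a heading</h1><p>This is a paragraph</p>",
    function (DOMElement $wrapper): void {
        $wrapper->getElementsByTagName('h1')->item(0)->setAttribute('class', 'happy');
    }
);
?>
```

Because no second DOMDocument or importNode() call is involved, this is a little shorter than the original, at the cost of requiring PHP 5.3.6 or later for the node argument to saveHTML().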