Skip to content

Fix #79700: Bad performance with namespaced nodes due to wrong libxml assumption #5719

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

beberlei
Copy link
Contributor

ext/dom was wrongly using xmlDoc->oldNs as a linked list of all namespaces, without deduplicating already existing namespaces in the document. This could lead to the oldNs linked list becoming extremely large and performance degrading exponentially in the number of nodes with namespaces.

The bug https://bugs.php.net/bug.php?id=79700 has a test script showing the performance loss.

libxml2 uses oldNs only to store the xml namespace property on, which is an implicit namespace that does not need to be declared on a node. It is only ever used when searching for the "xml" prefix or the "http://www.w3.org/XML/1998/namespace" namespace. When its being used, its automatically created by libxml and does not need to be created by ext/dom.

Removing the function dom_set_old_ns does not break any tests, including the newly created using the xml namespace. Performance of the test script gets each batch of 10000 elements roughly the same append performance.

Merge is suggested for master only, as i am a bit unsure about potential side effects that I can't see with oldNs, even after digging into libxml code for hours, I am not 100% sure about this.

/cc @cmb69 @ThomasWeinert

@cmb69
Copy link
Member

cmb69 commented Jun 15, 2020

The patch looks good to me. Thanks!

$root = $dom->createElementNS('http://www.w3.org/2000/xhtml', 'html');
$dom->appendChild($root);

$s = microtime(true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps better use hrtime() instead?

@nikic
Copy link
Member

nikic commented Jun 15, 2020

Nice catch!

@beberlei
Copy link
Contributor Author

I have asked for feedback on the libxml list as well, hopefully we'll hear something: https://mail.gnome.org/archives/xml/2020-June/msg00000.html

@nielsdos
Copy link
Member

nielsdos commented Mar 5, 2024

I fixed this last year differently. This PR is no longer applicable.

@nielsdos nielsdos closed this Mar 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants