#53655: Improve speed of DOMNode::C14N() on large XML documents #12278

nielsdos · 2023-09-22T21:48:50Z

https://bugs.php.net/bug.php?id=53655

The XPath query is in accordance with the spec [1]. However, we can do it in a
simpler way. Instead of evaluating an XPath query and linearly searching its
result to check whether a node is visible, we can use a custom callback
function. Note that comment nodes are already handled internally by libxml2,
so we do not need to differentiate between node types. The callback walks up
the tree until it reaches the root of the canonicalization: a node is visible
if that root is the node itself or one of its ancestors. In practice this
speeds up canonicalization considerably.

[1] https://www.w3.org/TR/2001/REC-xml-c14n-20010315 section 2.1
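The upward traversal described above can be sketched in C as follows. This is a minimal sketch under stated assumptions, not the php-src code: `node` is a hypothetical stand-in for a libxml2 node that keeps only a parent pointer, and `is_visible_cb` is an illustrative name. Its parameters mirror the shape of libxml2's `xmlC14NIsVisibleCallback` (`user_data`, `node`, `parent`), with `user_data` assumed to carry the canonicalization root.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a libxml2 node: only the parent link matters here. */
typedef struct node {
    const char *name;
    struct node *parent;
} node;

/*
 * Visibility callback in the shape of libxml2's xmlC14NIsVisibleCallback
 * (user_data, node, parent). user_data is assumed to be the root of the
 * canonicalization. A node is visible if it is that root, or if walking
 * upwards from `parent` reaches the root.
 */
static int is_visible_cb(void *user_data, node *n, node *parent)
{
    const node *root = user_data;
    assert(root != NULL);

    if (n == root) {
        return 1;
    }
    /* Start from the `parent` argument rather than n->parent: some nodes
     * (e.g. namespace declarations in libxml2) are not linked like children. */
    for (const node *cur = parent; cur != NULL; cur = cur->parent) {
        if (cur == root) {
            return 1;
        }
    }
    return 0; /* ran out of ancestors without meeting the root */
}
```

For example, with `body` as the root, a `p` element whose parent is `body` is visible, while a sibling subtree such as `head` is not. Each check costs time proportional to the node's depth rather than to the size of an XPath result, which plausibly accounts for the speed-up on large documents.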

This can easily make processing 100 times faster for a large document. I generated some random XML documents for benchmarking with https://codebeautify.org/generate-random-xml; they are available at https://gist.github.com/nielsdos/369813d1b1c5c146a6fd7992b8ddbc28

Timings (file: before -> after):
random.xml: 0.159s -> 0.004s
large.xml: 1.256s -> 0.008s
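The speed-up factors implied by these timings work out as follows (a simple ratio of the reported times):

```latex
\text{speed-up} = \frac{t_\text{before}}{t_\text{after}}, \qquad
\frac{0.159\,\text{s}}{0.004\,\text{s}} \approx 40 \;(\texttt{random.xml}), \qquad
\frac{1.256\,\text{s}}{0.008\,\text{s}} = 157 \;(\texttt{large.xml})
```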

There is another speed-up I could do by replacing the linear search with a HashTable lookup. It is orthogonal to this change and saves less time, but it matters for the cases that do use a node set. That is probably something to do as a follow-up.
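The follow-up idea, answering node-set membership with a hash lookup instead of a linear scan, can be sketched in plain C. This is a minimal sketch under stated assumptions: `ptr_set` and its functions are hypothetical names for an open-addressing set keyed on node pointers, and php-src would more likely build this on its own `HashTable` than on a hand-rolled table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical pointer set: open addressing with linear probing.
 * NULL marks an empty slot, so NULL itself cannot be stored. */
typedef struct {
    const void **slots;
    size_t cap; /* always a power of two */
    size_t len;
} ptr_set;

/* Cheap multiplicative mix of the pointer value; the low bits are dropped
 * first because node pointers are typically aligned. */
static size_t ptr_hash(const void *p, size_t cap)
{
    uintptr_t x = (uintptr_t)p >> 3;
    x *= (uintptr_t)2654435761u;
    return (size_t)x & (cap - 1);
}

/* Size the table for at most `max_items` entries at a load factor <= 0.5,
 * so it never fills up and probing always terminates. */
static bool ptr_set_init(ptr_set *s, size_t max_items)
{
    size_t cap = 8;
    while (cap < max_items * 2) {
        cap <<= 1;
    }
    s->slots = calloc(cap, sizeof *s->slots);
    s->cap = cap;
    s->len = 0;
    return s->slots != NULL;
}

static void ptr_set_add(ptr_set *s, const void *p)
{
    assert(p != NULL); /* NULL is the empty-slot marker */
    size_t i = ptr_hash(p, s->cap);
    while (s->slots[i] != NULL) {
        if (s->slots[i] == p) {
            return; /* already present */
        }
        i = (i + 1) & (s->cap - 1);
    }
    assert(s->len < s->cap / 2); /* caller must respect max_items */
    s->slots[i] = p;
    s->len++;
}

static bool ptr_set_contains(const ptr_set *s, const void *p)
{
    size_t i = ptr_hash(p, s->cap);
    while (s->slots[i] != NULL) {
        if (s->slots[i] == p) {
            return true;
        }
        i = (i + 1) & (s->cap - 1);
    }
    return false;
}

static void ptr_set_free(ptr_set *s)
{
    free(s->slots);
    s->slots = NULL;
    s->cap = s->len = 0;
}
```

The set would be filled once from the node set (O(n) time and extra memory), after which each visibility check is an average O(1) lookup instead of an O(n) scan over the node set.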


Girgias

Looks sensible

nielsdos added 3 commits September 22, 2023 23:25

Remove unnecessary invalidation (4b73cd5)

Add additional test for special cases for C14N (5691c5c)

github-actions bot added the Extension: dom label Sep 22, 2023

[ci skip] UPGRADING (e9fb3b2)

Girgias approved these changes Sep 23, 2023


nielsdos closed this in 5d68d61 Sep 23, 2023
