#53655: Improve speed of DOMNode::C14N() on large XML documents #12278
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
https://bugs.php.net/bug.php?id=53655
The XPath query is in accordance to spec [1]. However, we can do it in a
simpler way. We can use a custom callback function instead of a linear
search in XPath to check if a node is visible. Note that comment nodes
are handled internally by libxml2 already, so we do not need to
differentiate between node types. The callback will do an upwards
traversal of the tree until the root of the canonicalization is reached.
In practice this will speed up the application a lot.
[1] https://www.w3.org/TR/2001/REC-xml-c14n-20010315 section 2.1
This can make processing easily 100 times faster for a large document. I generated some random XML documents with https://codebeautify.org/generate-random-xml: https://gist.github.com/nielsdos/369813d1b1c5c146a6fd7992b8ddbc28
file.xml: before -> after:
random.xml: 0.159s -> 0.004s
large.xml: 1.256s -> 0.008s
There's another speed-up I could do by replacing the linear search with a search in a HashTable, that's orthogonal to this but also a smaller time save. That's important for the cases that do use a nodeset. something to do as a follow-up probably.