Skip to content

Fix GH-12192: SimpleXML infinite loop when getName() is called within foreach #12193

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

nielsdos
Copy link
Member

This happens because getName() resets the iterator to the start because it overwrites the iterator data.
We add a version of get_first_node that does not overwrite the iterator data.

…hin foreach

This happens because getName() resets the iterator to the start because
it overwrites the iterator data.
We add a version of get_first_node that does not overwrite the iterator
data.
Copy link
Member

@Girgias Girgias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why the iterator is overridden in the first place tho.

@nielsdos
Copy link
Member Author

nielsdos commented Sep 13, 2023

Thanks for the review. Unfortunately, I found that this patch is incomplete. I have pushed an additional commit that changes the test and fixes the code. It turns out that I missed a place where the iterator data was thrown away.

I don't understand why the iterator is overridden in the first place tho.

SimpleXMLElement objects (called sxe for short) have the behaviour of an iterator. For example, when we write $xml->a we created an iterator for all <a> elements that are direct children of the root node. Note, this isn't an iterator in the Zend sense, it is only conceptually an iterator. This is implemented by storing three thins in the sxe:

  • sxe->node = the root node
  • Iterator type == element (i.e. only consider elements within sxe->node)
  • Iterator name == a (i.e. only consider nodes whose name equals a).

$xml itself has the following data in its sxe:

  • sxe->node = the root node
  • Iterator type == none (i.e. the iterator only returns what's inside sxe->node, i.e. the root node in this case)
  • Iterator name not applicable

As a side note, I prefer the term "view" instead of iterator, because the fields are clearly used as a filtering mechanism to create a specific way to "view" a node.

The internal functions of ext/simplexml all use an iterator-like API. Consequently, they are implemented in such a way that they overwrite the iterator data. As the iterator data is also used for the current item in a foreach loop, this may cause trouble as we see in this PR. So we must be careful by using the iterator functions, which is why I introduced non-destructive variants for some.

So why does $xml->a->getName() use the internal iterator API? Well, we can't use sxe->node->name because that's the name of the root in our example. By calling php_sxe_get_first_node() we get the first node that would be returned by a foreach loop. So for $xml this will return the root node, but for $xml->a this will return the first <a> element.
(Side note: this getName() implementation works fine with name filters of course, but the behaviour is a bit dubious with multiple different elements without a name filter: https://3v4l.org/3UE09 Although it is the case that other simplexml functiond also use the first element so it is somewhat consistent internally.)

@nielsdos nielsdos requested a review from Girgias September 13, 2023 21:12
@nielsdos
Copy link
Member Author

It's also worth noting that this behaviour differs now:

<?php
$xml = "<root><a>1</a><a>2</a></root>";
$xml = simplexml_load_string($xml);

$a = $xml->a;
var_dump($a->getName());

var_dump($a->current());

Previously, current() returned the first <a> element because getName() implicitly rewinded the iterator by resetting its data (from undef to the first element). Now it throws an exception because the iterator is not set.
I can work around this by adding code that does set the iterator data if it was undef before, but I'm not so sure that's desirable.

@nielsdos nielsdos self-assigned this Sep 16, 2023
@nielsdos nielsdos closed this in 4d888cf Sep 16, 2023
nielsdos added a commit that referenced this pull request Sep 30, 2023
Many methods in SimpleXML reset the iterator when called. This has the
consequence that mixing these operations with loops can cause infinite
loops, or the loss of iteration data.
Some people may however rely on the resetting behaviour. To prevent
unintended breaks in stable branches, let's only apply the fix to master.

This reverts GH-12193, GH-12229, GG-12247 for stable branches while
keeping them on master, adding a note in UPGRADING as well.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SimpleXML infinite loop when getName() is called within foreach
2 participants