Skip to content

[RFC] Define proper semantics for range() function #10826

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 22 commits into from
Jun 19, 2023

Conversation

Girgias
Copy link
Member

@Girgias Girgias commented Mar 10, 2023

RFC: To be written

range() currently has outright insane behaviour. It performs no type checking whatsoever, does not handle INF or NAN values properly, will type juggle strings to integers in situations which are unneeded or when it doesn't make any sense. And all of this fails silently.

This PR amends the behaviour of range() to TypeError or ValueError on edge cases (like NAN values) that make no sense whatsoever, and are unlikely to be used in the wild. In other weird cases an E_WARNING is now emitted with the intended purpose to be promoted to ValueError in PHP 9.

This PR does not change the behaviour of generating character arrays, which perform ASCII code point increments, therefore the peculiar behaviour of range('A', 'z'); remains unchanged with no warnings.

@Girgias Girgias changed the title [RFC] Saner range() semantics [RFC] Define proper semantics for range() function May 8, 2023
@Girgias Girgias force-pushed the range-str-inputs branch from c55fa01 to c001f55 Compare May 16, 2023 12:04
@Girgias Girgias requested a review from bukka as a code owner May 16, 2023 12:04
Comment on lines +2840 to +2928
if (start_type >= IS_STRING || end_type >= IS_STRING) {
/* If one of the inputs is NOT a string */
if (UNEXPECTED(start_type + end_type < 2*IS_STRING)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably fine for possible values (long, array, string), but can we make this explicit, like ((1 << start_type) | (1 << end_type)) != MAY_BE_STRING?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, but I don't think this would handle the case range('1', '9'); as both would be marked as IS_ARRAY

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why I marked three lines instead of two, I was mainly referring to the if in 2842.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either way, this still causes an issue if both inputs are string digits as start_type == IS_ARRAY and end_type == IS_ARRAY means that ((1 << start_type) | (1 << end_type)) will never be equal to MAY_BE_STRING.

I thought about using the MAY_BE_* constants and do some bit mask checks but found it simpler to just use a new value that is greater than IS_STRING. I can however tweak it if necessary.

@@ -0,0 +1,97 @@
--TEST--
Test range() function digits
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Test range() function digits
Test range() function with string digits

I would also add test cases with string numbers bigger than '9', i.e. '12'. That should be also clear what to expect from such parameters.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Such a test exists in range_inputs_numeric_basic.phpt

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've split them up into more files

Copy link
Contributor

@jorgsowa jorgsowa May 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the confusion. The description of the RFC is correct, however I meant test cases like range("12","A"). In your implementation it returns:

Warning: range(): Argument #1 ($start) must be a string if argument #2 ($end) is a string, argument #2 ($end) converted to 0 in Command line code on line 1
array(13) {
  [0]=>
  int(12)
  [1]=>
  int(11)
  [2]=>
  int(10)
  [3]=>
  int(9)
  [4]=>
  int(8)
  [5]=>
  int(7)
  [6]=>
  int(6)
  [7]=>
  int(5)
  [8]=>
  int(4)
  [9]=>
  int(3)
  [10]=>
  int(2)
  [11]=>
  int(1)
  [12]=>
  int(0)
}

The logic behind is arguable. I'm not sure exactly how it should behave for such case, but for sure the description is kind misleading. Because both of the arguments are string already.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

range() can only work with single byte strings. Thus the behaviour of trying to interpret the numeric string as an int is the correct one. I will amend the error message to precise "single byte strings"

@Girgias Girgias force-pushed the range-str-inputs branch from 03fe5ad to 331f11f Compare May 31, 2023 10:16
Girgias added 12 commits June 18, 2023 13:58
This is a BC break but currently passing arrays, objects, or resources doesn't throw as it should.
Add way more cases with strings to showcase the current madness.
Have a clear error message for this case, and eliminate a bunch of useless conditions as this is now pre-checked
This changes the values of e.g. range(1, 5, 2.0); from being floats to ints.
The start and end value are not necessarily the issue, it only is when the step parameter does to small increments
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants