Fix GH-12143: Optimize round #12268

Closed

SakiTakamachi
Member

@SakiTakamachi SakiTakamachi commented Sep 22, 2023

(edit)

As the policy on edge-case determination for round() has been finalized in the following RFC, the contents of this pull request have been adopted.
https://wiki.php.net/rfc/change_the_edge_case_of_round


This PR completely fixes #12143.

I am very sorry that I ended up changing about 40% of the changes that @TimWolla made, but this was the only way to solve the round() problem.

I completely removed "pre-round" and improved value comparison in php_round_helper.
For example, numbers such as 0.285 and 1.235 may not be treated as edge cases even if we use modf().

This is because they are treated internally by PHP as 0.28499999999999998 and 1.2350000000000001, respectively.
This does not mean that 0.28499999999999998 is correct and 0.285 is incorrect. Both are 3fd23d70a3d70a3d in IEEE754, only the decimal representation is different.

Therefore, the only way to truly determine the edge case is to generate an edge case value and compare it to the passed value.
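
The claim that both decimal spellings share a single bit pattern can be checked directly. The following is a minimal sketch, assuming a platform with 64-bit IEEE 754 doubles; the helper names `double_bits` and `same_double` are hypothetical and do not come from php-src.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Return the raw bit pattern of a double. */
uint64_t double_bits(double d) {
	uint64_t bits;
	/* Assumption: double is a 64-bit IEEE 754 binary64 value. */
	assert(sizeof bits == sizeof d);
	memcpy(&bits, &d, sizeof bits);
	return bits;
}

/* 1 if two doubles are stored with identical bits, 0 otherwise. */
int same_double(double a, double b) {
	return double_bits(a) == double_bits(b);
}
```

Under that assumption, `double_bits(0.285)` is 0x3fd23d70a3d70a3d, and `same_double(0.285, 0.28499999999999998)` returns 1, because the parser maps both literals to the same nearest double.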

Regards.


Click here for more information about round.

https://wiki.php.net/rfc/rounding


Main points

The discussion is scattered all over the place and difficult to understand, so I will summarize the main points.

There were two problems with round():

  1. Values such as round(0.49999999999999994, 0), where adding 0.5 carries into the next integer because of floating-point error, are rounded incorrectly.
  2. "pre-round" incorrectly rounds values like round(1.70000000000145, 13).

Problem 1 had already been fixed, and I was trying to fix problem 2.

Learn more about problem 2

"pre-round" performs the following calculations.

round(1.70000000000145, 13)

1.70000000000145
17000000000014.5 // digit adjustment
17000000000015 // pre-round
1700000000001.5 // digit adjustment
1700000000002 // round
1.700000000002 // return

Wrongly rounded due to "pre-round".

Why is "pre-round" necessary?

The need for "pre-round" is explained in the sections "Analysis of the problems of the previous round() implementation" and "Pre-rounding to the value's precision if possible" of the following article.

https://wiki.php.net/rfc/rounding

Simply put, "pre-round" was needed to solve the kind of problem where the result of round(0.285, 2) is 0.28.

About my changes

Removed "pre-round" and used helpers to accurately determine edge cases.

This allows us to correctly determine the use cases covered by "pre-round", while eliminating errors caused by "pre-round".

@TimWolla
Member

> This is because they are treated internally by PHP as 0.28499999999999998 and 1.2350000000000001, respectively.
> This does not mean that 0.28499999999999998 is correct and 0.285 is incorrect. Both are 3fd23d70a3d70a3d in IEEE754, only the decimal representation is different.

I would disagree with that. 0.285 is not a valid IEEE-754 double precision floating point number and is transformed into 0.28499999999999998 when the code is parsed. Thus the expectation that it would be rounded to 0.29 is incorrect and is not an issue with the rounding functionality. The rounding functionality should only concern itself with numbers that are representable as IEEE-754 double precision floating point numbers. Otherwise, users who intentionally want to work with 0.28499999999999998 instead of 0.285 would see incorrect rounding.

You can't satisfy everyone and thus correctly rounding the internal representation seems to be the right choice.

@SakiTakamachi
Member Author

SakiTakamachi commented Sep 22, 2023

@TimWolla

I understand your opinion. I was also quite worried.

I looked at implementations in other languages, and the behavior differs: Ruby gives 0.29 and Python gives 0.28.
(edit: Python rounds to even numbers, so it was an inappropriate comparison target.)

The reason I chose 0.29 is that this is exactly the value used in the round() test.

https://github.com/php/php-src/blob/master/ext/standard/tests/math/round_prerounding.phpt

So this is a debate about whether we should change the clearly intended behavior of round().


I don't have a strong opinion, but this PR is the result of fixing the bugs while keeping PHP's current specification.

@SakiTakamachi
Member Author

SakiTakamachi commented Sep 22, 2023

I'm concerned about complicating the functionality, but I have an idea to create a new third argument to select the mode for this problem we're talking about.


Or a bitmask.

@SakiTakamachi
Member Author

SakiTakamachi commented Sep 22, 2023

I don't think of floating point numbers as "things that have the only correct value," but rather as "values with a range," which include some degree of fluctuation due to errors.

Therefore, I thought that if the value in IEEE754 is the same, it should be treated as the same value even if the decimal representation is different.

If we follow this idea, the range of values covered by 3fd23d70a3d70a3d includes 0.285, so this is treated as an edge case.

This is my personal opinion, and I have no intention of forcing it without discussion.

@TimWolla
Member

> (edit: Python rounds to even numbers, so it was an inappropriate comparison target.)

FWIW, the Python documentation notes:

> Note: The behavior of round() for floats can be surprising: for example, round(2.675, 2) gives 2.67 instead of the expected 2.68. This is not a bug: it’s a result of the fact that most decimal fractions can’t be represented exactly as a float. See Floating Point Arithmetic: Issues and Limitations for more information.

2.675 is 2.6749999999999998 in reality, so that explains why Python rounds this to 2.67. Interestingly PHP scales this by multiplying with 100, resulting in 267.5. I wonder what Python does there internally to not lose precision when rounding to a given number of places.

@SakiTakamachi
Member Author

I see.

I understand the philosophy of Python.

@SakiTakamachi
Member Author

SakiTakamachi commented Sep 23, 2023

@Girgias

I would like to ask for a review, please.

@Girgias
Member

Girgias commented Sep 23, 2023

> > (edit: Python rounds to even numbers, so it was an inappropriate comparison target.)
>
> FWIW, the Python documentation notes:
>
> > Note: The behavior of round() for floats can be surprising: for example, round(2.675, 2) gives 2.67 instead of the expected 2.68. This is not a bug: it’s a result of the fact that most decimal fractions can’t be represented exactly as a float. See Floating Point Arithmetic: Issues and Limitations for more information.
>
> 2.675 is 2.6749999999999998 in reality, so that explains why Python rounds this to 2.67. Interestingly PHP scales this by multiplying with 100, resulting in 267.5. I wonder what Python does there internally to not lose precision when rounding to a given number of places.

The Python implementation seems to be located around: https://github.com/python/cpython/blob/3.12/Python/bltinmodule.c#L2357

@SakiTakamachi
Member Author

To make the purpose of this PR easier to understand, I have added key points to the PR description.

@TimWolla
Member

TimWolla commented Sep 23, 2023

> Simply put, "pre-round" was needed to solve the kind of problem where the result of "round(0.285, 2)" is "0.28".

But that entire premise is flawed. As per:

#include <stdio.h>
#include <math.h>

int
main(void) {
	printf("%.17g\n", 0.28499999999999994);
	printf("%.17g\n", 0.28499999999999995);
	printf("%.17g\n", 0.28499999999999996);
	printf("%.17g\n", 0.28499999999999997);
	printf("%.17g\n", 0.28499999999999998);
	printf("%.17g\n", 0.28499999999999999);
	printf("%.17g\n", 0.28500000000000000);
	printf("%.17g\n", 0.28500000000000001);
}

All numbers from 0.28499999999999995 to 0.28500000000000000 have the same internal representation. Most of them are smaller than 0.285, thus treating them as equal to 0.285 doesn't really make sense. Instead it should be treated as the value in the middle to minimize the error and indeed the %.17g representation is 0.28499999999999998 which is in the middle.

Rounding 0.285 to anything other than 0.28 (or whatever the nearest representation is) would IMO be incorrect.

@SakiTakamachi
Member Author

SakiTakamachi commented Sep 23, 2023

@TimWolla

Thank you for your detailed analysis. It's very easy to understand.

I would like to hear other people's opinions on this matter.

This is because there is no clearly defined correct answer to this problem, and this is equivalent to the act of "determining language specifications".


If the premise is correct, I think my changes completely solve the problem.

In other words, the remaining debate is whether the premise is correct.


However, I feel that an RFC is necessary in order to revoke something determined by an RFC.

@SakiTakamachi
Member Author

By the way, the referenced article touches on this issue as follows:

> Of course, one may argue that pre-rounding is not necessary and that this is simply the problem with FP arithmetics. This is true on the one hand, but the introduction of the places parameter made it clear that round() is to operate as if the numbers were stored as decimals. We can't revert that and this seems to me to be the best solutions for FP numbers one can get.

@Girgias
Member

Girgias commented Sep 23, 2023

I've spent way too much time looking into rounding and how it works and trying to come up with some straight forward solutions.

But I agree with Tim here, FP numbers are FP numbers, and I don't understand why round() tries to act as if they are rational numbers. They are not. Also, the linked document was written in 2008, FP controls are a C99 standard (although compilers still seem to do whatever they want) which is what php-src now uses, so maybe that's something we should consider again.

This whole code is kinda bonkers, I don't know if there is a reasonable way to extract the fractional part as a 64bit integer. But if yes, this might make the most sense to do and work with integers.

@SakiTakamachi
Member Author

SakiTakamachi commented Sep 24, 2023

Thank you.

If we were to take Tim's suggestion, I would be reverting the helper changes and fixing some test expectations.

> This whole code is kinda bonkers, I don't know if there is a reasonable way to extract the fractional part as a 64bit integer. But if yes, this might make the most sense to do and work with integers.

Will it be a problem in a 32-bit environment?

(edit) Oops, there is no problem if we use int64_t.

(edit2)

If we convert it to an integer type and then process it, it may not work as Tim suggested.

var_dump((int) (0.285 * 1000));
// 285

If you're talking about my changes, you might be right.

tmp_value = value / f1;
}
/* This value is beyond our precision, so rounding it is pointless */
if (fabs(tmp_value) >= 1e15) {
Member Author

By the way, to increase the number of digits that can be processed, you can simply change this to 1e16.

@SakiTakamachi
Member Author

I just followed the existing specifications, so I don't have any strong claims. Since the opinions of @TimWolla and @Girgias are in agreement, I have no objection to changing the specifications.

Is it okay if I continue to make changes to the specifications without preparing an RFC?

@SakiTakamachi
Member Author

I'll mark this as a draft and close it once #12291 is merged.

@SakiTakamachi SakiTakamachi marked this pull request as draft September 24, 2023 13:03
@derickr
Member

derickr commented Sep 25, 2023

> But I agree with Tim here, FP numbers are FP numbers, and I don't understand why round() tries to act as if they are rational numbers.

Because that is what people that use the language expect.

If they write round(0.285, 2), and they get 0.28, they very much could consider that to be a bug.

@Girgias
Member

Girgias commented Sep 26, 2023

> > But I agree with Tim here, FP numbers are FP numbers, and I don't understand why round() tries to act as if they are rational numbers.
>
> Because that is what people that use the language expect.
>
> If they write round(0.285, 2), and they get 0.28, they very much could consider that to be a bug.

How is this different to a user expecting 0.1 + 0.2 === 0.3 to be true, when in reality it is false? As users could also consider this a bug.

Also, how is $f2 = 0.284999999999999698019; echo round($f2, 2), "\n"; rounding to 0.29 not a bug? This is several floating points below the representation chosen for 0.285, and is a floating point number representable exactly.

If we want to provide accurate decimal/rational numbers for users, then let's actually add such a type.
But lying and fudging floating point numbers in ways that are not documented to the point that I now need to question if implementing numerical algorithms will actually give me correct results in PHP is a massive issue.

@SakiTakamachi
Member Author

SakiTakamachi commented Sep 26, 2023

> How is this different to a user expecting 0.1 + 0.2 === 0.3 to be true, when in reality it is false? As users could also consider this a bug.

Admittedly, I was also concerned about that.

This is similar to floor(0.99999999999999995) being 1 (although the direction is opposite).


It seems like the argument is whether or not to consider that the value 0.285 does not exist in FP.

@Girgias
Member

Girgias commented Sep 26, 2023

> > How is this different to a user expecting 0.1 + 0.2 === 0.3 to be true, when in reality it is false? As users could also consider this a bug.
>
> Admittedly, I was also concerned about that.
>
> This is similar to floor(0.99999999999999995) being 1 (although the direction is opposite).
>
> It seems like the argument is whether or not to consider that the value 0.285 does not exist in FP.

There has been since forever a massive warning about FP numbers in the docs: https://www.php.net/manual/en/language.types.float.php

This is a known issue with floating point numbers. And I don't see why round() should make an exception out of this.

If people need arbitrary precision, then they should use the BCMath extension.

@SakiTakamachi
Member Author

SakiTakamachi commented Sep 26, 2023

If BC Math had something like bcround() for example, it might be a good solution to satisfy the user's request to round 0.285 to 0.29.

Admittedly, in the current situation, we are seeking round() to play the role that BC Math should have, and it is undeniable that it is a little distorted.

@SakiTakamachi
Member Author

<?php
$start = microtime(true);
$n = 0.28499999999999995;
for ($i = 0; $i < 100000; $i++) {
    round(0.28499999999999995, 10, PHP_ROUND_TOWARD_ZERO);
    $n += 0.00000001;
}
var_dump(microtime(true) - $start);

before (4 times):

float(0.0359339714050293)
float(0.03636503219604492)
float(0.03540921211242676)
float(0.03581380844116211)

after (4 times):

float(0.030965805053710938)
float(0.02510809898376465)
float(0.030919790267944336)
float(0.032182931900024414)

@SakiTakamachi SakiTakamachi force-pushed the fix/gh-12143-optimize-round branch 4 times, most recently from 24e7083 to 82ea718 Compare January 30, 2024 13:11
@SakiTakamachi
Member Author

Travis is slow... Only Travis has not passed the test yet, so I would like to wait for it to turn green if possible.

@SakiTakamachi SakiTakamachi force-pushed the fix/gh-12143-optimize-round branch 2 times, most recently from feeb2fb to 9a66d69 Compare January 30, 2024 13:33
@SakiTakamachi
Member Author

SakiTakamachi commented Jan 30, 2024

When I consider things like mainframes, there's a lot more to think about...

(edit)

I may be able to do it with fesetround(int round) of <fenv.h> etc.

@SakiTakamachi SakiTakamachi force-pushed the fix/gh-12143-optimize-round branch 2 times, most recently from 21e7898 to 1de4149 Compare January 30, 2024 23:52
@SakiTakamachi
Member Author

Oh, it worked!

@SakiTakamachi SakiTakamachi requested a review from bukka January 31, 2024 00:44
@SakiTakamachi SakiTakamachi force-pushed the fix/gh-12143-optimize-round branch 3 times, most recently from edce2fd to ae63dc0 Compare January 31, 2024 09:40
@SakiTakamachi SakiTakamachi force-pushed the fix/gh-12143-optimize-round branch from ae63dc0 to b9d4f3c Compare January 31, 2024 09:44
@SakiTakamachi
Member Author

I fixed a few things that bothered me, and all the changes were completed.

Comment on lines 50 to 63
#define PHP_ROUND_BASIC_EDGE_CASE() do {\
	if (places > 0) {\
		edge_case = fabs((integral + copysign(0.5, integral)) / exponent);\
	} else {\
		edge_case = fabs((integral + copysign(0.5, integral)) * exponent);\
	}\
} while (0)
#define PHP_ROUND_ZERO_EDGE_CASE() do {\
	if (places > 0) {\
		edge_case = fabs((integral) / exponent);\
	} else {\
		edge_case = fabs((integral) * exponent);\
	}\
} while (0)
Member

I'm not a fan of these sorts of macros because they make the code confusing.

integral and exponent should be arguments to the macro at minimum, and I'd prefer if edge case was returned.

Also, why not have these as inline functions (possibly marked with zend_always_inline) as this would be IMHO clearer.
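
The suggestion could look roughly like the sketch below: the two macros rewritten as inline functions that take `integral`, `exponent` and `places` as arguments and return the edge case. The function names are hypothetical, plain `static inline` stands in for `zend_always_inline`, and this is not necessarily the form that was finally committed.

```c
#include <math.h>

/* Counterpart of PHP_ROUND_BASIC_EDGE_CASE: the value halfway past
 * `integral`, converted back to the scale of the original input. */
static inline double round_basic_edge_case(double integral, double exponent, int places) {
	double half = integral + copysign(0.5, integral);
	return fabs(places > 0 ? half / exponent : half * exponent);
}

/* Counterpart of PHP_ROUND_ZERO_EDGE_CASE: `integral` itself, converted
 * back to the scale of the original input. */
static inline double round_zero_edge_case(double integral, double exponent, int places) {
	return fabs(places > 0 ? integral / exponent : integral * exponent);
}
```

A caller would then write `edge_case = round_basic_edge_case(integral, exponent, places);`, which makes the inputs and the result visible at the call site instead of relying on variables captured from the surrounding scope.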

Member Author

Thanks, that makes sense. I'll fix it.

Member Author

Fixed those!

Member

@Girgias Girgias left a comment

I think I'm happy now :)


/* {{{ php_round_helper
Actually performs the rounding of a value to integer in a certain mode */
-static inline double php_round_helper(double integral, double value, double exponent, int places, int mode) {
+static inline double php_round_helper(double integral, const double value, const double exponent, const int places, const int mode) {
Member

const for non pointer parameters doesn't do anything as far as I know.
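
For context, in C a top-level `const` on a by-value parameter is not part of the function's type, so callers cannot observe it. The qualifier only prevents the function body from reassigning its own local copy. The following minimal sketch uses a hypothetical function, `scale_value`, to show this.

```c
/* Declaration without const on the by-value parameters. */
double scale_value(double value, double exponent);

/* Definition with const: this is still the same function type, so it is
 * compatible with the declaration above. The qualifier only stops the body
 * from assigning to `value` or `exponent`. */
double scale_value(const double value, const double exponent) {
	return value * exponent;
}
```

Both forms compile together and are called identically, for example `scale_value(28.5, 0.5)`.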

Member Author

I don't even know why I decided to do this... I might have been half asleep...

Member Author

I fixed it!

@SakiTakamachi
Member Author

I'll wait a little longer and if it looks okay, I'll merge it.

Member

@bukka bukka left a comment

Nice work

@SakiTakamachi SakiTakamachi deleted the fix/gh-12143-optimize-round branch February 3, 2024 13:24
Development

Successfully merging this pull request may close these issues.

Incorrect round($num, 0, PHP_ROUND_HALF_UP) result for $num = 1.4999999999999998 / 4503599627370495.5
5 participants