-
Notifications
You must be signed in to change notification settings - Fork 7.8k
Add Randomizer::nextFloat() and Randomizer::getFloat() #9679
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
e895213
to
2cdcc30
Compare
495eb9f
to
270186f
Compare
The algorithm is published in: Drawing Random Floating-Point Numbers from an Interval. Frédéric Goualard, ACM Trans. Model. Comput. Simul., 32:3, 2022. https://doi.org/10.1145/3503512
6fe606a
to
a323cb0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
ext/random/randomizer.c
Outdated
bounds_name = Z_STR_P(case_name); | ||
} | ||
|
||
if (zend_string_equals_literal(bounds_name, "ClosedOpen")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need to compare the names here, or is there maybe a more performant option? cc @iluuu1994
If we need to compare the names, it might make sense to have a switch dispatching on the first character ('C' or 'O') in advance. Or possible even two switches, assuming the enum won't change, in which case we wouldn't even need to compare strings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently no. Maybe we can fetch constants by offset like we do for properties (OBJ_PROP_NUM()
macro), and then do an identity check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cmb69 It's safe to assume the enum won't change, as that would also be a breaking change to userland (and really, the possible interval boundaries are already exhaustively enumerated). The solution is not pretty, but should work reliably. It also nicely avoids an allocation for the default value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TimWolla I think you may be able to access the constants table by offset (since it's a HashMap which are ordered), assuming internal enums initialize all constants in Minit which I think they do. In that case you could then compare the enum object by pointer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@iluuu1994 I'm not sure how I would do that; in any case it sounds more fragile than my current solution of building a simple "hash", because it would be sensitive to reordering (instead of just renaming / extending)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because it would be sensitive to reordering (instead of just renaming / extending)?
I don't think we're really at risk for either of those since we have tests for these cases 🙂 Anyway, we haven't established a standard here, but generally I think going forward we should have something better than string comparison. Maybe we could make the stubs generator generate macros for the case offsets so this becomes less fragile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since we have tests for these cases
The tests can't really distinguish between the cases, because the difference only matters for (literal!) edge cases and by the nature of randomness you can't reliably hit those. So in this case at least we need to rely on careful review and making the code resilient against accidental changes.
Anyway, we haven't established a standard here
Yeah, it's the first native enum. I tend to hit those “first times” with my RFCs, #[SensitiveParameter]
was the first parameter attribute and also required some general fixes for memory leaks in parameter attributes …
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kocsismate WDYT about generating offsets for enum cases? A ENUM_CASE_NUM(ce, NAME_CASE_OFFSET)
doesn't sound too bad and would avoid string comparison just to match an enum.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my current solution of building a simple "hash"
Interesting solution! :)
Maybe we could make the stubs generator generate macros for the case offsets so this becomes less fragile.
@kocsismate, thoughts?
{ | ||
double g = gamma_max(min, max); | ||
uint64_t hi = ceilint(min, max, g); | ||
uint64_t k = 1 + php_random_range64(algo, status, hi - 1); /* [1, hi] */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do these need exception handling since php_random_range64
can throw, or does 0
work out acceptably in the math and then the error will still be set in the global state?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer that these APIs didn't throw, to be honest -- I like it when the C APIs can be used basically like regular C code without Zend Engine side effects getting in the way, but I'm not going to block the PR on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or does 0 work out acceptably in the math and then the error will still be set in the global state?
0 is a valid return value that may be returned by php_random_range64 in the happy path, so the result will be well-defined in the error path (even if not particularly random).
I would prefer that these APIs didn't throw,
Depending on the algo (“the engine”) these functions may call into userland, thus needing to deal with Exceptions is unavoidable:
<?php
final class ThrowingEngine implements Random\Engine {
public function generate(): string
{
throw new \Exception('test');
}
}
$r = new Random\Randomizer(new ThrowingEngine());
$r->getFloat(1, 2);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To clarify, though: php_random_range64
will not throw, unless either the underlying algo throws, or the underlying algo is so horribly biased that it may not be called random (a number is rejected with < 50% chance, 50 rejections thus have a probability of < 2-50 which is less than 1 in a quadrillion).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, this is good to be merged (assuming CI goes green); we could still change the enum case detection if we get something generally better.
@cmb69 I've just added validation to reject non-finite input values (i.e. NAN and INF), because they don't work correctly. Perhaps have another look at the error message for those. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! The handling of infinite values looks good to me.
Thank you both. Will add the NEWS/UPGRADING later and then merge. |
Friends, I would like to open a discussion on the very unfortunately named method $randomizer = new \Random\Randomizer();
$x = $randomizer->getFloat(100, 200); // number between 100..200
$y = $randomizer->nextFloat(); // another number between 100..200 ???
I would have expected another number in the interval 100..200, but the result is a number in the interval 0..1. What is the meaning of (I noticed that there is already a similar method |
@dg This would be better raised either on the Internals list, or in a dedicated issue, instead of a merged PR. For PHP 8.3 it most certainly is too late.
That's not possible, because either both or none would need to be provided and PHP does not allow representing such a method signature without dirty hacks that are disallowed since https://wiki.php.net/rfc/deprecate_functions_with_overloaded_signatures#policy_about_new_functions
|
Until the stable version is released is the right time to rename it, it would be too late then :-) So I will create a separate PR. |
RFC: https://wiki.php.net/rfc/randomizer_additions