Skip to content

Add Randomizer::nextFloat() and Randomizer::getFloat() #9679

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Dec 14, 2022

Conversation

TimWolla
Copy link
Member

@TimWolla TimWolla commented Oct 5, 2022

@TimWolla TimWolla added the RFC label Oct 10, 2022
@TimWolla TimWolla changed the title Add Randomizer::nextFloat() Add Randomizer::nextFloat() and Randomizer::getFloat() Oct 12, 2022
@TimWolla TimWolla force-pushed the randomizer-nextFloat branch from e895213 to 2cdcc30 Compare November 27, 2022 18:13
@TimWolla TimWolla force-pushed the randomizer-nextFloat branch 2 times, most recently from 495eb9f to 270186f Compare December 9, 2022 18:36
@TimWolla TimWolla force-pushed the randomizer-nextFloat branch from 6fe606a to a323cb0 Compare December 9, 2022 20:21
@TimWolla TimWolla requested a review from cmb69 December 9, 2022 20:38
@TimWolla TimWolla marked this pull request as ready for review December 9, 2022 23:14
Copy link
Member

@cmb69 cmb69 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

bounds_name = Z_STR_P(case_name);
}

if (zend_string_equals_literal(bounds_name, "ClosedOpen")) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need to compare the names here, or is there maybe a more performant option? cc @iluuu1994

If we need to compare the names, it might make sense to have a switch dispatching on the first character ('C' or 'O') in advance. Or possible even two switches, assuming the enum won't change, in which case we wouldn't even need to compare strings.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently no. Maybe we can fetch constants by offset like we do for properties (OBJ_PROP_NUM() macro), and then do an identity check?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cmb69 It's safe to assume the enum won't change, as that would also be a breaking change to userland (and really, the possible interval boundaries are already exhaustively enumerated). The solution is not pretty, but should work reliably. It also nicely avoids an allocation for the default value.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TimWolla I think you may be able to access the constants table by offset (since it's a HashMap which are ordered), assuming internal enums initialize all constants in Minit which I think they do. In that case you could then compare the enum object by pointer.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iluuu1994 I'm not sure how I would do that; in any case it sounds more fragile than my current solution of building a simple "hash", because it would be sensitive to reordering (instead of just renaming / extending)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because it would be sensitive to reordering (instead of just renaming / extending)?

I don't think we're really at risk for either of those since we have tests for these cases 🙂 Anyway, we haven't established a standard here, but generally I think going forward we should have something better than string comparison. Maybe we could make the stubs generator generate macros for the case offsets so this becomes less fragile.

Copy link
Member Author

@TimWolla TimWolla Dec 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since we have tests for these cases

The tests can't really distinguish between the cases, because the difference only matters for (literal!) edge cases and by the nature of randomness you can't reliably hit those. So in this case at least we need to rely on careful review and making the code resilient against accidental changes.

Anyway, we haven't established a standard here

Yeah, it's the first native enum. I tend to hit those “first times” with my RFCs, #[SensitiveParameter] was the first parameter attribute and also required some general fixes for memory leaks in parameter attributes …

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kocsismate WDYT about generating offsets for enum cases? A ENUM_CASE_NUM(ce, NAME_CASE_OFFSET) doesn't sound too bad and would avoid string comparison just to match an enum.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my current solution of building a simple "hash"

Interesting solution! :)

Maybe we could make the stubs generator generate macros for the case offsets so this becomes less fragile.

@kocsismate, thoughts?

{
double g = gamma_max(min, max);
uint64_t hi = ceilint(min, max, g);
uint64_t k = 1 + php_random_range64(algo, status, hi - 1); /* [1, hi] */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do these need exception handling since php_random_range64 can throw, or does 0 work out acceptably in the math and then the error will still be set in the global state?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer that these APIs didn't throw, to be honest -- I like it when the C APIs can be used basically like regular C code without Zend Engine side effects getting in the way, but I'm not going to block the PR on it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@morrisonlevi

or does 0 work out acceptably in the math and then the error will still be set in the global state?

0 is a valid return value that may be returned by php_random_range64 in the happy path, so the result will be well-defined in the error path (even if not particularly random).

I would prefer that these APIs didn't throw,

Depending on the algo (“the engine”) these functions may call into userland, thus needing to deal with Exceptions is unavoidable:

<?php
final class ThrowingEngine implements Random\Engine {
	public function generate(): string
	{
		throw new \Exception('test');
	}
}
$r = new Random\Randomizer(new ThrowingEngine());

$r->getFloat(1, 2);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify, though: php_random_range64 will not throw, unless either the underlying algo throws, or the underlying algo is so horribly biased that it may not be called random (a number is rejected with < 50% chance, 50 rejections thus have a probability of < 2-50 which is less than 1 in a quadrillion).

Copy link
Member

@cmb69 cmb69 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, this is good to be merged (assuming CI goes green); we could still change the enum case detection if we get something generally better.

@TimWolla TimWolla requested a review from zeriyoshi December 12, 2022 17:49
@TimWolla
Copy link
Member Author

@cmb69 I've just added validation to reject non-finite input values (i.e. NAN and INF), because they don't work correctly. Perhaps have another look at the error message for those.

Copy link
Member

@cmb69 cmb69 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! The handling of infinite values looks good to me.

@TimWolla
Copy link
Member Author

Thank you both. Will add the NEWS/UPGRADING later and then merge.

@TimWolla TimWolla merged commit f9a1a90 into php:master Dec 14, 2022
@TimWolla TimWolla deleted the randomizer-nextFloat branch January 15, 2023 15:48
@dg
Copy link

dg commented Oct 13, 2023

Friends, I would like to open a discussion on the very unfortunately named method nextFloat(). What most programmers will probably expect it returns in this example?

$randomizer = new \Random\Randomizer();
$x = $randomizer->getFloat(100, 200); // number between 100..200
$y = $randomizer->nextFloat(); // another number between 100..200 ???

I would have expected another number in the interval 100..200, but the result is a number in the interval 0..1.

What is the meaning of nextFloat()? Isn't it better to just have only a getFloat() function and give the $min and $max parameters default values?

(I noticed that there is already a similar method nextInt(). I suggest to create better named alias for this method and not to extend problem by adding another badly named method.)

@TimWolla
Copy link
Member Author

TimWolla commented Oct 14, 2023

@dg This would be better raised either on the Internals list, or in a dedicated issue, instead of a merged PR. For PHP 8.3 it most certainly is too late.

Isn't it better to just have only a getFloat() function and give the $min and $max parameters default values?

That's not possible, because either both or none would need to be provided and PHP does not allow representing such a method signature without dirty hacks that are disallowed since https://wiki.php.net/rfc/deprecate_functions_with_overloaded_signatures#policy_about_new_functions

(I noticed that there is already a similar method nextInt(). I suggest to create better named alias for this method and not to extend problem by adding another badly named method.)

nextInt() is extra terrible and only exists for drop-in compatibility with mt_rand() without parameters. If anything it should be deprecated without replacement. nextFloat() has a well-defined and useful behavior at least.

@dg
Copy link

dg commented Oct 14, 2023

Until the stable version is released is the right time to rename it, it would be too late then :-) So I will create a separate PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants