Skip to content

Add Randomizer::getBytesFromString() method #9664

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Dec 9, 2022

Conversation

joshuaruesweg
Copy link
Member

@joshuaruesweg joshuaruesweg commented Oct 3, 2022

RFC: https://wiki.php.net/rfc/randomizer_additions


I have written a function that generates a random string based on an alphabet. This function is a useful building block. For example, it is possible to easily create random colour codes without having to program this in PHP (like many libraries do). It is also possible to create voucher codes that exclude 0s and Os to make the code more readable. Since the class into which this function is built is final anyway, the method can be built in with full backwards compatibility.

This function is discussed earlier in the mailing list by @TimWolla: https://externals.io/message/117939#117963 and it is considered as a good extension to the random class.

Copy link
Member

@TimWolla TimWolla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I've suggested something like this myself, this is a useful addition to me. I'm not sure if this requires an RFC if zeriyoshi is also happy, as this is a self-contained addition, an email to Internals to bikeshed the method name might be useful, though.

@TimWolla TimWolla requested a review from zeriyoshi October 4, 2022 15:48
Copy link
Contributor

@zeriyoshi zeriyoshi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree about adding useful utility methods to the Randomizer. However, I think an RFC is needed. The reason I did not adopt Tim's proposal in the final RFC is that I wanted to solve the current global scope problem quickly :)

@TimWolla
Copy link
Member

TimWolla commented Oct 5, 2022

However, I think an RFC is needed.

In this case we should think of additional methods to include, so voters don't need to deal with several RFCs. Personally I'd like to add a getFloat() method that returns a float between 0 and 1, as that's non-trivial to do in userland and an equally useful building block as getBytesFromAlphabet().

@TimWolla
Copy link
Member

TimWolla commented Oct 5, 2022

I've created a PoC implementation for nextFloat in #9679.

@joshuaruesweg Are you interested in pursuing the RFC process for this? I'll be happy to co-author to include the nextFloat() proposal and whatever comes up during discussion. Here's some documentation on how to create a RFC: https://wiki.php.net/rfc/howto

@joshuaruesweg
Copy link
Member Author

Yes, I can gladly take on the RFC process and will mention the nextFloat()method, too :)

@guilliamxavier
Copy link
Contributor

Thanks. About semantics and behavior: "alphabet" usually means a set (like "charset"), where each symbol (letter/character) is unique; but the current implementation allows duplicate bytes (and doesn't merge them), e.g. "AAAB" (or "ABAA", or any other order) giving A a probability of 75% while B 25% (not 50% each). That's maybe more useful than disallowing (or merging) duplicates, but should at least be documented (if not choosing a different function/parameter name)?

@TimWolla
Copy link
Member

TimWolla commented Oct 7, 2022

@guilliamxavier

That's maybe more useful than disallowing (or merging) duplicates,

Yes, I believe this is more useful. Personally I would also expect that the function just uses the input alphabet as-is without performing any preprocessing. Needing to scan for duplicates would make the function much more complex (and potentially much slower) for no real gain.

but should at least be documented

Yes, I would expect that to be part of the user documentation.

if not choosing a different function/parameter name

Do you have any suggestions?

@TimWolla TimWolla added the RFC label Oct 10, 2022
@TimWolla

This comment was marked as resolved.

@joshuaruesweg joshuaruesweg changed the title Add Randomizer::getBytesFromAlphabet() method Add Randomizer::getBytesFromString() method Nov 7, 2022
@TimWolla TimWolla self-requested a review November 23, 2022 09:02
@TimWolla
Copy link
Member

@joshuaruesweg Can you please rebase your branch onto the latest master to make sure the CI includes all the recent changes in ext/random?

@joshuaruesweg joshuaruesweg requested review from TimWolla and zeriyoshi and removed request for TimWolla and zeriyoshi November 27, 2022 11:46
@joshuaruesweg joshuaruesweg requested review from TimWolla and removed request for zeriyoshi November 27, 2022 11:47
@joshuaruesweg
Copy link
Member Author

@zeriyoshi @TimWolla Sorry for the email spam.

I cannot request more than one review from one person. If I request the review from another person, the request of the person whose rating is still pending will be deleted and the other person will be requested.

@TimWolla TimWolla requested a review from zeriyoshi November 27, 2022 11:58
Copy link
Member

@TimWolla TimWolla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@TimWolla TimWolla requested a review from cmb69 November 27, 2022 12:00
TimWolla added a commit that referenced this pull request Nov 27, 2022
To keep the diff cleaner for future changes, such as #9664.
Comment on lines +300 to +319
uint64_t mask;
if (source_length <= 0x1) {
mask = 0x0;
} else if (source_length <= 0x2) {
mask = 0x1;
} else if (source_length <= 0x4) {
mask = 0x3;
} else if (source_length <= 0x8) {
mask = 0x7;
} else if (source_length <= 0x10) {
mask = 0xF;
} else if (source_length <= 0x20) {
mask = 0x1F;
} else if (source_length <= 0x40) {
mask = 0x3F;
} else if (source_length <= 0x80) {
mask = 0x7F;
} else {
mask = 0xFF;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this isn't particularly performant. I wonder whether we should optimize that using __builtin_clz (and _BitScanReverse on Windows). We already have a few usages in php-src (see e.g. zend_mm_small_size_to_bit()), and it may make sense to unify these (to better encapsulate the usage of the intrinsics).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cmb69 I noted the same at the end of my previous comment here: #9664 (comment).

While the current if-elseif-else-construction contains several hard-to-predict branches, the performance of it is likely to be dwarfed by the loop that actually selects the random bytes.

I'd likely optimize this in a separate PR when the need arises or when there is a good API for clz. zend_ulong_nlz is not great, because it takes zend_ulong which size differs depending on the platform.

Copy link
Member

@cmb69 cmb69 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

zend_ulong_nlz is not great, because it takes zend_ulong which size differs depending on the platform.

TIL I learned about that function. Anyhow, the sizeof zend_ulong wouldn't matter here, since this branch is only taken for source_length < 256. But yeah, we can still optimize that later.

@TimWolla
Copy link
Member

TimWolla commented Dec 7, 2022

@zeriyoshi Do you have any other comments?

@zeriyoshi
Copy link
Contributor

@TimWolla
Nothing, Approved it!

@TimWolla
Copy link
Member

TimWolla commented Dec 9, 2022

Thank you. I’ll add the NEWS/UPGRADING entries later and then merge.

@TimWolla TimWolla merged commit ac3ecd0 into php:master Dec 9, 2022
@joshuaruesweg joshuaruesweg deleted the randomBytesFromString branch January 2, 2023 20:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants