-
Notifications
You must be signed in to change notification settings - Fork 7.8k
Add Randomizer::getBytesFromString()
method
#9664
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I've suggested something like this myself, this is a useful addition to me. I'm not sure if this requires an RFC if zeriyoshi is also happy, as this is a self-contained addition, an email to Internals to bikeshed the method name might be useful, though.
ext/random/tests/03_randomizer/methods/getBytesFromAlphabet.phpt
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree about adding useful utility methods to the Randomizer. However, I think an RFC is needed. The reason I did not adopt Tim's proposal in the final RFC is that I wanted to solve the current global scope problem quickly :)
In this case we should think of additional methods to include, so voters don't need to deal with several RFCs. Personally I'd like to add a |
162400f
to
397050c
Compare
I've created a PoC implementation for nextFloat in #9679. @joshuaruesweg Are you interested in pursuing the RFC process for this? I'll be happy to co-author to include the |
Yes, I can gladly take on the RFC process and will mention the |
Thanks. About semantics and behavior: "alphabet" usually means a set (like "charset"), where each symbol (letter/character) is unique; but the current implementation allows duplicate bytes (and doesn't merge them), e.g. |
Yes, I believe this is more useful. Personally I would also expect that the function just uses the input alphabet as-is without performing any preprocessing. Needing to scan for duplicates would make the function much more complex (and potentially much slower) for no real gain.
Yes, I would expect that to be part of the user documentation.
Do you have any suggestions? |
This comment was marked as resolved.
This comment was marked as resolved.
9d1ccc0
to
b56f3f7
Compare
Randomizer::getBytesFromAlphabet()
methodRandomizer::getBytesFromString()
method
@joshuaruesweg Can you please rebase your branch onto the latest master to make sure the CI includes all the recent changes in ext/random? |
b56f3f7
to
679dba1
Compare
679dba1
to
0baf646
Compare
@zeriyoshi @TimWolla Sorry for the email spam. I cannot request more than one review from one person. If I request the review from another person, the request of the person whose rating is still pending will be deleted and the other person will be requested. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
To keep the diff cleaner for future changes, such as #9664.
uint64_t mask; | ||
if (source_length <= 0x1) { | ||
mask = 0x0; | ||
} else if (source_length <= 0x2) { | ||
mask = 0x1; | ||
} else if (source_length <= 0x4) { | ||
mask = 0x3; | ||
} else if (source_length <= 0x8) { | ||
mask = 0x7; | ||
} else if (source_length <= 0x10) { | ||
mask = 0xF; | ||
} else if (source_length <= 0x20) { | ||
mask = 0x1F; | ||
} else if (source_length <= 0x40) { | ||
mask = 0x3F; | ||
} else if (source_length <= 0x80) { | ||
mask = 0x7F; | ||
} else { | ||
mask = 0xFF; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, this isn't particularly performant. I wonder whether we should optimize that using __builtin_clz
(and _BitScanReverse
on Windows). We already have a few usages in php-src (see e.g. zend_mm_small_size_to_bit()
), and it may make sense to unify these (to better encapsulate the usage of the intrinsics).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cmb69 I noted the same at the end of my previous comment here: #9664 (comment).
While the current if-elseif-else-construction contains several hard-to-predict branches, the performance of it is likely to be dwarfed by the loop that actually selects the random bytes.
I'd likely optimize this in a separate PR when the need arises or when there is a good API for clz. zend_ulong_nlz
is not great, because it takes zend_ulong
which size differs depending on the platform.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
zend_ulong_nlz
is not great, because it takeszend_ulong
which size differs depending on the platform.
TIL I learned about that function. Anyhow, the sizeof zend_ulong
wouldn't matter here, since this branch is only taken for source_length < 256
. But yeah, we can still optimize that later.
@zeriyoshi Do you have any other comments? |
@TimWolla |
Thank you. I’ll add the NEWS/UPGRADING entries later and then merge. |
RFC: https://wiki.php.net/rfc/randomizer_additions
I have written a function that generates a random string based on an alphabet. This function is a useful building block. For example, it is possible to easily create random colour codes without having to program this in PHP (like many libraries do). It is also possible to create voucher codes that exclude 0s and Os to make the code more readable. Since the class into which this function is built is
final
anyway, the method can be built in with full backwards compatibility.This function is discussed earlier in the mailing list by @TimWolla: https://externals.io/message/117939#117963 and it is considered as a good extension to the random class.