Metaphone performance improvement #10501
All inputs for ENCODE() are already uppercase, so there's no need to spend time uppercasing them again.
For a zero-terminator check or an isalpha() check, there's no need to convert the letter to uppercase first.
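A minimal sketch of why the uppercasing is redundant for those two checks. The helper names below are hypothetical and are not the macros from metaphone.c; the sketch also assumes the default "C" locale and only looks at ASCII bytes.

```c
#include <ctype.h>

/* Hypothetical helpers for illustration only; not the macros in metaphone.c. */
static int is_end_with_toupper(unsigned char c)   { return toupper(c) == '\0'; }
static int is_end_direct(unsigned char c)         { return c == '\0'; }
static int is_alpha_with_toupper(unsigned char c) { return isalpha(toupper(c)) != 0; }
static int is_alpha_direct(unsigned char c)       { return isalpha(c) != 0; }

/* Count ASCII byte values for which the uppercased and direct checks disagree.
 * Assumption: the program runs in the default "C" locale. */
static int count_mismatches(void)
{
    int mismatches = 0;
    for (int c = 0; c < 128; c++) {
        unsigned char ch = (unsigned char)c;
        if (is_end_with_toupper(ch) != is_end_direct(ch)) mismatches++;
        if (is_alpha_with_toupper(ch) != is_alpha_direct(ch)) mismatches++;
    }
    return mismatches;
}
```

Under those assumptions `count_mismatches()` should return 0: toupper() never turns a non-zero byte into '\0', and it maps letters to letters, so both checks give the same answer with or without it.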
Looks fine, but did you measure any improvement?
I did measure an improvement on a simple test case (executing the metaphone test in PHP 10K times): the number of instructions executed dropped from 851,037,969 to 699,677,969, so a 17.7% reduction in instruction count.
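For reference, the percentage can be reproduced from the two instruction counts quoted above. The function name is hypothetical; this is just the arithmetic, not part of the patch.

```c
#include <stdint.h>

/* Fraction of the original count that was removed when a counter
 * drops from `before` to `after` (hypothetical helper). */
static double relative_reduction(uint64_t before, uint64_t after)
{
    return (double)(before - after) / (double)before;
}
```

`relative_reduction(851037969, 699677969)` is 151,360,000 / 851,037,969, or roughly 0.178, which matches the figure quoted in the comment up to rounding. Note that this is a drop in instructions executed, not a wall-clock measurement.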
We don't have to re-read letters, and re-uppercase them if we already did it once. By caching these results we gain performance. Furthermore, we can avoid fetching and uppercasing in some conditions by first checking what we already had: e.g. if a condition depends on both Prev_Letter and After_Next_Letter, but we already have Prev_Letter cached we can place that first to avoid a fetch+toupper of the "after next letter".
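A rough sketch of the two ideas, caching the uppercased letter and putting the already-cached operand first in the condition. Everything here is hypothetical: the `fetch_upper` helper, the fetch counter, and the rule "previous letter is S and after-next letter is H" are made up for illustration and are not taken from metaphone.c.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Instrumentation only: how many times a letter was read and uppercased. */
static unsigned long fetches;

/* Read one letter and uppercase it; positions past the end yield '\0'. */
static char fetch_upper(const char *w, size_t len, size_t i)
{
    fetches++;
    return i < len ? (char)toupper((unsigned char)w[i]) : '\0';
}

/* Uncached: every use re-fetches, and the expensive operand is tested first. */
static int scan_naive(const char *w)
{
    size_t len = strlen(w);
    int matches = 0;
    for (size_t i = 0; fetch_upper(w, len, i) != '\0'; i++) {
        if (fetch_upper(w, len, i + 2) == 'H' &&
            i > 0 && fetch_upper(w, len, i - 1) == 'S') {
            matches++;
        }
    }
    return matches;
}

/* Cached: prev/cur are carried across iterations, and the cached operand
 * comes first so the after-next fetch is skipped when it cannot matter. */
static int scan_cached(const char *w)
{
    size_t len = strlen(w);
    int matches = 0;
    char prev = '\0';
    char cur = fetch_upper(w, len, 0);
    for (size_t i = 0; cur != '\0'; i++) {
        if (prev == 'S' && fetch_upper(w, len, i + 2) == 'H') {
            matches++;
        }
        prev = cur;                        /* reuse instead of re-reading */
        cur = fetch_upper(w, len, i + 1);
    }
    return matches;
}

/* Run one scan and report how many fetches it needed. */
static unsigned long count_fetches(int (*scan)(const char *), const char *w,
                                   int *matches)
{
    fetches = 0;
    *matches = scan(w);
    return fetches;
}
```

Both scans find the same matches; on the input "Sasha" the naive version does 12 fetch+toupper operations and the cached one does 8. The real code has many more rules per letter, so the actual ratio will differ.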
I pushed an additional performance improvement. The main ideas were to cache the read and uppercased result, and in some cases to reorder the conditions in an if check to avoid fetching and uppercasing when the condition couldn't be fulfilled anyway.
LGTM.
It may also make sense to implement Double Metaphone (released in 2000) or an old version of Metaphone 3 which exists under a BSD licence (https://github.com/OpenRefine/OpenRefine/blob/master/main/src/com/google/refine/clustering/binning/Metaphone3.java) to improve the accuracy of this function.
Will it be backported to 8.2?
No.
I saw a report here: https://gitlab.alpinelinux.org/alpine/aports/-/issues/14381#note_276471 that metaphone under Alpine was way slower than under Debian. Making fewer calls into libc should improve performance on both Alpine and other distros.