gh-132942: Fix races in type lookup cache #133032
This PR fixes two races related to the type lookup cache when used in the free-threaded build. They caused test_opcache to sometimes fail (as well as other hard-to-reproduce failures).
Here is a script that triggers the crash. It can take a while, especially if running under "rr".
LGTM
Thanks @nascheme for the PR 🌮🎉. I'm working now to backport this PR to: 3.13.
Sorry, @nascheme, I could not cleanly backport this to
Two races related to the type lookup cache, when used in the free-threaded build. This caused test_opcache to sometimes fail (as well as other hard-to-reproduce failures).
The first problem is that find_name_in_mro() can block on some mutex and then release critical sections. If that happens, the type version used for the cache entry can be wrong (too new). Assigning the version before doing the find fixes this issue. If it does race, you will add an entry that uses an out-of-date version.
The second problem was much harder to track down. There is a hard-to-trigger race between update_cache(), writing to the cache, and _PyType_LookupStackRefAndVersion(), reading from the cache. We use a sequence lock to avoid races. However, if the reader reads the old entry value and the new entry version, it will try to execute _Py_TryXGetStackRef() on a stale cache entry value. If that value has been deallocated, PyStackRef_XCLOSE() will crash. This could happen before because the version was written first and the new value second.
The fix is simply to write the entry value first and the version after. That way, the reader always sees a value at least as new as the version.
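The write ordering described above can be illustrated with a minimal C11 sketch. This is not the CPython code: toy_entry, toy_entry_init(), toy_entry_store() and toy_entry_load() are hypothetical names invented for illustration, and the sketch leaves out the sequence lock and the reference counting that the real cache uses. It only shows why storing the value before the version means a reader that matches the version never pairs it with an older value.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Toy stand-in for one cache slot; NOT CPython's real cache entry.
   Assumption of this sketch: version 0 marks an empty slot and is
   never requested by readers. */
typedef struct {
    atomic_uint version;      /* type version the slot was filled for */
    _Atomic(void *) value;    /* cached lookup result */
} toy_entry;

static void
toy_entry_init(toy_entry *e)
{
    assert(e != NULL);
    atomic_init(&e->version, 0u);
    atomic_init(&e->value, NULL);
}

/* Writer: publish the value first, then the version. */
static void
toy_entry_store(toy_entry *e, unsigned int version, void *value)
{
    assert(e != NULL);
    atomic_store_explicit(&e->value, value, memory_order_relaxed);
    /* Release store: a reader that acquires this version is guaranteed
       to observe the value store above, or a later one. */
    atomic_store_explicit(&e->version, version, memory_order_release);
}

/* Reader: returns 1 and sets *out on a version match, 0 on a miss. */
static int
toy_entry_load(toy_entry *e, unsigned int want_version, void **out)
{
    assert(e != NULL && out != NULL);
    unsigned int seen =
        atomic_load_explicit(&e->version, memory_order_acquire);
    if (seen != want_version) {
        return 0;  /* stale or empty slot: caller falls back to a full lookup */
    }
    /* Read only after the version matched, so the value is at least
       as new as that version. */
    *out = atomic_load_explicit(&e->value, memory_order_relaxed);
    return 1;
}
```

With the two stores in toy_entry_store() swapped, a reader could match the new version and still load the previous value, which is the stale-value case the fix removes.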
Possible scenarios for the reader of the cache entry, as it is being written to concurrently: