gh-132942: Fix races in type lookup cache #133032

nascheme · 2025-04-27T00:43:00Z

This fixes two races in the type lookup cache that occur in the free-threaded build. They caused test_opcache to fail intermittently, along with other hard-to-reproduce failures.

The first problem is that find_name_in_mro() can block on a mutex, which releases the active critical sections. If that happens, the type version used for the cache entry can be wrong (too new). Reading the version before doing the find fixes this issue. If the race still occurs, the entry is added with an out-of-date version, which will simply not match the current type version.
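The ordering issue in the first race can be illustrated with a small single-threaded model. This is only a sketch: `toy_type`, `toy_entry`, `toy_find()` and the `mutate` flag are hypothetical stand-ins (the flag simulates another thread modifying the type while the lookup is blocked), not the structures or functions in Objects/typeobject.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins, not the CPython structures. */
typedef struct {
    uint32_t version;   /* bumped whenever the type is modified */
    const char *attr;   /* the single "attribute" of this toy type */
} toy_type;

typedef struct {
    uint32_t version;   /* type version the entry claims to be valid for */
    const char *value;  /* cached lookup result */
} toy_entry;

/* Stand-in for the MRO lookup. When `mutate` is nonzero, it simulates a
   concurrent modification of the type that happens while the lookup is
   blocked: the result is the old attribute, but the type has moved on. */
static const char *
toy_find(toy_type *tp, int mutate)
{
    const char *found = tp->attr;
    if (mutate) {
        tp->attr = "new";
        tp->version++;
    }
    return found;
}

/* Racy order: the version is read after the lookup, so an old value
   can be paired with a version that is too new. */
static toy_entry
fill_version_after(toy_type *tp, int mutate)
{
    toy_entry e;
    e.value = toy_find(tp, mutate);
    e.version = tp->version;
    return e;
}

/* Fixed order: the version is read first. If the type changes during
   the lookup, the entry only carries an out-of-date version. */
static toy_entry
fill_version_before(toy_type *tp, int mutate)
{
    toy_entry e;
    e.version = tp->version;
    e.value = toy_find(tp, mutate);
    return e;
}

/* A cache hit requires the entry version to equal the type version. */
static int
toy_hit(const toy_entry *e, const toy_type *tp)
{
    return e->version == tp->version;
}
```

In this model, an entry built with `fill_version_after()` during a simulated modification still counts as a hit while holding the old value, whereas one built with `fill_version_before()` is just a miss.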

The second problem was much harder to track down. There is a hard-to-trigger race between update_cache(), which writes to the cache, and _PyType_LookupStackRefAndVersion(), which reads from it. We use a sequence lock to avoid races. However, if the reader sees the new entry version together with the old entry value, it will call _Py_TryXGetStackRef() on a stale cache entry value. If that value has been deallocated, PyStackRef_XCLOSE() will crash. This was possible because the writer stored the version first and the new value second.

The fix is simply to write the entry value first and the version after. That way, the reader always sees a value at least as new as the version.
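A minimal sketch of that write order, using C11 atomics with a release store and acquire load on the version. The names `demo_entry`, `demo_write()` and `demo_read()` are hypothetical, and the sketch leaves out the sequence lock and reference counting that the real cache uses; it only shows the ordering of the two fields.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cache entry; field names are illustrative only. */
typedef struct {
    _Atomic uint32_t version;
    _Atomic(void *) value;
} demo_entry;

/* Writer: store the value first, then publish the version with release
   ordering. A reader that observes this version (with an acquire load)
   is then guaranteed to observe a value at least as new. */
static void
demo_write(demo_entry *e, uint32_t version, void *value)
{
    atomic_store_explicit(&e->value, value, memory_order_relaxed);
    atomic_store_explicit(&e->version, version, memory_order_release);
}

/* Reader: acquire-load the version, and only read the value if the
   version matches the one the caller expects. Returns NULL on a miss. */
static void *
demo_read(demo_entry *e, uint32_t want_version)
{
    uint32_t v = atomic_load_explicit(&e->version, memory_order_acquire);
    if (v != want_version) {
        return NULL;
    }
    return atomic_load_explicit(&e->value, memory_order_relaxed);
}
```

With the two stores in `demo_write()` swapped, a reader could match the new version and still load the previous value, which is the crashing case described above.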

Possible scenarios for the reader of the cache entry, as it is being written to concurrently:

| entry version | entry value | outcome |
|---|---|---|
| old | old | Okay, type version will not match |
| old | new | Okay, incref/decref works, seq check fails |
| new | old | Bad, incref/decref on old value might crash |
| new | new | Okay, incref/decref works, seq check fails |
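The scenarios above can be checked with a small enumeration. This is a simplified model, not the real code: it assumes the writer performs exactly two stores in a fixed order, that the reader loads the version before the value, and that neither side's accesses are reordered. The names `classify()` and `order_can_crash()` are made up for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    OUTCOME_MISS,        /* type version will not match */
    OUTCOME_RETRY,       /* incref/decref works, seq check fails */
    OUTCOME_CRASH_RISK   /* incref/decref on old value might crash */
} outcome;

/* Map one (version, value) observation to its outcome. */
static outcome
classify(bool version_is_new, bool value_is_new)
{
    if (!version_is_new && !value_is_new) {
        return OUTCOME_MISS;
    }
    if (version_is_new && !value_is_new) {
        return OUTCOME_CRASH_RISK;
    }
    return OUTCOME_RETRY;
}

/* Enumerate every interleaving of a two-store writer with a reader that
   loads the version at time t1 and the value at time t2 >= t1, where a
   time is the number of writer stores already completed (0, 1 or 2).
   Returns true if any interleaving yields the crashing combination. */
static bool
order_can_crash(bool version_first)
{
    for (int t1 = 0; t1 <= 2; t1++) {
        for (int t2 = t1; t2 <= 2; t2++) {
            /* The field written first is new after 1 store, the other
               field only after both stores. */
            bool ver_new = version_first ? (t1 >= 1) : (t1 >= 2);
            bool val_new = version_first ? (t2 >= 2) : (t2 >= 1);
            if (classify(ver_new, val_new) == OUTCOME_CRASH_RISK) {
                return true;
            }
        }
    }
    return false;
}
```

Under these assumptions, `order_can_crash(true)` (version written first) finds the new-version, old-value combination, while `order_can_crash(false)` (value written first) does not.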

Issue: test_opcache fails randomly (failure or crash) #132942

Objects/typeobject.c

nascheme · 2025-04-28T18:03:52Z

Here is a script that triggers the crash. It can take a while, especially if running under "rr".

crash_mro_lookup.py.txt

colesbury

LGTM

miss-islington-app · 2025-04-28T21:38:51Z

Thanks @nascheme for the PR 🌮🎉.. I'm working now to backport this PR to: 3.13.
🐍🍒⛏🤖

miss-islington-app · 2025-04-28T21:38:54Z

Sorry, @nascheme, I could not cleanly backport this to 3.13 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker 31d1342de9489f95384dbc748130c2ae6f092e84 3.13

pythongh-132942: Fix races in type lookup cache

90a7d35

nascheme added the type-crash and topic-free-threading labels Apr 27, 2025

bedevere-app bot mentioned this pull request Apr 27, 2025

test_opcache fails randomly (failure or crash) #132942

Open

nascheme added the skip news label Apr 27, 2025

Add NEWS.

5cd03e8

nascheme requested a review from colesbury April 27, 2025 00:52

nascheme marked this pull request as ready for review April 27, 2025 01:38

nascheme requested a review from markshannon as a code owner April 27, 2025 01:38

bedevere-app bot added the awaiting core review label Apr 27, 2025

nascheme removed the skip news label Apr 27, 2025

colesbury reviewed Apr 28, 2025

Objects/typeobject.c (outdated)

nascheme added 2 commits April 28, 2025 11:13

Use release/acquire pair for entry->version.

6d30841

Merge 'origin/main' into pythongh-132942-tp-lookup-race

3e01796

colesbury approved these changes Apr 28, 2025

bedevere-app bot added awaiting merge and removed awaiting core review labels Apr 28, 2025

nascheme merged commit 31d1342 into python:main Apr 28, 2025
46 checks passed

bedevere-app bot removed the awaiting merge label Apr 28, 2025

nascheme added the needs backport to 3.13 label Apr 28, 2025

miss-islington-app bot assigned nascheme Apr 28, 2025
