Fix for solving lock contention issue in GC statics scanning. #32795
This looks good to me as an incremental improvement. It is still unfortunate that this is done during the STW pause. I believe we should be taking every opportunity to move work from STW pauses to the background. |
Well, what we do during the STW pause is essentially snapshotting the object pointers stored in the statics; we don't explore the object graph yet. I think it's possible in principle to do the scanning while the program is running, but this will likely need some care so we don't run into tricky race conditions. |
great! this is a much better solution. re switching orders in |
Approved, but I know Peter is working on a test to verify this, so there may be (though it's unlikely) some changes coming from that. |
I like this fix more than the previous one too. |
Re: doing the scan outside of STW. One way to do that could be storing the statics in managed arrays, similar to what we do with collectible assemblies. Then scanning happens naturally. |
The statics are stored in managed arrays. This extra code front-loads scanning of these managed arrays with statics. It is not required for correctness: if it was deleted, everything would still work correctly, just the timing would be different. |
The code mentions that this is not needed for collectible assemblies because they are stored in managed arrays, so I assumed that regular statics are not. |
In that case, perhaps enumerating just the arrays would be sufficient, i.e. no need to dig into the elements? |
You can make a similar argument for why this front-loading is beneficial at all. It is why it would be useful to have a micro-benchmark to demonstrate it. Then we can at least somewhat reason about the impact of different tweaks. |
Right. Enumerating just the containing arrays is basically the same as enumerating handles, and we do that a few lines later. My concern was that if multiple threads start fighting for every static variable, there could be some issues with false sharing. If you have N threads and K statics, it seems it would result in N*K accesses to the same data, and in the worst case nearly all of them will be cache misses. |
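To make the N*K estimate concrete, here is a minimal toy model in Python. It is not the runtime's code: `simulate_marking` is a hypothetical name, the threads run one after another rather than concurrently, and it only counts accesses, so it says nothing about actual cache behavior. It assumes every thread walks every static slot and only the first visitor marks it.

```python
def simulate_marking(num_threads, num_statics):
    """Count slot accesses when every thread walks every static slot.

    Toy model only: threads are simulated sequentially, and a slot is
    'marked' by whichever thread reaches it first.
    """
    marked = [False] * num_statics
    touches = 0  # total reads of static slots across all threads
    marks = 0    # reads that actually resulted in marking an object
    for _thread in range(num_threads):
        for slot in range(num_statics):
            touches += 1
            if not marked[slot]:
                marked[slot] = True
                marks += 1
    return touches, marks
```

With 4 threads and 1000 statics, the model reports 4000 accesses but only 1000 useful marks; the remaining 3000 accesses are the redundant reads of shared data that the concern above is about.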
I wrote the micro-benchmark that Jan and Maoni suggested. The result is that there is a significant benefit to the front-loading, but the handle table scanning is still the long pole in many cases. This is probably because if the thread doing the handle table scanning gets to the array containing the statics first, it will greedily mark the contained objects before actually tracing the contained object graphs. The other threads will then conclude there's nothing for them to do. Switching the order of handle table scanning and draining the mark list fixed this, but that is for a later checkin. |
Is this good to merge? |
I think so unless @PeterSolMS has objections? |
I believe this is a better solution for GC statics scanning.
Instead of walking through the module list and scanning the statics for each module, walk the buckets in the large heap handle table where the statics are stored. This list is only changed in cooperative mode, so it should be safe to walk during GC suspensions.
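As a rough illustration of the bucket walk described above, here is a minimal sketch in Python. It is a toy model, not the runtime's implementation: `Bucket`, `scan_buckets`, and the `visit` callback are hypothetical names, and the sketch assumes the buckets form a simple linked chain of slot arrays that nobody modifies while the walk is in progress (which is what the cooperative-mode argument is meant to guarantee during a GC suspension).

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Bucket:
    """Hypothetical stand-in for one array of static slots."""
    slots: List[Optional[object]]
    next: Optional["Bucket"] = None


def scan_buckets(head: Optional[Bucket],
                 visit: Callable[[object], None]) -> int:
    """Walk the bucket chain and report every non-empty slot.

    No per-module lock is taken: the walk touches only the buckets,
    which are assumed to stay unchanged for the duration of the scan.
    Returns the number of slots reported.
    """
    reported = 0
    bucket = head
    while bucket is not None:
        for obj in bucket.slots:
            if obj is not None:
                visit(obj)  # e.g. hand the reference to the marker
                reported += 1
        bucket = bucket.next
    return reported
```

For example, with two chained buckets holding `["a", None, "b"]` and `["c", None]`, calling `scan_buckets(first, seen.append)` appends `"a"`, `"b"`, `"c"` to `seen` in that order and returns 3. The point of the sketch is only that the walk needs no knowledge of modules at all.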