Contention in CppCodeCache on execution with multiple processes #1347
Comments
That's fair, and timely, because the question of this object's thread safety actually came up in this review: #1338 (comment). Your example is somewhat contrived, however, and it's not readily obvious from it exactly where the thread-safety issue arises. Would you be open to trying to make a smaller repro? If not, I can take a look.
@voznesenskym Good to know it was observed earlier. I wouldn't say the case is contrived, since running multiple instances at the same time is common in inference. I tried to simplify the repro a bit earlier, but because of the randomness it doesn't always reproduce. I believe making the compiled function bigger would make the repro more reliable (maybe add more aten calls to …
Sounds good, lmk if you need a hand. #1335 is still under review, but if you want to test with warmup + multiple inputs in a loop (closer to the inference case), this simple tool can help. My (naive) guess on …
We should look into protecting the cache with a multiprocessing Lock. (…
How about this? https://pypi.org/project/exclusiveprocess/
We should use https://pypi.org/project/filelock/, which is also used by Triton, so it's not a new dependency.
Agreed. I checked, and it uses fcntl (https://docs.python.org/3/library/fcntl.html) under the hood, which is definitely what we want here if we go with something like this.
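For illustration, here is a minimal sketch of the fcntl-based locking idea, assuming a Unix host. The function name `get_or_build`, the `build` callback, and the `"<path>.lock"` sidecar file are hypothetical; this is neither the change that landed in #1400 nor the filelock API (filelock's `FileLock` would replace the manual `flock` calls). The point is that the existence check and the build happen under one exclusive lock, so no process can load a half-written `.so`.

```python
import fcntl
import os
from typing import Callable


def get_or_build(path: str, build: Callable[[str], None]) -> str:
    """Return `path`, building it first if no process has done so yet.

    Hypothetical helper: an exclusive flock on a sidecar "<path>.lock" file
    serializes the existence check and the build across independent
    processes, so a caller never sees a partially written artifact.
    """
    lock_path = path + ".lock"
    # Open (or create) the lock file; the lock belongs to this open file.
    with open(lock_path, "a") as lock_file:
        fcntl.flock(lock_file.fileno(), fcntl.LOCK_EX)  # blocks until acquired
        try:
            # Only the first process to get the lock finds the file missing.
            if not os.path.exists(path):
                build(path)  # e.g. invoke the C++ compiler writing to `path`
        finally:
            fcntl.flock(lock_file.fileno(), fcntl.LOCK_UN)
    return path
```

In this sketch every caller, including one that only wants to load an already-built library, takes the lock. A lock-free fast path (checking `os.path.exists` before locking) would only be safe if the artifact were also published atomically, because otherwise a reader could still see a file that is being written.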
@jgong5 let me know if you need help.
@voznesenskym Thanks! @Valentine233 is working on it.
Fixed in #1400
Repro
Run the Python script with the following bash script:
Error message dumped:
OSError: /tmp/torchinductor_jgong5/2u/c2uuffcbelf4a6jb5mb6dlxpunim7xtiox6artovnkozwiop5x3v.so: file too short
CppCodeCache should be implemented in a multi-process-safe way.
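The "file too short" error is consistent with one process loading a `.so` that another process is still writing. A complementary technique, not proposed in this thread and shown here only as a hedged sketch, is to have the compiler write to a temporary file in the same directory and then `os.replace` it onto the final path. Readers then see either no file or a complete one. `publish_atomically` and its `write` callback are hypothetical names.

```python
import contextlib
import os
import tempfile
from typing import Callable


def publish_atomically(path: str, write: Callable[[str], None]) -> str:
    """Build an artifact into a temp file, then atomically move it to `path`.

    Hypothetical helper: the temp file lives in the same directory (and so
    on the same filesystem) as `path`, which makes os.replace a rename
    rather than a copy.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    os.close(fd)
    try:
        write(tmp_path)             # e.g. the compiler writes its output here
        os.replace(tmp_path, path)  # readers see no file or the full file
    except BaseException:
        # Do not leave partial temp files behind if the build fails.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_path)
        raise
    return path
```

Without a lock, two processes may both compile the same kernel. Each rename is still atomic, and both produce the same output, so the cost is duplicated work rather than a corrupted file. Combining this with a file lock avoids that duplication as well.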