
Contention in CppCodeCache on execution with multiple processes #1347


Closed
jgong5 opened this issue Sep 26, 2022 · 9 comments


jgong5 commented Sep 26, 2022

Repro

import torch
import torchdynamo

@torchdynamo.optimize()
def f(a):
    return a.softmax(dim=1) + a.sum()

a = torch.rand([100,100])
f(a)

Run the Python script with the following bash script:

for i in $(seq 16); do
    python test_multi.py &
done
wait

Error message dumped:
OSError: /tmp/torchinductor_jgong5/2u/c2uuffcbelf4a6jb5mb6dlxpunim7xtiox6artovnkozwiop5x3v.so: file too short

CppCodeCache should be implemented in a multi-process-safe way.

@voznesenskym (Contributor) commented:

That's fair, and timely, because the question of this object's thread safety actually came up in this review: #1338 (comment)

Your example is somewhat contrived, however, and it's not readily obvious from it exactly where the thread safety issue arises. Would you be open to trying to make a smaller repro? If not, I can take a look.


jgong5 commented Sep 26, 2022

@voznesenskym Good to know it was observed earlier. I wouldn't say the case is contrived, since running multiple instances at the same time is common in inference. I tried to simplify the repro a bit earlier, but it sometimes fails to reproduce due to the randomness. I believe making the compiled function bigger would make the repro more stable (maybe add more aten calls to f). We are working on a fix.

@voznesenskym (Contributor) commented:

> @voznesenskym Good to know it was observed earlier. I wouldn't say the case is contrived, since running multiple instances at the same time is common in inference. I tried to simplify the repro a bit earlier, but it sometimes fails to reproduce due to the randomness. I believe making the compiled function bigger would make the repro more stable (maybe add more aten calls to f). We are working on a fix.

Sounds good, let me know if you need a hand. #1335 is still under review, but if you want to test with warmup + multiple inputs in a loop (closer to the inference case), that simple tool can help.

My (naive) guess on OSError: /tmp/torchinductor_jgong5/2u/c2uuffcbelf4a6jb5mb6dlxpunim7xtiox6artovnkozwiop5x3v.so: file too short is that we may get into a state where a finished entry gets overwritten:

  1. Process A start - t=0
  2. Process B start - t=0
  3. A checks if key not in cls.cache: - t=1 (True, no key in cache)
  4. B checks if key not in cls.cache: - t=1 (True, no key in cache)
  5. A compiles, then starts and finishes writing cls.cache[key] - t=2
  6. B compiles, starts writing cls.cache[key] - 2 < t < 3
  7. A accesses the cache expecting its own finished file, but instead gets the partially written file from step 6
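The steps above boil down to a classic check-then-act race. Here is a deterministic, single-process sketch of it (hypothetical names; the interleaving of the two "processes" is written out by hand to mirror steps 3-7, not the actual CppCodeCache code):

```python
# Deterministic simulation of the race above: both workers pass the
# "key not in cache" check before either one writes, so B's later write
# clobbers the entry A has already finished and started using.
cache = {}
key = "hashed_source.so"  # stands in for the /tmp/torchinductor path

# t=1: both A and B check the cache before anyone has compiled
a_needs_compile = key not in cache   # True
b_needs_compile = key not in cache   # True

# t=2: A compiles and finishes writing its entry
if a_needs_compile:
    cache[key] = "A: complete .so"

# 2 < t < 3: B also compiles and starts overwriting the same entry
if b_needs_compile:
    cache[key] = "B: partial .so"    # clobbers A's finished file

# A now reads back a file it did not write
print(cache[key])  # → B: partial .so
```

With real processes the same interleaving happens on the shared .so path on disk, which is how A can end up loading a half-written file ("file too short").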

We should look into protecting the cache with a multiprocessing Lock (from multiprocessing import Lock). Alternatively, if you don't like that, I suppose we can use an OS-level file lock (https://docs.python.org/3/library/fcntl.html), since the path is stable. Thoughts?
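A rough sketch of the fcntl route (write_cache_entry and the sidecar ".lock" naming are hypothetical, not the actual CppCodeCache API): take an exclusive advisory lock keyed off the stable .so path, and write via a temp file plus atomic rename so a reader can never observe a short file even if the writer dies mid-write.

```python
import fcntl
import os
import tempfile

def write_cache_entry(path, data):
    # Exclusive advisory lock on a sidecar ".lock" file; since the .so
    # path is stable, every process computes the same lock path.
    lock_path = path + ".lock"
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks other processes
        try:
            if not os.path.exists(path):       # check is race-free now
                tmp_path = path + ".tmp"
                with open(tmp_path, "wb") as f:
                    f.write(data)
                os.replace(tmp_path, path)     # atomic rename on POSIX
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)

so_path = os.path.join(tempfile.mkdtemp(), "kernel.so")
write_cache_entry(so_path, b"fake compiled kernel")  # stand-in for compile
print(open(so_path, "rb").read())
```

fcntl is POSIX-only, which is one argument for a portable wrapper library instead of calling it directly.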


jgong5 commented Sep 26, 2022

> We should look into protecting the cache with a multiprocessing Lock (from multiprocessing import Lock). Alternatively, if you don't like that, I suppose we can use an OS-level file lock (https://docs.python.org/3/library/fcntl.html), since the path is stable. Thoughts?

How about this? https://pypi.org/project/exclusiveprocess/


jansel commented Sep 26, 2022

We should use https://pypi.org/project/filelock/, which is also used by Triton, so it's not a new dependency.

@voznesenskym (Contributor) commented:

> We should use https://pypi.org/project/filelock/, which is also used by Triton, so it's not a new dependency.

Agreed, and I checked: it uses https://docs.python.org/3/library/fcntl.html under the hood, which is definitely what we want here if we go with something like this.
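For reference, usage with the filelock package would look roughly like this (the path and the write standing in for the compile step are illustrative, not the eventual fix in #1400):

```python
import os
import tempfile

from filelock import FileLock  # same package Triton already depends on

so_path = os.path.join(tempfile.mkdtemp(), "kernel.so")

# FileLock serializes the compile-and-write step across processes by
# holding an OS-level lock on the ".lock" sidecar file (fcntl on POSIX).
with FileLock(so_path + ".lock"):
    if not os.path.exists(so_path):      # check-then-act is now safe
        with open(so_path, "wb") as f:   # stand-in for the cpp compile
            f.write(b"compiled kernel")

with open(so_path, "rb") as f:
    print(f.read())
```

Whichever process gets the lock first compiles; the others block, then find the finished file and skip the compile, which also removes the redundant work the race was causing.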

@voznesenskym (Contributor) commented:

@jgong5 let me know if you need help.


jgong5 commented Sep 27, 2022

> @jgong5 let me know if you need help.

@voznesenskym Thanks! @Valentine233 is working on it.


jgong5 commented Oct 28, 2022

Fixed in #1400

jgong5 closed this as completed Oct 28, 2022