
Contention in CppCodeCache on execution with multiple processes #1347


Closed
jgong5 opened this issue Sep 26, 2022 · 9 comments


jgong5 commented Sep 26, 2022

Repro

import torch
import torchdynamo

@torchdynamo.optimize()
def f(a):
    return a.softmax(dim=1) + a.sum()

a = torch.rand([100,100])
f(a)

Run the Python script with the following bash script:

for i in $(seq 16); do
    python test_multi.py &
done
wait

Error message dumped:
OSError: /tmp/torchinductor_jgong5/2u/c2uuffcbelf4a6jb5mb6dlxpunim7xtiox6artovnkozwiop5x3v.so: file too short

CppCodeCache should be implemented in a multi-process-safe way.

@voznesenskym (Contributor) commented:

That's fair, and timely, because the question of this object's thread safety actually came up in this review: #1338 (comment)

Your example is somewhat contrived, however, and it's not readily obvious from it exactly where the thread safety issue arises. Would you be open to trying to make a smaller repro? If not, I can take a look.


jgong5 commented Sep 26, 2022

@voznesenskym Good to know it was observed earlier. I wouldn't say the case is contrived, since running multiple instances at the same time is common in inference. I tried to simplify the repro a bit earlier, but it sometimes fails to reproduce due to the randomness. I believe making the compiled function bigger would make the repro more stable (maybe add more aten calls to f). We are working on a fix.

@voznesenskym (Contributor) commented:

> @voznesenskym Good to know it was observed earlier. I wouldn't say the case is contrived, since running multiple instances at the same time is common in inference. I tried to simplify the repro a bit earlier, but it sometimes fails to reproduce due to the randomness. I believe making the compiled function bigger would make the repro more stable (maybe add more aten calls to f). We are working on a fix.

Sounds good, let me know if you need a hand. #1335 is still under review, but if you want to test with warmup + multiple inputs in a loop (closer to the inference case), that simple tool can help.

My (naive) guess on OSError: /tmp/torchinductor_jgong5/2u/c2uuffcbelf4a6jb5mb6dlxpunim7xtiox6artovnkozwiop5x3v.so: file too short is that we may get into a state where a finished entry gets overwritten:

  1. Process A start - t=0
  2. Process B start - t=0
  3. A checks if key not in cls.cache: - t=1 (True, no key in cache)
  4. B checks if key not in cls.cache: - t=1 (True, no key in cache)
  5. A compiles, then starts and finishes writing cls.cache[key] - t=2
  6. B compiles, starts writing cls.cache[key] - 2 < t < 3
  7. A accesses the cache expecting its own finished file, but instead gets the partially written file from step 6
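The steps above boil down to a classic check-then-act race. Here is a deterministic, single-process sketch of it (hypothetical names; the interleaving of the two "processes" is written out by hand to mirror steps 3-7, not the actual CppCodeCache code):

```python
# Deterministic simulation of the race above: both workers pass the
# "key not in cache" check before either one writes, so B's later write
# clobbers the entry A has already finished and started using.
cache = {}
key = "hashed_source.so"  # stands in for the /tmp/torchinductor path

# t=1: both A and B check the cache before anyone has compiled
a_needs_compile = key not in cache   # True
b_needs_compile = key not in cache   # True

# t=2: A compiles and finishes writing its entry
if a_needs_compile:
    cache[key] = "A: complete .so"

# 2 < t < 3: B also compiles and starts overwriting the same entry
if b_needs_compile:
    cache[key] = "B: partial .so"    # clobbers A's finished file

# A now reads back a file it did not write
print(cache[key])  # → B: partial .so
```

With real processes the same interleaving happens on the shared .so path on disk, which is how A can end up loading a half-written file ("file too short").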

We should look into protecting the cache with a multiprocessing Lock (from multiprocessing import Lock). Alternatively, if you don't like that, I suppose we can use an OS-level file lock (https://docs.python.org/3/library/fcntl.html), since the path is stable. Thoughts?
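A rough sketch of the fcntl route (write_cache_entry and the sidecar ".lock" naming are hypothetical, not the actual CppCodeCache API): take an exclusive advisory lock keyed off the stable .so path, and write via a temp file plus atomic rename so a reader can never observe a short file even if the writer dies mid-write.

```python
import fcntl
import os
import tempfile

def write_cache_entry(path, data):
    # Exclusive advisory lock on a sidecar ".lock" file; since the .so
    # path is stable, every process computes the same lock path.
    lock_path = path + ".lock"
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks other processes
        try:
            if not os.path.exists(path):       # check is race-free now
                tmp_path = path + ".tmp"
                with open(tmp_path, "wb") as f:
                    f.write(data)
                os.replace(tmp_path, path)     # atomic rename on POSIX
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)

so_path = os.path.join(tempfile.mkdtemp(), "kernel.so")
write_cache_entry(so_path, b"fake compiled kernel")  # stand-in for compile
print(open(so_path, "rb").read())
```

fcntl is POSIX-only, which is one argument for a portable wrapper library instead of calling it directly.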


jgong5 commented Sep 26, 2022

> We should look into protecting the cache with a multiprocessing Lock (from multiprocessing import Lock). Alternatively, if you don't like that, I suppose we can use an OS-level file lock (https://docs.python.org/3/library/fcntl.html), since the path is stable. Thoughts?

How about this? https://pypi.org/project/exclusiveprocess/


jansel commented Sep 26, 2022

We should use https://pypi.org/project/filelock/, which is also used by Triton, so it's not a new dependency.

@voznesenskym (Contributor) commented:

> We should use https://pypi.org/project/filelock/, which is also used by Triton, so it's not a new dependency.

Agreed, and I checked: it uses https://docs.python.org/3/library/fcntl.html under the hood, which is definitely what we want here if we go with something like this.
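For reference, usage with the filelock package would look roughly like this (the path and the write standing in for the compile step are illustrative, not the eventual fix in #1400):

```python
import os
import tempfile

from filelock import FileLock  # same package Triton already depends on

so_path = os.path.join(tempfile.mkdtemp(), "kernel.so")

# FileLock serializes the compile-and-write step across processes by
# holding an OS-level lock on the ".lock" sidecar file (fcntl on POSIX).
with FileLock(so_path + ".lock"):
    if not os.path.exists(so_path):      # check-then-act is now safe
        with open(so_path, "wb") as f:   # stand-in for the cpp compile
            f.write(b"compiled kernel")

with open(so_path, "rb") as f:
    print(f.read())
```

Whichever process gets the lock first compiles; the others block, then find the finished file and skip the compile, which also removes the redundant work the race was causing.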

@voznesenskym (Contributor) commented:

@jgong5 let me know if you need help.


jgong5 commented Sep 27, 2022

> @jgong5 let me know if you need help.

@voznesenskym Thanks! @Valentine233 is working on it.


jgong5 commented Oct 28, 2022

Fixed in #1400

jgong5 closed this as completed Oct 28, 2022