Description
🐛 Describe the bug
Full error message (no traceback):
AppleInternal/Library/BuildRoots/20d6c351-ee94-11ec-bcaf-7247572f23b4/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSCore/Types/MPSNDArray.mm:705: failed assertion '[MPSNDArray initWithDevice:descriptor:] Error: product of dimension sizes > 2**31 '
How to reproduce
- Install stable-diffusion using the instructions for macOS.
- Run
python scripts/txt2img.py --prompt "a horse" --plms --n_samples 1 --n_rows 1 --n_iter 1
This runs well.
- But if you add the flags
--W 1024 --H 1024
(width and height, respectively), it returns the error.
- The default width and height are both 512, so no flags means 512x512.
I'm looking for a way to reproduce this without going through the whole installation, so I'll update the procedure soon.
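For a rough sense of why 1024x1024 might cross the limit while 512x512 does not, here is a minimal sketch that estimates the element count of a self-attention score tensor at each resolution. The downsampling factor of 8 (image pixels to latent tokens) and the combined batch-times-heads value of 16 are assumptions for illustration, not values measured from the failing run, and `attention_scores_numel` is a hypothetical helper name.

```python
INT32_LIMIT = 2**31  # the limit quoted in the assertion message


def attention_scores_numel(width, height, batch_heads=16, downsample=8):
    """Element count of a (batch_heads, tokens, tokens) score tensor.

    ASSUMPTIONS (not confirmed by the report): the image is reduced by
    `downsample` per side before attention, and batch * heads == 16.
    """
    tokens = (width // downsample) * (height // downsample)
    return batch_heads * tokens * tokens


for side in (512, 1024):
    numel = attention_scores_numel(side, side)
    print(f"{side}x{side}: {numel} elements, exceeds 2**31: {numel > INT32_LIMIT}")
```

Under these assumptions the 512x512 case stays at 2**28 elements while the 1024x1024 case reaches 2**32, which would be consistent with the assertion firing only at the larger size.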
Edit: repro by @Birch-san
from torch import einsum, ones
import argparse
parser = argparse.ArgumentParser(description='mpsndarray test')
parser.add_argument('--n_samples', type=int, default=2)
args = parser.parse_args()
n_samples = args.n_samples
einsum('b i d, b j d -> b i j', ones(16 * n_samples, 4096, 40, device='mps'), ones(16 * n_samples, 4096, 40, device='mps')).shape
print(n_samples, 'passed')
It fails when n_samples is 2, or when it is greater than 7, which looks pretty weird.
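To compare the repro's sizes against 2**31, the sketch below tabulates the einsum output shape (16 * n_samples, 4096, 4096) as an element count and as a byte count. It assumes float32 tensors (the default dtype of `ones`), so 4 bytes per element; `einsum_output_numel` is a hypothetical helper, and nothing here inspects what MPS actually allocates.

```python
INT32_LIMIT = 2**31          # the limit quoted in the assertion message
BYTES_PER_FLOAT32 = 4        # assumption: default float32 dtype


def einsum_output_numel(n_samples, tokens=4096):
    """Elements in the 'b i j' output of the repro: (16 * n_samples, tokens, tokens)."""
    return 16 * n_samples * tokens * tokens


for n_samples in range(1, 9):
    numel = einsum_output_numel(n_samples)
    nbytes = numel * BYTES_PER_FLOAT32
    print(f"n_samples={n_samples}: elements={numel}, bytes={nbytes}, "
          f"bytes >= 2**31: {nbytes >= INT32_LIMIT}")
```

With these numbers the n_samples=2 output is exactly 2**31 bytes and the n_samples=8 output is exactly 2**31 elements. Whether either boundary is what triggers the assertion is not established here, and the sizes alone do not explain why 3 through 7 pass.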
About VRAM?
As you might expect, the error looks like it is related to VRAM. However, some questions remain.
- The error seems to be a size exceeding INT_MAX (2**31 - 1), matching the 2**31 in the assertion message.
  The error doesn't occur at --W 512 --H 512 or lower resolutions.
- The error is a software issue.
  Unlike errors such as CUDA out of memory, this error isn't about the real memory limit.
  If the error were due to a lack of VRAM, the command above (--W 1024 --H 1024) should run on an M1 Max 64GB, since --W 512 --H 512 runs well on my M1 8GB MacBook. Also, the limit 2**31 is a fixed number, which does not change with current memory usage.
So, my expectation is that something is being computed in 32-bit, which shouldn't be.
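As a general illustration of why a fixed 2**31 boundary points at 32-bit arithmetic, the sketch below shows how a signed 32-bit integer behaves around that value, using Python's standard `ctypes.c_int32`. This only demonstrates the integer type itself; it is not a claim about how Metal or MPSNDArray computes sizes internally, and `as_int32` is a hypothetical helper.

```python
import ctypes


def as_int32(value):
    """Reinterpret a Python int as a signed 32-bit integer (wraps, no overflow check)."""
    return ctypes.c_int32(value).value


print(as_int32(2**31 - 1))  # largest value that fits in a signed 32-bit int
print(as_int32(2**31))      # one past the maximum wraps to a negative number
print(as_int32(2**32))      # wraps all the way around to zero
```

A size that is valid as a 64-bit count can therefore become negative or zero once it passes through a 32-bit variable, regardless of how much memory the machine has.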
This might not be torch's problem; it may well be Metal's.
However, any help would be gratefully appreciated.
Thanks.
Versions
PyTorch version: 1.13.0.dev20220824
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: macOS 12.5.1 (arm64)
GCC version: Could not collect
Clang version: 13.1.6 (clang-1316.0.21.2.5)
CMake version: version 3.24.1
Libc version: N/A
Python version: 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:04:14) [Clang 12.0.1 ] (64-bit runtime)
Python platform: macOS-12.5.1-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Versions of relevant libraries:
[pip3] numpy==1.23.2
[pip3] pytorch-lightning==1.7.2
[pip3] torch==1.13.0.dev20220824
[pip3] torch-fidelity==0.3.0
[pip3] torchaudio==0.13.0.dev20220824
[pip3] torchmetrics==0.9.3
[pip3] torchvision==0.14.0.dev20220824
[conda] numpy 1.23.2 py38h579d673_0 conda-forge
[conda] pytorch 1.13.0.dev20220824 py3.8_0 pytorch-nightly
[conda] pytorch-lightning 1.7.2 pypi_0 pypi
[conda] torch-fidelity 0.3.0 pypi_0 pypi
[conda] torchaudio 0.13.0.dev20220824 py38_cpu pytorch-nightly
[conda] torchmetrics 0.9.3 pypi_0 pypi
[conda] torchvision 0.14.0.dev20220824 py38_cpu pytorch-nightly