[MPS] MPSNDArray error: product of dimension sizes > 2**31 #84039

Closed
@junukwon7

🐛 Describe the bug

Full error message (no traceback):

AppleInternal/Library/BuildRoots/20d6c351-ee94-11ec-bcaf-7247572f23b4/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSCore/Types/MPSNDArray.mm:705: failed assertion '[MPSNDArray initWithDevice:descriptor:] Error: product of dimension sizes > 2**31 '

How to reproduce

  1. Install stable-diffusion using the instructions for macOS.
  2. Run python scripts/txt2img.py --prompt "a horse" --plms --n_samples 1 --n_rows 1 --n_iter 1. This runs fine.
  3. Add the --W 1024 --H 1024 flags (width and height, respectively) and the same command fails with the error above.
  • The default width and height are 512, so omitting the flags means 512x512.

I'm looking for a way to reproduce it without installing the whole project, and I'll update these steps once I have one.

Edit: repro by @Birch-san

import argparse

from torch import einsum, ones

parser = argparse.ArgumentParser(description='mpsndarray test')
parser.add_argument('--n_samples', type=int, default=2)
args = parser.parse_args()
n_samples = args.n_samples

# Batched product: (b, 4096, 40) x (b, 4096, 40) -> (b, 4096, 4096), with b = 16 * n_samples
a = ones(16 * n_samples, 4096, 40, device='mps')
b = ones(16 * n_samples, 4096, 40, device='mps')
einsum('b i d, b j d -> b i j', a, b).shape

print(n_samples, 'passed')

It fails when n_samples is 2 or greater than 7, which looks odd.
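
To put numbers on those failing values, here is a small sketch that only does the arithmetic for the einsum output shape (16 * n_samples, 4096, 4096). It does not touch MPS. The 4 bytes per element is an assumption based on the default float32 dtype of torch.ones, and the helper names are made up for illustration.

```python
# Element and byte counts of the einsum output in the repro.
# Assumption: default float32 dtype, i.e. 4 bytes per element.
LIMIT = 2 ** 31  # threshold named in the MPSNDArray assertion message
BYTES_PER_ELEMENT = 4


def output_elements(n_samples: int) -> int:
    """Number of elements in the (16 * n_samples, 4096, 4096) result."""
    return 16 * n_samples * 4096 * 4096


def report(max_samples: int = 8) -> list:
    """Return (n_samples, elements, bytes, elements >= LIMIT) rows."""
    rows = []
    for n in range(1, max_samples + 1):
        elems = output_elements(n)
        rows.append((n, elems, elems * BYTES_PER_ELEMENT, elems >= LIMIT))
    return rows


if __name__ == "__main__":
    for n, elems, nbytes, at_limit in report():
        print(f"n_samples={n}: elements={elems} bytes={nbytes} "
              f"elements >= 2**31: {at_limit}")
```

With n_samples = 8 the output reaches exactly 2**31 elements, which lines up with the failures above 7. With n_samples = 2 the output has only 2**29 elements (2**31 bytes under the float32 assumption), so the element count alone does not explain that failure; whether the byte size plays a role there is a guess, not something confirmed here.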

Is this about VRAM?

At first glance, the error looks VRAM-related. However, some questions remain.

  1. The error seems to be about a size exceeding the 32-bit signed integer range (INT_MAX is 2**31 - 1).
    The error doesn't occur at --W 512 --H 512 or lower resolutions.
  2. The error is a software issue.
    Unlike errors such as CUDA out of memory, this error isn't about the physical memory limit.
    If it were caused by a lack of VRAM, the command above (--W 1024 --H 1024) should run on an M1 Max with 64 GB, since --W 512 --H 512 runs fine on my M1 MacBook with 8 GB. Also, the 2**31 limit is a fixed number that does not depend on current memory usage.

So my guess is that something is being computed in 32 bits that shouldn't be.
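
As a rough check of that guess, the sketch below estimates the size of an attention-score tensor at both resolutions. It rests on two assumptions that are not stated in this report: that the model works on a latent downsampled by 8 per side (which would give the 4096 = 64 * 64 positions seen in the repro at 512x512), and that the leading dimension is 16, as in the repro with n_samples = 1. The function name and parameters are hypothetical.

```python
# Estimated element count of a (batch_heads, tokens, tokens) attention tensor.
# Assumptions: latent downsampled by 8 per side; leading dimension of 16.
LIMIT = 2 ** 31  # threshold named in the MPSNDArray assertion message


def attention_elements(width: int, height: int,
                       batch_heads: int = 16, downsample: int = 8) -> int:
    """Elements in a score tensor with one row and column per latent position."""
    tokens = (width // downsample) * (height // downsample)
    return batch_heads * tokens * tokens


if __name__ == "__main__":
    for side in (512, 1024):
        elems = attention_elements(side, side)
        print(f"{side}x{side}: elements={elems} exceeds 2**31: {elems > LIMIT}")
```

Under these assumptions, 512x512 gives 2**28 elements while 1024x1024 gives 2**32, so doubling each side multiplies the tensor size by 16 and pushes it past the fixed limit regardless of how much memory the machine has.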

This might not be a PyTorch problem; it may well be in Metal.

Either way, any help would be gratefully appreciated.

Thanks.

Versions

PyTorch version: 1.13.0.dev20220824
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 12.5.1 (arm64)
GCC version: Could not collect
Clang version: 13.1.6 (clang-1316.0.21.2.5)
CMake version: version 3.24.1
Libc version: N/A

Python version: 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:04:14) [Clang 12.0.1 ] (64-bit runtime)
Python platform: macOS-12.5.1-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.23.2
[pip3] pytorch-lightning==1.7.2
[pip3] torch==1.13.0.dev20220824
[pip3] torch-fidelity==0.3.0
[pip3] torchaudio==0.13.0.dev20220824
[pip3] torchmetrics==0.9.3
[pip3] torchvision==0.14.0.dev20220824
[conda] numpy 1.23.2 py38h579d673_0 conda-forge
[conda] pytorch 1.13.0.dev20220824 py3.8_0 pytorch-nightly
[conda] pytorch-lightning 1.7.2 pypi_0 pypi
[conda] torch-fidelity 0.3.0 pypi_0 pypi
[conda] torchaudio 0.13.0.dev20220824 py38_cpu pytorch-nightly
[conda] torchmetrics 0.9.3 pypi_0 pypi
[conda] torchvision 0.14.0.dev20220824 py38_cpu pytorch-nightly

cc @kulinseth @albanD

Labels

module: mps (Related to Apple Metal Performance Shaders framework)
triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
