Cmdline
VLLM_USE_V1=0 \
FORCE_QWENVL_VIDEO_READER=torchcodec \
USE_AUDIO_IN_VIDEO=True \
ENABLE_AUDIO_OUTPUT=False \
FPS_MAX_FRAMES=4 \
swift infer \
--model "Qwen3-Omni-30B-A3B-Instruct"
--infer_backend vllm \
--val_dataset "$WORK_DIR/query.jsonl" \
--result_path "$WORK_DIR/captions_Qwen3-Omni-30B-A3B-Instruct.jsonl"
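For reference, a single request line in query.jsonl might look like the sketch below, assuming ms-swift's standard messages-based custom dataset format for multimodal inference; the prompt and file path are placeholders, not taken from the actual dataset.

import json

# Hypothetical example of one --val_dataset record, assuming the
# "messages" + "videos" layout used by ms-swift for multimodal requests;
# the prompt and path below are illustrative only.
record = {
    "messages": [{"role": "user", "content": "<video>Describe this clip."}],
    "videos": ["/path/to/clip.mp4"],
}
with open("query.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record, ensure_ascii=False) + "\n")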
Error
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/opt/conda/lib/python3.11/site-packages/vllm/vllm_flash_attn/flash_attn_interface.py", line 253, in flash_attn_varlen_func
[rank0]: out, softmax_lse = torch.ops._vllm_fa2_C.varlen_fwd(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/opt/conda/lib/python3.11/site-packages/torch/_ops.py", line 1255, in __call__
[rank0]: return self._op(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: torch.AcceleratorError: CUDA error: the provided PTX was compiled with an unsupported toolchain.
[rank0]: Search for `cudaErrorUnsupportedPtxVersion' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
[rank0]: CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
[rank0]: For debugging consider passing CUDA_LAUNCH_BLOCKING=1
[rank0]: Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
[rank0]:[W1219 15:23:53.162823414 ProcessGroupNCCL.cpp:1524] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
E1219 15:23:55.158000 1229 site-packages/torch/distributed/elastic/multiprocessing/api.py:882] failed (exitcode: 1) local_rank: 0 (pid: 1295) of binary: /opt/conda/bin/python3.11
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/opt/conda/lib/python3.11/site-packages/torch/distributed/run.py", line 940, in <module>
main()
File "/opt/conda/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 357, in wrapper
return f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/distributed/run.py", line 936, in main
run(args)
File "/opt/conda/lib/python3.11/site-packages/torch/distributed/run.py", line 927, in run
elastic_launch(
File "/opt/conda/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 156, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 293, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
/opt/conda/lib/python3.11/site-packages/swift/cli/infer.py FAILED
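The cudaErrorUnsupportedPtxVersion error usually means the PTX embedded in a prebuilt kernel (here the vllm_flash_attn extension) was compiled with a newer CUDA toolchain than the installed NVIDIA driver understands. A minimal diagnostic sketch, assuming it runs in the same conda environment and that nvidia-smi is available on the host:

import subprocess
import torch

# Compare the CUDA toolkit the wheels were built against with the driver
# installed on the host; a driver older than the build toolchain is the
# typical cause of cudaErrorUnsupportedPtxVersion.
print("torch:", torch.__version__)
print("built with CUDA:", torch.version.cuda)
print("device:", torch.cuda.get_device_name(0))
driver = subprocess.run(
    ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
    capture_output=True, text=True, check=True,
).stdout.strip()
print("NVIDIA driver:", driver)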
Notes
This appears to be related to vllm-project/vllm#26315 and to the vLLM v0.13.0 release notes
(https://github.com/vllm-project/vllm/releases/tag/v0.13.0), which list under "Breaking Changes & Deprecations":
- PassConfig flags renamed per RFC vllm-project/vllm#27995 (vllm-project/vllm#29646)
- Attention env vars → CLI args: VLLM_ATTENTION_BACKEND replaced with --attention-backend (vllm-project/vllm#26315)
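Given that release note, a small pre-flight check (a sketch, not part of ms-swift or vLLM) could warn when the launch environment still relies on the replaced variable:

import os

# Per the v0.13.0 note above, VLLM_ATTENTION_BACKEND appears to be replaced
# by the --attention-backend engine argument; warn if it is still set so the
# backend choice is not silently ignored.
if os.environ.get("VLLM_ATTENTION_BACKEND"):
    print("VLLM_ATTENTION_BACKEND is set but reportedly no longer honored "
          "by vLLM >= 0.13; pass --attention-backend instead.")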
System specs
vllm 0.13.0
ms_swift 3.11.1
Python version: 3.11.14 | packaged by conda-forge | (main, Oct 13 2025, 14:09:32) [GCC 14.3.0]
PyTorch version: 2.9.0+cu128
CUDA version: 12.8
CUDNN version: 91002
Name of current CUDA device: NVIDIA H200