
vLLM 0.13.0 causes error due to deprecated VLLM_ATTENTION_BACKEND environment variable #7132

@3manifold

Description

Cmdline

  VLLM_USE_V1=0 \
  FORCE_QWENVL_VIDEO_READER=torchcodec  \
  USE_AUDIO_IN_VIDEO=True \
  ENABLE_AUDIO_OUTPUT=False \
  FPS_MAX_FRAMES=4 \
  swift infer \
      --model "Qwen3-Omni-30B-A3B-Instruct" \
      --infer_backend vllm \
      --val_dataset "$WORK_DIR/query.jsonl" \
      --result_path "$WORK_DIR/captions_Qwen3-Omni-30B-A3B-Instruct.jsonl"

Error

[rank0]:              ^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/conda/lib/python3.11/site-packages/vllm/vllm_flash_attn/flash_attn_interface.py", line 253, in flash_attn_varlen_func
[rank0]:     out, softmax_lse = torch.ops._vllm_fa2_C.varlen_fwd(
[rank0]:                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/conda/lib/python3.11/site-packages/torch/_ops.py", line 1255, in __call__
[rank0]:     return self._op(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: torch.AcceleratorError: CUDA error: the provided PTX was compiled with an unsupported toolchain.
[rank0]: Search for `cudaErrorUnsupportedPtxVersion' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
[rank0]: CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
[rank0]: For debugging consider passing CUDA_LAUNCH_BLOCKING=1
[rank0]: Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

[rank0]:[W1219 15:23:53.162823414 ProcessGroupNCCL.cpp:1524] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
E1219 15:23:55.158000 1229 site-packages/torch/distributed/elastic/multiprocessing/api.py:882] failed (exitcode: 1) local_rank: 0 (pid: 1295) of binary: /opt/conda/bin/python3.11
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/opt/conda/lib/python3.11/site-packages/torch/distributed/run.py", line 940, in <module>
    main()
  File "/opt/conda/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 357, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/torch/distributed/run.py", line 936, in main
    run(args)
  File "/opt/conda/lib/python3.11/site-packages/torch/distributed/run.py", line 927, in run
    elastic_launch(
  File "/opt/conda/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 156, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 293, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
/opt/conda/lib/python3.11/site-packages/swift/cli/infer.py FAILED

Notes

It relates to vllm-project/vllm#26315 and the vLLM v0.13.0 release notes:

https://github.com/vllm-project/vllm/releases/tag/v0.13.0

Breaking Changes & Deprecations
PassConfig flags renamed per RFC vllm-project/vllm#27995 (vllm-project/vllm#29646)
Attention env vars → CLI args: VLLM_ATTENTION_BACKEND replaced with --attention-backend (vllm-project/vllm#26315)
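
For anyone hitting this with plain vLLM, the deprecation means the attention backend should now be selected via a CLI argument instead of the environment. A rough sketch of the migration (the backend value `FLASH_ATTN` and the model name here are placeholders, and how ms-swift forwards vLLM engine args may differ):

```shell
# Before vLLM 0.13.0: backend selected via environment variable
VLLM_ATTENTION_BACKEND=FLASH_ATTN vllm serve Qwen3-Omni-30B-A3B-Instruct

# From vLLM 0.13.0: the env var is removed; pass the CLI flag instead
# (per vllm-project/vllm#26315)
vllm serve Qwen3-Omni-30B-A3B-Instruct --attention-backend FLASH_ATTN
```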

System specs

vllm 0.13.0
ms_swift 3.11.1

Python version: 3.11.14 | packaged by conda-forge | (main, Oct 13 2025, 14:09:32) [GCC 14.3.0]
PyTorch version: 2.9.0+cu128 
CUDA version: 12.8
CUDNN version: 91002 
Name of current CUDA device: NVIDIA H200
