Cmdline
VLLM_USE_V1=0 \
FORCE_QWENVL_VIDEO_READER=torchcodec \
USE_AUDIO_IN_VIDEO=True \
ENABLE_AUDIO_OUTPUT=False \
FPS_MAX_FRAMES=4 \
swift infer \
--model "Qwen3-Omni-30B-A3B-Instruct"
--infer_backend vllm \
--val_dataset "$WORK_DIR/query.jsonl" \
--result_path "$WORK_DIR/captions_Qwen3-Omni-30B-A3B-Instruct.jsonl"
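For reference, a single request line in query.jsonl might look like the sketch below, assuming ms-swift's standard messages-based custom dataset format for multimodal inference; the prompt and file path are placeholders, not taken from the actual dataset.

import json

# Hypothetical example of one --val_dataset record, assuming the
# "messages" + "videos" layout used by ms-swift for multimodal requests;
# the prompt and path below are illustrative only.
record = {
    "messages": [{"role": "user", "content": "<video>Describe this clip."}],
    "videos": ["/path/to/clip.mp4"],
}
with open("query.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record, ensure_ascii=False) + "\n")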
Error
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/opt/conda/lib/python3.11/site-packages/vllm/vllm_flash_attn/flash_attn_interface.py", line 253, in flash_attn_varlen_func
[rank0]: out, softmax_lse = torch.ops._vllm_fa2_C.varlen_fwd(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/opt/conda/lib/python3.11/site-packages/torch/_ops.py", line 1255, in __call__
[rank0]: return self._op(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: torch.AcceleratorError: CUDA error: the provided PTX was compiled with an unsupported toolchain.
[rank0]: Search for `cudaErrorUnsupportedPtxVersion' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
[rank0]: CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
[rank0]: For debugging consider passing CUDA_LAUNCH_BLOCKING=1
[rank0]: Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
[rank0]:[W1219 15:23:53.162823414 ProcessGroupNCCL.cpp:1524] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
E1219 15:23:55.158000 1229 site-packages/torch/distributed/elastic/multiprocessing/api.py:882] failed (exitcode: 1) local_rank: 0 (pid: 1295) of binary: /opt/conda/bin/python3.11
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/opt/conda/lib/python3.11/site-packages/torch/distributed/run.py", line 940, in <module>
main()
File "/opt/conda/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 357, in wrapper
return f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/distributed/run.py", line 936, in main
run(args)
File "/opt/conda/lib/python3.11/site-packages/torch/distributed/run.py", line 927, in run
elastic_launch(
File "/opt/conda/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 156, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 293, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
/opt/conda/lib/python3.11/site-packages/swift/cli/infer.py FAILED
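The cudaErrorUnsupportedPtxVersion error usually means the PTX embedded in a prebuilt kernel (here the vllm_flash_attn extension) was compiled with a newer CUDA toolchain than the installed NVIDIA driver understands. A minimal diagnostic sketch, assuming it runs in the same conda environment and that nvidia-smi is available on the host:

import subprocess
import torch

# Compare the CUDA toolkit the wheels were built against with the driver
# installed on the host; a driver older than the build toolchain is the
# typical cause of cudaErrorUnsupportedPtxVersion.
print("torch:", torch.__version__)
print("built with CUDA:", torch.version.cuda)
print("device:", torch.cuda.get_device_name(0))
driver = subprocess.run(
    ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
    capture_output=True, text=True, check=True,
).stdout.strip()
print("NVIDIA driver:", driver)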
Notes
This appears to be related to vllm-project/vllm#26315 and to the vLLM v0.13.0 release notes
(https://github.com/vllm-project/vllm/releases/tag/v0.13.0), which list under "Breaking Changes & Deprecations":
- PassConfig flags renamed per RFC vllm-project/vllm#27995 (vllm-project/vllm#29646)
- Attention env vars → CLI args: VLLM_ATTENTION_BACKEND replaced with --attention-backend (vllm-project/vllm#26315)
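Given that release note, a small pre-flight check (a sketch, not part of ms-swift or vLLM) could warn when the launch environment still relies on the replaced variable:

import os

# Per the v0.13.0 note above, VLLM_ATTENTION_BACKEND appears to be replaced
# by the --attention-backend engine argument; warn if it is still set so the
# backend choice is not silently ignored.
if os.environ.get("VLLM_ATTENTION_BACKEND"):
    print("VLLM_ATTENTION_BACKEND is set but reportedly no longer honored "
          "by vLLM >= 0.13; pass --attention-backend instead.")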
System specs
vllm 0.13.0
ms_swift 3.11.1
Python version: 3.11.14 | packaged by conda-forge | (main, Oct 13 2025, 14:09:32) [GCC 14.3.0]
PyTorch version: 2.9.0+cu128
CUDA version: 12.8
CUDNN version: 91002
Name of current CUDA device: NVIDIA H200