
Feat/Bug: Vulkan backend missing ggml_ssm_conv / ggml_ssm_scan kernels — Qwen3.5-35B-A3B (qwen3_5moe) CPU-only on Vulkan #19957

@ai-joe-git

Description

Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

Summary

Qwen3.5-35B-A3B (qwen3_5moe architecture) uses DeltaNet-style SSM layers
(ggml_ssm_conv, ggml_ssm_scan) which have no Vulkan compute shader
implementation. This causes the model to silently fall back to CPU for those
ops, producing corrupt hidden state across the GPU↔CPU boundary and making the
model unusable with -ngl 99 on Vulkan backends.

Hardware

  • GPU: Intel(R) Arc(TM) 140V GPU (16GB UMA, Arrow Lake)
  • Backend: Vulkan (KHR_coopmat available, disabled via GGML_VK_DISABLE_COOPMAT=1)
  • OS: Windows 11, llama.cpp build 27 (d903f30), compiled with GCC (GNU) 15.2.0

Steps to Reproduce

./llama-server \
  -m Qwen3.5-35B-A3B-Q4_K_M.gguf \
  -ngl 99 \
  --port 8080

Observed Behavior

  • Model loads and offloads all layers to GPU successfully
  • First inference produces corrupted / garbage output
  • Subsequent calls crash with vk::DeviceLostError:

    terminate called after throwing an instance of 'vk::DeviceLostError'
      what():  vk::Device::getFenceStatus: ErrorDeviceLost

Root Cause

src/models/qwen35.cpp calls build_delta_net_chunking(),
build_delta_net_recurrent(), and build_delta_net_autoregressive(),
which internally use:

  • ggml_ssm_conv
  • ggml_ssm_scan

Neither of these ops has a Vulkan kernel in ggml-vulkan.cpp.
The CUDA backend has full implementations. Vulkan silently routes
these to CPU, causing state corruption on the GPU↔CPU boundary
during autoregressive generation.

Contrast: Qwen3-30B-A3B Works Fine

qwen3moe (Qwen3-30B-A3B) is pure attention MoE — no SSM layers —
and works correctly on Vulkan after disabling coopmat:

set GGML_VK_DISABLE_COOPMAT=1
set GGML_VK_DISABLE_COOPMAT2=1

→ Stable at ~27 t/s generation on Intel Arc 140V.

qwen3_5moe (Qwen3.5-35B-A3B) has interleaved DeltaNet SSM layers
and cannot be fixed with env vars — it needs proper Vulkan kernels.

Expected Behavior

ggml_ssm_conv and ggml_ssm_scan should have Vulkan compute shader
implementations analogous to the existing CUDA kernels, allowing
qwen3_5moe to run fully GPU-accelerated on Vulkan backends.

Related

Motivation

Qwen3.5-35B-A3B (architecture: qwen3_5moe) is a highly capable hybrid MoE model
that uses DeltaNet-style SSM layers interleaved with standard attention layers.
These SSM ops (ggml_ssm_conv, ggml_ssm_scan) have no Vulkan compute shader
implementation, making the model completely unusable on Vulkan backends despite
loading successfully and offloading all layers to GPU.

This affects all users running llama.cpp on Vulkan (Intel Arc, AMD, mobile GPUs)
who cannot use CUDA. The model produces corrupt output on the first inference
call and crashes with vk::DeviceLostError on subsequent calls, due to corrupt
hidden state from CPU↔GPU boundary crossing during SSM layer computation.

By contrast, Qwen3-30B-A3B (pure attention MoE, qwen3moe architecture) works
correctly on Vulkan after disabling coopmat. The only blocker for qwen3_5moe is
the missing Vulkan kernels for ggml_ssm_conv and ggml_ssm_scan.

Qwen3.5-35B-A3B is one of the best open-weight models available at its size and
is widely used for agentic/coding tasks. Full Vulkan support would unlock it for
the large community of non-CUDA GPU users.

Possible Implementation

No response

Metadata

Labels: enhancement (New feature or request)