CANN: Multi-stream support by hipudding · Pull Request #19284 · ggml-org/llama.cpp

hipudding · 2026-02-03T06:10:24Z


Implement the ggml_backend_cann_graph_optimize function for the CANN backend, ported from the Vulkan backend (PR ggml-org#15489 and ggml-org#15850).

Key changes:
- Add graph optimization to reorder nodes based on dependency analysis
- Group non-dependent nodes together for potential parallel execution
- Preserve fusion patterns (RMS_NORM+MUL, MUL_MAT+ADD, ADD+RMS_NORM)
- Add GGML_CANN_DISABLE_GRAPH_OPTIMIZE env var to disable optimization

This is the first step toward multi-stream parallel execution on Ascend NPU.
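To illustrate what grouping non-dependent nodes can look like, here is a minimal sketch on a toy graph. It is not the code from this PR: `ToyNode` and `group_independent` are hypothetical names, and the level-based ordering is a simplification chosen for the example. It assumes the input is already topologically sorted, so every dependency index points to an earlier node.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy node (hypothetical): `deps` holds indices of earlier nodes this node reads.
struct ToyNode {
    std::vector<std::size_t> deps;
};

// Assign each node a level = 1 + max(level of its deps), then emit nodes
// level by level. Nodes in the same level never depend on each other, so
// they end up adjacent and could be issued in parallel.
std::vector<std::size_t> group_independent(const std::vector<ToyNode> & nodes) {
    std::vector<std::size_t> level(nodes.size(), 0);
    std::size_t max_level = 0;
    for (std::size_t i = 0; i < nodes.size(); ++i) {
        for (std::size_t d : nodes[i].deps) {
            assert(d < i);  // precondition: input is topologically sorted
            if (level[d] + 1 > level[i]) {
                level[i] = level[d] + 1;
            }
        }
        if (level[i] > max_level) {
            max_level = level[i];
        }
    }
    std::vector<std::size_t> order;
    order.reserve(nodes.size());
    for (std::size_t l = 0; l <= max_level; ++l) {
        for (std::size_t i = 0; i < nodes.size(); ++i) {
            if (level[i] == l) {
                order.push_back(i);  // original order is kept inside a level
            }
        }
    }
    return order;
}
```

For two independent chains 0 -> 1 and 2 -> 3, this moves nodes 0 and 2 next to each other, followed by 1 and 3.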

- Replace tensor-pointer-based dependency tracking with memory-address-based tracking
- Use std::map<void*, int> to track pending writes per stream
- Implement smart stream selection:
  - No dependencies: round-robin distribution
  - Single dependency: execute on same stream (avoid sync overhead)
  - Multiple dependencies: sync all streams
- Add WAW (Write-After-Write) hazard detection
- Fix output corruption issue when using multi-stream execution

Enable with: GGML_CANN_MULTI_STREAM=1
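The stream-selection rules above can be sketched as follows. This is a simplified illustration, not the backend's implementation: `StreamPicker` and `pick` are hypothetical names, only the `std::map<void*, int>` idea comes from the commit message, and in the multi-dependency case the sketch arbitrarily runs the node on the lowest-numbered dependent stream while asking for a wait on every other stream. Pending writes are never retired here, and read-after-write plus write-after-write are the only hazards checked.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Hypothetical scheduler state for the sketch.
struct StreamPicker {
    int n_streams;
    int next_rr = 0;                       // next stream for round-robin
    std::map<void *, int> pending_writes;  // buffer address -> stream that last wrote it

    explicit StreamPicker(int n) : n_streams(n) { assert(n > 0); }

    // Returns the stream for a node that reads `srcs` and writes `dst`.
    // `sync_with` receives the streams that must be waited on first.
    int pick(const std::vector<void *> & srcs, void * dst, std::set<int> & sync_with) {
        std::set<int> dep_streams;
        for (void * s : srcs) {  // read-after-write hazards
            auto it = pending_writes.find(s);
            if (it != pending_writes.end()) {
                dep_streams.insert(it->second);
            }
        }
        auto w = pending_writes.find(dst);  // write-after-write hazard
        if (w != pending_writes.end()) {
            dep_streams.insert(w->second);
        }

        sync_with.clear();
        int stream;
        if (dep_streams.empty()) {
            stream  = next_rr;  // no dependencies: round-robin
            next_rr = (next_rr + 1) % n_streams;
        } else if (dep_streams.size() == 1) {
            stream = *dep_streams.begin();  // same stream is in-order, no sync needed
        } else {
            stream = *dep_streams.begin();  // assumption: run on the first dependent stream
            for (int s = 0; s < n_streams; ++s) {
                if (s != stream) {
                    sync_with.insert(s);  // multiple dependencies: wait on all other streams
                }
            }
        }
        pending_writes[dst] = stream;
        return stream;
    }
};
```

Keying the map on buffer addresses rather than tensor pointers means two tensors that alias the same memory are treated as the same resource, which is the point of the first bullet.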

When GGML_CANN_MULTI_STREAM=1 is set, ACL graph capture/execution must be disabled because it is incompatible with multi-stream execution. The previous code had a bug where the prefill_use_graph check would overwrite use_cann_graph after it had been set to false for multi-stream mode. Fix by wrapping the prefill_use_graph check inside if (use_cann_graph), so that it only runs when ACL graph has not already been disabled.
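A minimal sketch of the guarded check, under stated assumptions: `resolve_use_cann_graph` and its parameters are hypothetical, and the prefill check is modeled as an assignment that keeps ACL graph only for single-token batches. The actual condition in the backend may differ; what matters here is that an unguarded assignment could turn the flag back on.

```cpp
#include <cassert>

// Hypothetical helper: decides whether ACL graph mode stays enabled.
//   multi_stream       - GGML_CANN_MULTI_STREAM was requested
//   prefill_use_graph  - ACL graph is allowed for prompt processing
//   seq_len            - tokens in the current batch (assumed prefill signal)
bool resolve_use_cann_graph(bool multi_stream, bool prefill_use_graph, long seq_len) {
    assert(seq_len >= 1);

    // Multi-stream and ACL graph are incompatible.
    bool use_cann_graph = !multi_stream;

    // Guard: only refine the decision while ACL graph is still enabled.
    // Without this `if`, the assignment below could overwrite the `false`
    // chosen for multi-stream mode.
    if (use_cann_graph) {
        if (!prefill_use_graph) {
            use_cann_graph = (seq_len == 1);
        }
    }
    return use_cann_graph;
}
```

With multi-stream requested and a single-token batch, the unguarded version would have returned true; the guarded version returns false.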

- Use parse_bool() for GGML_CANN_MULTI_STREAM environment variable parsing, consistent with other env var handling
- Only synchronize dependent streams instead of all streams when a node has multiple dependencies, reducing sync overhead
- Performance improvement: ~9% faster prompt processing on 0.5B model (1838 t/s vs 1688 t/s with ACL graph disabled)
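For readers unfamiliar with the pattern, boolean environment-variable parsing might look like the sketch below. `parse_bool_sketch` is a hypothetical stand-in, not the backend's `parse_bool()`, and the set of accepted spellings is an assumption made for the example.

```cpp
#include <algorithm>
#include <cctype>
#include <cstdlib>
#include <string>

// Hypothetical stand-in for a parse_bool() helper. Accepted spellings are
// an assumption; anything else (including an empty string) is false.
bool parse_bool_sketch(std::string value) {
    std::transform(value.begin(), value.end(), value.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return value == "1" || value == "on" || value == "true" || value == "yes";
}

// Reads GGML_CANN_MULTI_STREAM; an unset variable counts as disabled.
bool multi_stream_requested() {
    const char * raw = std::getenv("GGML_CANN_MULTI_STREAM");
    return raw != nullptr && parse_bool_sketch(raw);
}
```

Routing every flag through one helper keeps spellings such as `1` and `true` consistent across the backend's environment variables.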

- Add operator_fusion_enabled flag to ggml_backend_cann_context
- Implement conflict detection in constructor:
  * ACL graph mode disables multi-stream (higher performance)
  * Multi-stream mode disables operator fusion (low benefit)
- Remove multi-stream fusion code (fusion disabled in multi-stream)
- Keep fusion functionality in single-stream mode
- Remove redundant multi_stream_enabled check in graph_compute
- Fix unused variable warning (sync_all_to_stream)
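The precedence rules above can be sketched as a small pure function. `CannModeFlags`, `acl_graph_enabled`, and `resolve_conflicts` are hypothetical names for illustration; only `operator_fusion_enabled` and `multi_stream_enabled` appear in the commit message, and the real logic lives in the ggml_backend_cann_context constructor.

```cpp
// Hypothetical flag bundle for the sketch.
struct CannModeFlags {
    bool acl_graph_enabled;
    bool multi_stream_enabled;
    bool operator_fusion_enabled;
};

// Apply the stated precedence: ACL graph wins over multi-stream, and
// multi-stream (if it survives) turns operator fusion off.
CannModeFlags resolve_conflicts(CannModeFlags requested) {
    CannModeFlags f = requested;
    if (f.acl_graph_enabled) {
        f.multi_stream_enabled = false;  // ACL graph mode disables multi-stream
    }
    if (f.multi_stream_enabled) {
        f.operator_fusion_enabled = false;  // multi-stream disables fusion
    }
    return f;
}
```

Resolving the conflicts once at construction is what makes a later per-call check in graph_compute redundant.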

Remove all operator fusion pattern detection logic from graph optimization to focus on reducing dependencies between operators in multi-stream scenarios.

Key changes:
- Remove fusion pattern matching for RMS_NORM+MUL, MUL_MAT+ADD, etc.
- Remove match_pattern and keep_pattern helper functions
- Simplify to two-pass approach: real nodes first, then view nodes
- Focus on dependency analysis for better parallelism
- Reduce code complexity by ~47% (235 lines -> 125 lines)

This approach is inspired by the Vulkan backend implementation and prioritizes multi-stream parallelism over operator fusion, as fusion provides minimal performance benefits in the CANN backend.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
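One possible reading of the two-pass idea, sketched on a toy graph: each round first collects real nodes whose inputs are already available, then pulls in view nodes that have become ready. `SketchNode` and `two_pass_reorder` are hypothetical, and details of the actual implementation (such as any lookahead limit) are not modeled. The input is assumed to be topologically sorted.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical node: a "view" launches no kernel (e.g. a reshape).
struct SketchNode {
    bool is_view;
    std::vector<std::size_t> deps;  // indices of earlier nodes this one reads
};

std::vector<std::size_t> two_pass_reorder(const std::vector<SketchNode> & nodes) {
    const std::size_t n = nodes.size();
    std::vector<bool> done(n, false);
    std::vector<std::size_t> order;

    // A node is ready once every node it reads has been scheduled.
    auto ready = [&](std::size_t i) {
        for (std::size_t d : nodes[i].deps) {
            assert(d < i);  // precondition: topological order
            if (!done[d]) {
                return false;
            }
        }
        return true;
    };

    while (order.size() < n) {
        // Pass 1: real nodes that depend only on earlier groups. They are
        // collected before being marked done, so they are mutually independent.
        std::vector<std::size_t> group;
        for (std::size_t i = 0; i < n; ++i) {
            if (!done[i] && !nodes[i].is_view && ready(i)) {
                group.push_back(i);
            }
        }
        for (std::size_t i : group) {
            done[i] = true;
            order.push_back(i);
        }
        // Pass 2: view nodes whose inputs are now available.
        for (std::size_t i = 0; i < n; ++i) {
            if (!done[i] && nodes[i].is_view && ready(i)) {
                done[i] = true;
                order.push_back(i);
            }
        }
    }
    return order;
}
```

Because the lowest-indexed unscheduled node always has all of its inputs scheduled, every round makes progress and the loop terminates.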

hipudding self-assigned this Feb 3, 2026

hipudding added the Ascend NPU issues specific to Ascend NPUs label Feb 3, 2026

github-actions Bot added the ggml changes relating to the ggml tensor library for machine learning label Feb 3, 2026

hipudding closed this Feb 3, 2026

hipudding reopened this Feb 3, 2026

loci-dev mentioned this pull request Feb 3, 2026

UPSTREAM PR #19284: CANN: Multi-stream support auroralabs-loci/llama.cpp#1150

Open

hipudding added 5 commits February 4, 2026 08:37

hipudding force-pushed the mul_stream branch from d7ac8bb to dd9e377 Compare February 6, 2026 02:30

hipudding and others added 2 commits February 6, 2026 02:51

optimize env conflicts

1f9374e

hipudding wants to merge 7 commits into ggml-org:master from hipudding:mul_stream
