UPSTREAM PR #17543: CANN: add support for partial RoPE and Vision mode #344

loci-dev · 2025-11-27T09:37:19Z

Add support for two important RoPE variants: partial rotation (rope_dims < ne0) and Vision mode rotation.

Support for partial RoPE (rope_dims < ne0):
- Split tensor into head (first rope_dims dimensions) and tail portions
- Apply rotation only to head portion using RotaryPositionEmbedding operator
- Copy unrotated tail portion directly from source to destination
- Handle both contiguous and non-contiguous tensor layouts

Support for Vision mode (GGML_ROPE_TYPE_VISION):
- Set rope_dims = ne0 for Vision mode to rotate entire tensor
- Vision mode pairs dimension i with dimension i+n_dims (where n_dims = ne0/2)
- No tail handling needed since entire tensor is rotated
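
As a rough illustration of the two variants listed above, the following pure-Python sketch rotates only the first rope_dims elements of a row and copies the tail through unchanged. It is not the CANN implementation: the helper names (rotate_pair, partial_rope, vision_pairs), the NeoX-style pairing inside the head, and the theta_base default are assumptions made for this example only.

```python
import math


def rotate_pair(x0, x1, angle):
    """Rotate the 2-D point (x0, x1) by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return x0 * c - x1 * s, x0 * s + x1 * c


def partial_rope(row, rope_dims, pos, theta_base=10000.0):
    """Rotate the head (first rope_dims elements) and copy the tail as-is.

    Assumption: the head uses NeoX-style pairing (i, i + rope_dims // 2).
    """
    ne0 = len(row)
    if rope_dims % 2 != 0 or rope_dims > ne0:
        raise ValueError("rope_dims must be even and <= ne0")
    half = rope_dims // 2
    out = list(row)  # the tail row[rope_dims:] stays untouched in this copy
    for i in range(half):
        angle = pos * theta_base ** (-2.0 * i / rope_dims)
        out[i], out[i + half] = rotate_pair(row[i], row[i + half], angle)
    return out


def vision_pairs(ne0):
    """Index pairs for Vision mode: i is paired with i + n_dims, n_dims = ne0 // 2."""
    n_dims = ne0 // 2
    return [(i, i + n_dims) for i in range(n_dims)]


row = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
rotated = partial_rope(row, rope_dims=4, pos=3)
print(rotated[4:])      # the tail is copied through unchanged
print(vision_pairs(8))  # every index 0..7 appears in exactly one pair
```

With rope_dims equal to the row length the tail is empty, which corresponds to the "no tail handling needed" case for Vision mode. The sketch only shows the Vision index pairing; how rotation angles are chosen per pair in Vision mode is not covered here.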

Implementation details:
- Use has_tail flag to determine execution path: head/tail splitting when rope_dims < ne0, or full tensor rotation when rope_dims == ne0
- Support both F32 and F16 data types with intermediate F32 conversion
- Copy non-contiguous tensors to contiguous buffers before calling RotaryPositionEmbedding operator for compatibility
- Improve cache invalidation logic to include rope_dims and indep_sects parameters

These enhancements enable the CANN backend to handle the various RoPE configurations used in modern vision-language models and in models with partial rotation.
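
The execution-path selection described under the implementation details can be modelled as a small planner. This is a hypothetical sketch: plan_rope and its step labels are invented for illustration and are not CANN or ggml APIs, and the ordering of the cast and copy steps is one plausible arrangement, not something the PR description specifies.

```python
def plan_rope(ne0, rope_dims, is_vision, dtype, contiguous):
    """Return descriptive step labels for one RoPE call (illustrative only)."""
    if dtype not in ("f32", "f16"):
        raise ValueError("only f32 and f16 are covered by the description")
    if is_vision:
        rope_dims = ne0  # Vision mode rotates the entire tensor
    has_tail = rope_dims < ne0
    region = "head" if has_tail else "full"

    steps = []
    # Assumption: a head slice is treated like a non-contiguous view,
    # so it is staged in a contiguous buffer before rotation.
    if has_tail or not contiguous:
        steps.append(f"copy {region} to contiguous buffer")
    if dtype == "f16":
        steps.append(f"cast {region} f16 -> f32")
    steps.append(f"rotate {region}")
    if dtype == "f16":
        steps.append(f"cast {region} f32 -> f16")
    if has_tail:
        steps.append("copy head back to dst")
        steps.append("copy tail src -> dst")
    return steps


print(plan_rope(128, 64, False, "f32", True))  # partial rotation path
print(plan_rope(128, 64, True, "f32", True))   # Vision mode: full rotation
```

Keeping the decision in a single has_tail flag means the full-rotation case takes no extra copy steps in this model, which matches the stated intent that existing full-rotation models are unaffected.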

loci-agentic-ai · 2025-11-27T10:13:57Z

Performance Analysis Summary - PR #344

Analysis: This PR implements partial RoPE and vision mode support for the CANN backend across 3 files with 222 additions and 70 deletions. The changes modify the ggml_cann_rope function and related cache initialization logic in aclnn_ops.cpp, extend the ggml_cann_rope_cache structure in common.h, and update backend support logic in ggml-cann.cpp.

Performance Impact: No measurable performance changes detected. Power consumption analysis shows less than 0.001% variation across all binaries, with maximum absolute delta of 0.66 nJ in libllama.so. No functions show measurable changes in response time or throughput time between versions.

Inference Impact: No impact on tokens per second. The core inference functions (llama_decode, llama_encode, llama_tokenize) show no response time or throughput changes. The modifications are isolated to CANN backend RoPE operations, which do not affect CPU-based tokenization or inference paths.

Code Changes: The implementation adds conditional logic for partial rotation (when rope_dims < ne0) by splitting tensors into head and tail portions. For F32 tensors, the head undergoes rotation via RotaryPositionEmbedding while the tail is copied directly. F16 tensors follow the same pattern with intermediate F32 conversion. Vision mode sets rope_dims = ne0 for full tensor rotation. The changes enable support for vision-language models without affecting existing full-rotation models, which bypass the new code path when has_tail == false.

loci-agentic-ai · 2025-11-29T04:23:12Z

Performance Review Summary: PR #344 - CANN Backend Partial RoPE Support

Overview

PR #344 implements partial Rotary Position Embedding and Vision mode support in the CANN backend (ggml-cann library). The changes modify aclnn_ops.cpp (153 additions, 61 deletions) and ggml-cann.cpp (6 additions, 8 deletions) to enable head/tail tensor splitting for models where rope_dims < ne0.

Key Findings

Performance-Critical Function Impact

The modified ggml_cann_rope() function in aclnn_ops.cpp introduces conditional execution paths:

- Full RoPE path (rope_dims == ne00): Execution remains unchanged with no performance delta
- Partial RoPE path (rope_dims < ne00): Adds head buffer allocation, head-only rotation, head copy-back operation, and tail copy operation

For partial RoPE cases with typical attention dimensions (rope_dims=64, ne00=128, ne01=32, ne02=2048), the additional operations introduce approximately 160,000 ns (about 160 µs) of overhead per call from memory copy operations alone.
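
To put the example dimensions in perspective, the amount of data touched by the head and tail copies can be computed directly. The sketch below is a back-of-the-envelope calculation, assuming F32 elements (4 bytes) and a single batch (ne03 = 1); the function names and the bandwidth parameter are hypothetical, and no device bandwidth figure is implied by the source.

```python
def partial_rope_copy_bytes(rope_dims, ne00, ne01, ne02, elem_size=4):
    """Bytes in the head and tail regions for one tensor (ne03 assumed to be 1)."""
    rows = ne01 * ne02
    head = rope_dims * rows * elem_size
    tail = (ne00 - rope_dims) * rows * elem_size
    return head, tail


def estimate_copy_ns(n_bytes, bandwidth_gb_per_s):
    """Idealised copy time in ns; 1 GB/s equals 1 byte/ns."""
    return n_bytes / bandwidth_gb_per_s


head_bytes, tail_bytes = partial_rope_copy_bytes(64, 128, 32, 2048)
print(head_bytes, tail_bytes)  # 16777216 16777216, i.e. 16 MiB each
```

With rope_dims at exactly half of ne00, the head and tail regions are the same size, so each of the extra copy operations moves 16 MiB in this configuration. Actual copy time depends on the effective bandwidth of the device, which would have to be measured.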

The aclnn_rope_cache_init() signature change adds a rope_dims parameter, enabling correct cache sizing for partial rotation. The cache invalidation logic now also includes a theta_scale_updated flag, improving cache correctness.
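
The idea behind parameter-aware cache invalidation can be sketched as a cache keyed on every value that affects the precomputed tables. The class and field names below are hypothetical and do not mirror the actual ggml_cann_rope_cache structure; the field types (for example, indep_sects as a boolean) are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RopeCacheKey:
    """Hypothetical subset of parameters the cached sin/cos tables depend on."""
    rope_dims: int
    indep_sects: bool  # assumed to be a boolean flag
    theta_scale: float


class RopeCache:
    def __init__(self):
        self.key = None
        self.rebuilds = 0

    def ensure(self, key):
        """Rebuild the cached tables only when a keyed parameter changed."""
        if self.key != key:
            self.key = key
            self.rebuilds += 1  # stand-in for recomputing the sin/cos tables
        return self.rebuilds


cache = RopeCache()
cache.ensure(RopeCacheKey(64, False, 0.5))
cache.ensure(RopeCacheKey(64, False, 0.5))   # same parameters: cache reused
cache.ensure(RopeCacheKey(128, False, 0.5))  # rope_dims changed: rebuilt
print(cache.rebuilds)
```

If rope_dims were left out of the key, a call with a different rotation width would silently reuse tables sized for the previous one, which is the kind of stale-cache problem the change is described as addressing.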

Inference Impact

Token Generation Rate: The changes affect only the CANN backend RoPE implementation within the GGML computation graph layer. The core inference functions llama_decode(), llama_encode(), and llama_tokenize() in the llama.cpp API layer are not modified. Token generation rate impact depends on:

- Model architecture: Only models using partial RoPE on CANN backend are affected
- Backend selection: CPU and other GPU backends remain unchanged
- RoPE frequency: Impact scales with number of RoPE operations per token

For models using full RoPE or running on non-CANN backends, tokens per second remains unchanged.

Power Consumption

Power consumption analysis applies to binaries containing the modified CANN backend code. The additional copy operations in the partial RoPE path increase cumulative execution time, resulting in higher power draw in proportion to the increase in throughput time. Binaries using full RoPE or non-CANN backends show no change in power consumption.

loci-dev temporarily deployed to PROD__AL_DEMO November 27, 2025 09:37 — with GitHub Actions Inactive

loci-dev force-pushed the main branch 8 times, most recently from 9a74048 to af6127b Compare November 28, 2025 20:09

cann: fix review comment

e0d679c

loci-dev temporarily deployed to PROD__AL_DEMO November 29, 2025 03:46 — with GitHub Actions Inactive

loci-dev force-pushed the main branch 11 times, most recently from 1854a53 to 1b177fe Compare November 30, 2025 15:08

3 participants