UPSTREAM PR #17543: CANN: add support for partial RoPE and Vision mode #344
Add support for two important RoPE variants: partial rotation (rope_dims < ne0)
and Vision mode rotation.
1. Support for partial RoPE (rope_dims < ne0):
- Split tensor into head (first rope_dims dimensions) and tail portions
- Apply rotation only to head portion using RotaryPositionEmbedding operator
- Copy unrotated tail portion directly from source to destination
- Handle both contiguous and non-contiguous tensor layouts
2. Support for Vision mode (GGML_ROPE_TYPE_VISION):
- Set rope_dims = ne0 for Vision mode to rotate entire tensor
- Vision mode pairs dimension i with dimension i+n_dims (where n_dims = ne0/2)
- No tail handling needed since entire tensor is rotated
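The two rotation layouts described above can be sketched on a single row in plain C++. This is a minimal CPU reference under stated assumptions, not the CANN implementation: `rope_partial_row` and `rope_vision_row` are hypothetical helper names, the partial case assumes adjacent-pair rotation with the standard RoPE angle formula, and the Vision case takes its per-pair angles as an input because the source does not spell out how they are derived.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical reference helper (not the CANN kernel): rotate only the first
// rope_dims elements of a row and copy the tail unchanged.
// Assumption: elements (2k, 2k+1) form a pair; other RoPE modes pair differently.
std::vector<float> rope_partial_row(const std::vector<float> & src,
                                    std::size_t rope_dims,
                                    float pos,
                                    float freq_base = 10000.0f) {
    const std::size_t ne0 = src.size();
    assert(rope_dims <= ne0 && rope_dims % 2 == 0);
    const bool has_tail = rope_dims < ne0;

    std::vector<float> dst(ne0);
    // Head: rotate pairs inside [0, rope_dims).
    for (std::size_t i = 0; i < rope_dims; i += 2) {
        // Standard RoPE angle for pair k = i / 2: pos * freq_base^(-2k / rope_dims).
        const float theta = pos * std::pow(freq_base,
            -static_cast<float>(i) / static_cast<float>(rope_dims));
        const float c = std::cos(theta);
        const float s = std::sin(theta);
        dst[i]     = src[i] * c - src[i + 1] * s;
        dst[i + 1] = src[i] * s + src[i + 1] * c;
    }
    // Tail: copy [rope_dims, ne0) straight from source to destination.
    if (has_tail) {
        for (std::size_t i = rope_dims; i < ne0; ++i) {
            dst[i] = src[i];
        }
    }
    return dst;
}

// Hypothetical reference helper for the Vision layout: dimension i is rotated
// together with dimension i + n_dims, where n_dims = ne0 / 2.
// Assumption: theta[i] holds a precomputed angle for pair i.
std::vector<float> rope_vision_row(const std::vector<float> & src,
                                   const std::vector<float> & theta) {
    const std::size_t ne0 = src.size();
    assert(ne0 % 2 == 0);
    const std::size_t n_dims = ne0 / 2;
    assert(theta.size() == n_dims);

    std::vector<float> dst(ne0);
    for (std::size_t i = 0; i < n_dims; ++i) {
        const float c = std::cos(theta[i]);
        const float s = std::sin(theta[i]);
        dst[i]          = src[i] * c - src[i + n_dims] * s;
        dst[i + n_dims] = src[i] * s + src[i + n_dims] * c;
    }
    return dst;
}
```

With `rope_dims == ne0` the first helper rotates the whole row and the tail loop is skipped, which mirrors the `has_tail` distinction in the description.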
Implementation details:
- Use has_tail flag to determine execution path: head/tail splitting when
rope_dims < ne0, or full tensor rotation when rope_dims == ne0
- Support both F32 and F16 data types with intermediate F32 conversion
- Copy non-contiguous tensors to contiguous buffers before calling
RotaryPositionEmbedding operator for compatibility
- Improve cache invalidation logic to include rope_dims and indep_sects
parameters
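The cache invalidation point can be illustrated with a small sketch. The types below (`RopeCacheKey`, `RopeCache`) and every field other than `rope_dims` and `indep_sects` are hypothetical and are not taken from the CANN backend, and `indep_sects` is assumed here to be a boolean. The idea shown is only that the cached sin/cos tables must be rebuilt whenever any parameter that shapes them changes.

```cpp
#include <cstdint>

// Hypothetical cache key: the parameters that determine the sin/cos tables.
// Only rope_dims and indep_sects are named in the PR description; the other
// fields are illustrative assumptions.
struct RopeCacheKey {
    int64_t max_positions = 0;
    float   freq_base     = 0.0f;
    int64_t rope_dims     = 0;
    bool    indep_sects   = false;

    bool operator==(const RopeCacheKey & other) const {
        return max_positions == other.max_positions &&
               freq_base     == other.freq_base     &&
               rope_dims     == other.rope_dims     &&
               indep_sects   == other.indep_sects;
    }
};

// Hypothetical cache wrapper that counts how often the tables are rebuilt.
struct RopeCache {
    RopeCacheKey key;
    bool         valid    = false;
    int          rebuilds = 0;

    // Returns true when the tables had to be rebuilt for the requested key.
    bool ensure(const RopeCacheKey & wanted) {
        if (valid && key == wanted) {
            return false;  // cached tables still match every parameter
        }
        key   = wanted;    // a real backend would recompute sin/cos here
        valid = true;
        ++rebuilds;
        return true;
    }
};
```

If `rope_dims` or `indep_sects` were left out of the comparison, a call with a different rotation width or section layout could silently reuse stale tables, which is the failure mode the improved check guards against.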
These enhancements enable the CANN backend to handle the RoPE configurations
used in modern vision-language models and in models with partial rotation.
Performance Analysis Summary - PR #344
Analysis: This PR implements partial RoPE and Vision mode support for the CANN backend across 3 files with 222 additions and 70 deletions. The changes modify the [...]
Performance Impact: No measurable performance changes detected. Power consumption analysis shows less than 0.001% variation across all binaries, with a maximum absolute delta of 0.66 nJ in libllama.so. No functions show measurable changes in response time or throughput time between versions.
Inference Impact: No impact on tokens per second. The core inference functions (llama_decode, llama_encode, llama_tokenize) show no response time or throughput changes. The modifications are isolated to CANN backend RoPE operations, which do not affect CPU-based tokenization or inference paths.
Code Changes: The implementation adds conditional logic for partial rotation (when [...]
9a74048 to af6127b
Performance Review Summary: PR #344 - CANN Backend Partial RoPE Support
Overview
PR #344 implements partial Rotary Position Embedding and Vision mode support in the CANN backend ( [...]
Key Findings
Performance-Critical Function Impact
The modified [...]
For partial RoPE cases with typical attention dimensions ( [...] The [...]
Inference Impact
Token Generation Rate: The changes affect only the CANN backend RoPE implementation within the GGML computation graph layer. The core inference functions [...]
For models using full RoPE or running on non-CANN backends, tokens per second remains unchanged.
Power Consumption
Power consumption analysis applies to binaries containing the modified CANN backend code. The additional copy operations in the partial RoPE path increase cumulative execution time, resulting in higher power draw proportional to the throughput time increase. Binaries using full RoPE or non-CANN backends show no power consumption change.
1854a53 to 1b177fe
Mirrored from ggml-org/llama.cpp#17543