
Conversation

@loci-dev

Mirrored from ggml-org/llama.cpp#17548

alt #17248

  • Force token embeddings to be at the start of the graph
  • Fix LFM2 output norm tensor
  • Fix LLM_TENSOR_TOKEN_EMBD_NORM tensor info

@loci-agentic-ai

Explore the complete analysis inside the Version Insights

Performance Analysis Summary: PR #347

Overview

This PR corrects LFM2 model tensor classifications and graph construction order. The changes span three files and 21 modified lines, primarily fixing a tensor type misclassification and ensuring token embeddings appear at the start of the compute graph.

Key Findings

Performance-Critical Areas Impact

Inference Pipeline Functions:
No direct modifications to core inference functions (llama_decode, llama_encode, llama_tokenize). The changes affect LFM2-specific model loading and graph construction but do not alter the execution path of primary inference functions.

Tokens Per Second Impact:
No measurable impact on tokens per second is expected. The PR modifies LFM2-specific tensor mapping and graph ordering without changing the computational logic of the tokenization or decoding functions; the response time and throughput of llama_decode remain unchanged.

Modified Components:

  • src/llama-arch.cpp: Tensor type mapping correction (LLM_TENSOR_OUTPUT_NORM vs LLM_TENSOR_TOKEN_EMBD_NORM)
  • src/llama-model.cpp: Model structure update (tok_norm → output_norm)
  • src/models/lfm2.cpp: Graph construction ordering and tensor reference updates

Absolute Performance Changes:
The changes force token embeddings to the start of the graph via ggml_build_forward_expand. This affects memory-allocation timing but not computational cost. Correcting the operation type for token-embedding normalization from GGML_OP_GET_ROWS to GGML_OP_MUL aligns the tensor-info entry with how the tensor is actually used.

Power Consumption:
The analysis operates at the binary level. The changes affect LFM2-specific code paths within the main binary. No additional computational operations are introduced; the structural corrections keep the operation count equivalent.

Scope:
Changes are isolated to the LFM2 and LFM2MOE architectures. Other model families (GPT, LLAMA, etc.) are unaffected, and the core inference loop is unchanged.

@loci-dev force-pushed the main branch 16 times, most recently from 1854a53 to 1b177fe on November 30, 2025 at 15:08.

3 participants