
Conversation

@loci-dev

Mirrored from ggml-org/llama.cpp#17553

Fix gguf_new_metadata.py for non-native endian files

gguf_new_metadata.py reads tensor data through the reader, and
the reader does not byteswap tensors to native endianness.
The writer, however, expects tensors in native endianness so that
it can convert them to the requested endianness.

There are two ways to fix this: update the reader to convert tensors
to native endianness (and back on write), or skip the endianness
conversion in the writer for this particular use case.

The second approach is taken here, skipping conversions that are
redundant in this case.
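The failure mode can be demonstrated with plain NumPy (a minimal sketch with toy data, not the gguf-py API):

```python
import numpy as np

# A float32 tensor as stored in a big-endian GGUF file.
file_bytes = np.array([1.0, 2.0, 3.0], dtype=">f4").tobytes()

# The reader hands the bytes back unswapped, so on a little-endian host
# the buffer is still big-endian even though the dtype claims native order.
tensor = np.frombuffer(file_bytes, dtype=np.float32)

# Writer's usual path: byteswap "native" data to emit a big-endian file.
# Applied to already-big-endian bytes, the extra swap corrupts the data.
assert tensor.byteswap().tobytes() != file_bytes

# The fix for this use case: pass the bytes through untouched.
assert tensor.tobytes() == file_bytes
```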

Fix gguf_editor_gui.py for non-native endian files

Since the GUI doesn't allow viewing or editing tensor data,
simply skip byteswapping when writing data back to the file.

If the capability to view or edit tensor data is eventually added,
tensor data should instead be byteswapped when reading it.
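If that reader-side fix is ever needed, it would look roughly like this (a sketch assuming float32 data; `to_native` is a hypothetical helper, not part of gguf-py):

```python
import numpy as np

def to_native(buf: bytes, file_order: str) -> np.ndarray:
    """Interpret raw tensor bytes in the file's byte order ('<' or '>')
    and return a native-endian array, byteswapping only when needed."""
    arr = np.frombuffer(buf, dtype=np.dtype(np.float32).newbyteorder(file_order))
    return arr.astype(np.float32)  # astype to native dtype swaps if needed

# Big-endian bytes come back as correct native-endian values on any host.
big_endian_bytes = np.array([1.0, 2.0], dtype=">f4").tobytes()
assert to_native(big_endian_bytes, ">").tolist() == [1.0, 2.0]
```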

These changes can be verified even on little-endian systems.
Two copies of a file are needed:
the first copy is modified and then byteswapped to non-native endianness;
the second copy is byteswapped to non-native endianness first and then modified in the same way.
After these steps both copies should be byte-for-byte identical.
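The recipe works because metadata edits and tensor byteswaps touch disjoint parts of the file, so the two operations commute. A toy model of that argument (not the real GGUF layout):

```python
import numpy as np

# Toy "file": a metadata dict plus one float32 tensor.
def edit_metadata(meta, tensor):
    return {**meta, "general.name": "edited"}, tensor

def byteswap_tensors(meta, tensor):
    return meta, tensor.byteswap()

meta = {"general.name": "original"}
tensor = np.array([1.0, 2.0], dtype=np.float32)

# Order 1: edit first, then byteswap.  Order 2: byteswap, then edit.
m1, t1 = byteswap_tensors(*edit_metadata(meta, tensor))
m2, t2 = edit_metadata(*byteswap_tensors(meta, tensor))

# Both orders yield identical results -- the property the check relies on.
assert m1 == m2 and t1.tobytes() == t2.tobytes()
```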

@loci-agentic-ai

Explore the complete analysis inside the Version Insights

Performance Analysis Summary: PR #348

Analysis Overview

PR #348 introduces correctness fixes for endianness handling in Python GGUF utilities (gguf_writer.py, gguf_editor_gui.py, gguf_new_metadata.py). Performance analysis shows 0.0% change across all 16 binaries. No C++ inference code was modified.

Condition Assessment: Condition 1 applies: no performance metric changes detected.

Performance Impact

Binary Analysis:
All binaries show zero measurable change in power consumption:

  • libllama.so: 190,887 nJ (0.0% change)
  • libggml.so: 4,031 nJ (0.0% change)
  • libggml-cpu.so: 115,243 nJ (0.0% change)
  • All CLI tools: 0.0% change

Inference Performance:
No impact on tokens per second. Changes are isolated to Python metadata manipulation utilities and do not affect inference functions (llama_decode, llama_encode, llama_tokenize).


Conclusion: This PR fixes data corruption in non-native endian GGUF file handling without affecting runtime performance.

@loci-dev loci-dev force-pushed the upstream-PR17553-branch_AlekseiNikiforovIBM-s390x_modifying_scripts branch from 479cda8 to e3bd936 Compare November 28, 2025 14:37
@loci-agentic-ai

Explore the complete analysis inside the Version Insights

Performance Analysis Summary - PR #348

Analysis Overview

PR #348 introduces endianness handling fixes in Python utility scripts (gguf_writer.py, gguf_editor_gui.py, gguf_new_metadata.py). Performance analysis confirms zero impact on compiled binaries. All 16 analyzed binaries show 0.0% power consumption change. No function-level performance deltas detected between versions 1e15ddf2-b59d-49cc-b843-9f0661a545a2 and 04c91645-829d-48f6-9e12-61790e0ebdc2.

The changes modify Python-only code paths for GGUF file manipulation, adding explicit tensor_endianess parameters to prevent data corruption during cross-endian workflows. No modifications to C++ inference engine, model loading, tokenization, or batch processing components.
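A sketch of what such an explicit endianness parameter might look like (names and signature hypothetical, following the `tensor_endianess` parameter mentioned above; not the actual gguf-py API):

```python
import sys
import numpy as np

def pack_tensor(tensor: np.ndarray, file_order: str, tensor_order: str = "=") -> bytes:
    """Return tensor bytes in file_order ('<' or '>'), given tensor_order,
    the byte order the data is already in ('=' means native)."""
    native = "<" if sys.byteorder == "little" else ">"
    have = native if tensor_order in ("=", "|") else tensor_order
    want = native if file_order == "=" else file_order
    # Swap only when the current and requested orders actually differ.
    return tensor.tobytes() if have == want else tensor.byteswap().tobytes()

# Already-big-endian data bound for a big-endian file passes through:
be = np.array([1.0], dtype=">f4")
assert pack_tensor(be, ">", tensor_order=">") == be.tobytes()

# Native little-endian data bound for a big-endian file is swapped once:
le = np.array([1.0], dtype="<f4")
assert pack_tensor(le, ">", tensor_order="<") == be.tobytes()
```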

Inference Performance Impact: None. Core inference functions (llama_decode, llama_encode, llama_tokenize) remain unchanged with identical response time and throughput measurements. Tokens per second throughput is unaffected as no inference path modifications occurred.

Power Consumption: All binaries maintain identical power consumption profiles, including libllama.so (193,182 nJ), libggml-cpu.so (115,347 nJ), and inference utilities.

@loci-dev loci-dev force-pushed the main branch 11 times, most recently from e4a4e1d to d0b408b Compare November 30, 2025 02:46