
Conversation

@loci-dev

Mirrored from ggml-org/llama.cpp#17602

I intentionally kept the bar simple, without specifying part numbers
(which ultimately don't matter much); the only thing we care about is
tracking progress.

Signed-off-by: Adrien Gallouët <[email protected]>
@loci-agentic-ai

Explore the complete analysis inside the Version Insights

Performance Analysis Summary - PR #368

Overview

PR #368 introduces multi-threaded progress bar functionality to common/download.cpp, adding mutex-based synchronization and ANSI terminal sequences for concurrent download progress display. The modification affects the print_progress function, which is not part of the inference pipeline.

Key Findings

Impacted Function:

  • print_progress: Response time increased by roughly 10.5 µs (604 ns → 11113 ns) in llama-cvector-generator and by 10433 ns (608 ns → 11041 ns) in llama-tts. Throughput increased by 301 ns and 275 ns respectively.

Code Changes:
The implementation adds thread-safe progress tracking using std::mutex and std::map<std::thread::id, int> for line assignment, plus ANSI escape sequences for cursor positioning. The response time increase stems from mutex acquisition (20-50 ns), map lookup operations (30-50 ns), and console I/O for ANSI sequences (5-8 microseconds across multiple std::cout calls).

Inference Impact:
No impact on tokens per second. The print_progress function operates during model loading and downloading operations, not during inference execution. Functions responsible for tokenization and inference (llama_decode, llama_encode, llama_tokenize) remain unmodified. The performance change is isolated to progress reporting, which occurs outside the token generation pipeline.

Power Consumption:

  • llama-tts: 0.321% increase (720 nJ total)
  • llama-cvector-generator: 0.159% increase (350 nJ total)

The power increase reflects the cumulative throughput changes in progress reporting functions. Since progress updates occur infrequently during downloads rather than continuously during inference, the total energy impact per operation remains minimal.

Context:
The 18x response time increase is confined to user-facing progress display during file operations. The added synchronization overhead enables clean multi-threaded progress bars without affecting model inference performance or token generation rates.

@loci-dev loci-dev force-pushed the main branch 2 times, most recently from e4a4e1d to d0b408b on November 30, 2025 at 02:46