@DajanaV commented Nov 7, 2025

Mirrored from ggml-org/llama.cpp#16333

This is related to the PR ggml-org/llama.cpp#16239

@loci-agentic-ai

Access the complete analysis in the LOCI Dashboard

Performance Analysis Summary

Overview

Analysis of llama.cpp version 7f3d9888 compared to baseline 5ce9b30a reveals minimal performance impact from PR #123, which enhances ARM CPU detection in the build system. The changes are confined to CMake configuration files and do not modify core inference functions.

Key Findings

Performance Metrics:
• Highest Response Time change: llm_graph_input_out_ids::can_reuse() in build.bin.libllama.so increased by 0.096% (+0.063 ns, from 65.10 ns to 65.16 ns)
• Highest Throughput change: std::_Optional_base constructor in build.bin.llama-cvector-generator improved by 0.17% (-0.04 ns, from 23.56 ns to 23.52 ns)

Core Function Impact:
The changes do not affect critical inference functions (llama_decode, llama_encode, llama_tokenize) that directly impact tokens per second performance. The modified functions are utility components for graph optimization and JSON processing, not part of the main inference pipeline.

Inference Performance:
No measurable impact on tokens per second expected. The affected functions are not in the tokenization or inference critical path. Core functions like llama_decode() show no performance changes.

Power Consumption:
Negligible changes across all 15 binaries with total consumption remaining constant at ~1.7 million nanojoules. Largest change: -0.61 nJ in build.bin.libllama.so (effectively zero impact).

Technical Analysis:
• Flame Graph: llm_graph_input_out_ids::can_reuse() shows single-frame execution with 100% self-time, indicating highly optimized inline operations
• CFG Comparison: Identical assembly code between versions confirms the 0.096% regression stems from external factors (binary layout, instruction cache alignment) rather than algorithmic changes
• Code Review: PR #123 improves ARM CPU detection logic with better -march/-mcpu flag handling and enhanced GCC compatibility
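
The detection approach referenced above can be illustrated with a minimal CMake sketch. This is an assumption-laden illustration, not the actual change in ggml-org/llama.cpp#16333: the idea is to ask the compiler what `-mcpu=native` resolves to and forward the concrete CPU name, which keeps builds reproducible and works with GCC versions that print resolved target options via `-Q --help=target`.

```cmake
# Hedged sketch (not the PR's actual code): resolve -mcpu=native to a
# concrete CPU name on ARM targets and pass it through explicitly.
if(CMAKE_SYSTEM_PROCESSOR MATCHES "arm|aarch64")
    # GCC and Clang print the resolved target options with -Q --help=target.
    execute_process(
        COMMAND ${CMAKE_C_COMPILER} -mcpu=native -Q --help=target
        OUTPUT_VARIABLE ARM_TARGET_HELP
        ERROR_QUIET
    )
    # Extract the concrete CPU name (e.g. "cortex-a76") from the help output.
    string(REGEX MATCH "-mcpu=[ \t]*([a-zA-Z0-9.+-]+)" _match "${ARM_TARGET_HELP}")
    if(CMAKE_MATCH_1)
        message(STATUS "Detected ARM CPU: ${CMAKE_MATCH_1}")
        add_compile_options(-mcpu=${CMAKE_MATCH_1})
    endif()
endif()
```

Because this only touches compiler-flag selection at configure time, it is consistent with the report's finding that generated assembly for the measured functions is identical between versions.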

Conclusion:
The performance variations are within measurement noise levels and represent build system improvements rather than functional regressions. No actionable performance optimizations required.

@DajanaV DajanaV force-pushed the main branch 19 times, most recently from aa2fc28 to 0ad40ce Compare November 9, 2025 17:06
@DajanaV DajanaV force-pushed the upstream-PR16333-branch_angt-inspect-march-and-mcpu-to-found-the-cpu branch from 267f8d5 to 475b1d3 Compare November 10, 2025 08:41
@DajanaV DajanaV force-pushed the main branch 6 times, most recently from 6aa5dc2 to 81cedf2 Compare November 10, 2025 16:10
@loci-dev loci-dev force-pushed the main branch 30 times, most recently from a9fcc24 to ea62cd5 Compare December 10, 2025 00:37