Conversation

@DajanaV commented Nov 7, 2025

Mirrored from ggml-org/llama.cpp#16609

Summary

Fixed path handling in the mtmd feature on Windows to correctly convert and process non-ASCII file paths.

Additionally, fixed a missing console initialization in mtmd-cli.
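
For context, a minimal sketch of the kind of conversion involved, assuming the standard Win32 approach (the helper name `path_to_wide` is illustrative, not the exact code from the patch). On Windows, narrow-string file APIs such as `fopen` interpret paths in the active ANSI code page, so a UTF-8 path must be widened before it can be opened through the wide-character API:

```cpp
// Minimal sketch: convert a UTF-8 path to UTF-16 for Windows file APIs.
// The helper name and structure are illustrative, not the exact patch.
#ifdef _WIN32
#include <windows.h>
#include <string>

static std::wstring path_to_wide(const std::string & path_utf8) {
    // First call asks how many UTF-16 code units are needed (incl. the NUL).
    int n = MultiByteToWideChar(CP_UTF8, 0, path_utf8.c_str(), -1, nullptr, 0);
    if (n <= 0) {
        return std::wstring(); // invalid UTF-8; caller should treat as failure
    }
    std::wstring wpath(n, L'\0');
    MultiByteToWideChar(CP_UTF8, 0, path_utf8.c_str(), -1, &wpath[0], n);
    wpath.resize(n - 1); // drop the terminating NUL the API wrote
    return wpath;
}
#endif
```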

Issues Resolved

  • Failed to load .mmproj files with non-ASCII file paths

  • Failed to load image files with non-ASCII file paths

  • Incorrect path parsing in mtmd-cli when using /image <filepath> with non-ASCII characters (see the console sketch after this list)
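
The console fix addresses the input side: if the Windows console is not initialized for UTF-8, a path typed after /image arrives in the legacy code page (Shift-JIS in the setup above) and no longer matches the file on disk. A hedged sketch of the core Win32 calls; llama.cpp has its own console helper in `common/console`, which does more than this:

```cpp
// Sketch: switch a Windows console to UTF-8 so typed paths reach the
// program as valid UTF-8. Error handling omitted for brevity.
#ifdef _WIN32
#include <windows.h>

static void init_console_utf8() {
    SetConsoleCP(CP_UTF8);        // code page for console input
    SetConsoleOutputCP(CP_UTF8);  // code page for console output
}
#endif
```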

Comparison of behavior

Windows 11 Pro 24H2

PS > Get-WinSystemLocale
LCID             Name             DisplayName
----             ----             -----------
1041             ja-JP            Japanese (Japan)
PS > [Console]::InputEncoding.EncodingName
Japanese (Shift-JIS)
PS > [Console]::OutputEncoding.EncodingName
Unicode (UTF-8)

b6756

PS > llama-mtmd-cli.exe -m ... --mmproj ...
~~~
> /image テスト画像.jpg
Unable to open file ツテツスツトツ嘉ヲツ堕・jpg: Illegal byte sequence

This PR

PS > llama-mtmd-cli.exe -m ... --mmproj ...
~~~
> /image テスト画像.jpg
テスト画像.jpg image loaded

Notes

The general argument-handling logic in llama.cpp has similar problems with non-ASCII paths.
Since that fix would touch a much broader area, I will submit a separate PR to address it.

@loci-agentic-ai

Access the complete analysis in the LOCI Dashboard

Performance Analysis Summary

Overview

Analysis of the LLaMA.cpp project comparing versions f60c3df0 and 96963f27 reveals minimal performance impact from the recent changes. The modifications primarily address Windows file path handling for non-ASCII characters in the mtmd (multimodal) feature, with no changes to core inference functions.

Key Findings

Performance Metrics:

  • Highest Response Time change: set_warmup_n_tokens function (+2.95%, 222 ns → 229 ns)
  • Highest Throughput change: CLIP image batch encode operator (+6.16%, 144 ns → 153 ns)
  • Both functions are part of the CLIP vision processing pipeline, not core LLaMA inference

Core Function Impact:
No changes detected in critical inference functions (llama_decode, llama_encode, llama_tokenize). The performance regressions are isolated to CLIP model parameter configuration and image processing components that do not affect tokens-per-second performance for text generation workloads.

Power Consumption Analysis:

  • build.bin.libmtmd.so: +0.007% increase (210,006 nJ → 210,021 nJ)
  • build.bin.llama-cvector-generator: -100% (binary removed)
  • build.bin.llama-tts: -100% (binary removed)
  • All other binaries show no measurable change

Technical Analysis:
Flame graph analysis reveals the set_warmup_n_tokens regression stems from mathematical computation overhead (sqrt operations consuming 24% of execution time). CFG comparison shows identical control flow structures between versions, indicating the performance change likely results from compiler optimization differences rather than algorithmic modifications.

Code Review Insights:
The GitHub PR introduces Windows-specific UTF-8 to UTF-16 file path conversion for mtmd components. Changes are well-isolated with proper error handling and maintain cross-platform compatibility. No impact on core inference performance.
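
As a concrete illustration of that isolation, a cross-platform open wrapper might look like the following (the name `mtmd_fopen` is hypothetical; the PR's actual helper may differ):

```cpp
// Hypothetical wrapper showing the isolation pattern: the Windows branch
// widens the UTF-8 path and mode for _wfopen, other platforms use fopen.
#include <cstdio>
#include <cstring>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

static FILE * mtmd_fopen(const char * path_utf8, const char * mode) {
#ifdef _WIN32
    int n = MultiByteToWideChar(CP_UTF8, 0, path_utf8, -1, nullptr, 0);
    if (n <= 0) {
        return nullptr; // invalid UTF-8: fail rather than open a mangled path
    }
    std::wstring wpath(n, L'\0');
    MultiByteToWideChar(CP_UTF8, 0, path_utf8, -1, &wpath[0], n);
    std::wstring wmode(mode, mode + std::strlen(mode)); // mode is plain ASCII
    return _wfopen(wpath.c_str(), wmode.c_str());
#else
    return std::fopen(path_utf8, mode);
#endif
}
```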

Inference Performance Impact:
Since no core tokenization or inference functions show performance changes, there is no expected impact on tokens-per-second throughput for standard LLM inference workloads. The observed regressions affect only multimodal preprocessing components.

The analysis indicates stable performance for primary LLaMA.cpp functionality with localized, non-critical changes in auxiliary components.

3 similar comments

@DajanaV force-pushed the main branch 15 times, most recently from aa2fc28 to 0ad40ce on November 9, 2025 at 17:06
@loci-dev force-pushed the main branch 30 times, most recently from 7d44551 to 8eaea62 on December 4, 2025 at 15:10