Conversation

@DajanaV commented Nov 7, 2025

Mirrored from ggml-org/llama.cpp#16609

Summary

Fixed path handling in the mtmd feature on Windows to correctly convert and process non-ASCII file paths.

Additionally, fixed a missing console initialization in mtmd-cli.
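
For context, a minimal sketch of the kind of conversion involved, assuming the standard Win32 approach (the helper name `path_to_wide` is illustrative, not the exact code from the patch). On Windows, narrow-string file APIs such as `fopen` interpret paths in the active ANSI code page, so a UTF-8 path must be widened before it can be opened through the wide-character API:

```cpp
// Minimal sketch: convert a UTF-8 path to UTF-16 for Windows file APIs.
// The helper name and structure are illustrative, not the exact patch.
#ifdef _WIN32
#include <windows.h>
#include <string>

static std::wstring path_to_wide(const std::string & path_utf8) {
    // First call asks how many UTF-16 code units are needed (incl. the NUL).
    int n = MultiByteToWideChar(CP_UTF8, 0, path_utf8.c_str(), -1, nullptr, 0);
    if (n <= 0) {
        return std::wstring(); // invalid UTF-8; caller should treat as failure
    }
    std::wstring wpath(n, L'\0');
    MultiByteToWideChar(CP_UTF8, 0, path_utf8.c_str(), -1, &wpath[0], n);
    wpath.resize(n - 1); // drop the terminating NUL the API wrote
    return wpath;
}
#endif
```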

Issues Resolved

  • Failed to load .mmproj files with non-ASCII file paths

  • Failed to load image files with non-ASCII file paths

  • Incorrect path parsing in mtmd-cli when using /image <filepath> with non-ASCII characters (see the console sketch after this list)
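
The console fix addresses the input side: if the Windows console is not initialized for UTF-8, a path typed after /image arrives in the legacy code page (Shift-JIS in the setup above) and no longer matches the file on disk. A hedged sketch of the core Win32 calls; llama.cpp has its own console helper in `common/console`, which does more than this:

```cpp
// Sketch: switch a Windows console to UTF-8 so typed paths reach the
// program as valid UTF-8. Error handling omitted for brevity.
#ifdef _WIN32
#include <windows.h>

static void init_console_utf8() {
    SetConsoleCP(CP_UTF8);        // code page for console input
    SetConsoleOutputCP(CP_UTF8);  // code page for console output
}
#endif
```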

Comparison of behavior

Windows 11 Pro 24H2

PS > Get-WinSystemLocale
LCID             Name             DisplayName
----             ----             -----------
1041             ja-JP            Japanese (Japan)
PS > [Console]::InputEncoding.EncodingName
Japanese (Shift-JIS)
PS > [Console]::OutputEncoding.EncodingName
Unicode (UTF-8)

b6756

PS > llama-mtmd-cli.exe -m ... --mmproj ...
~~~
> /image テスト画像.jpg
Unable to open file ツテツスツトツ嘉ヲツ堕・jpg: Illegal byte sequence

This PR

PS > llama-mtmd-cli.exe -m ... --mmproj ...
~~~
> /image テスト画像.jpg
テスト画像.jpg image loaded

Notes

The general argument-handling logic in llama.cpp has similar problems with non-ASCII paths.
Since that fix would touch a much broader area, I will submit a separate PR to address it.

@loci-agentic-ai

Access the complete analysis in the LOCI Dashboard

Performance Analysis Summary

Overview

Analysis of the LLaMA.cpp project comparing versions f60c3df0 and 96963f27 reveals minimal performance impact from the recent changes. The modifications primarily address Windows file path handling for non-ASCII characters in the mtmd (multimodal) feature, with no changes to core inference functions.

Key Findings

Performance Metrics:

  • Highest Response Time change: set_warmup_n_tokens function (+2.95%, 222 ns → 229 ns)
  • Highest Throughput change: CLIP image batch encode operator (+6.16%, 144 ns → 153 ns)
  • Both functions are part of the CLIP vision processing pipeline, not core LLaMA inference

Core Function Impact:
No changes detected in critical inference functions (llama_decode, llama_encode, llama_tokenize). The performance regressions are isolated to CLIP model parameter configuration and image processing components that do not affect tokens-per-second performance for text generation workloads.

Power Consumption Analysis:

  • build.bin.libmtmd.so: +0.007% increase (210,006 nJ → 210,021 nJ)
  • build.bin.llama-cvector-generator: -100% (binary removed)
  • build.bin.llama-tts: -100% (binary removed)
  • All other binaries show no measurable change

Technical Analysis:
Flame graph analysis reveals the set_warmup_n_tokens regression stems from mathematical computation overhead (sqrt operations consuming 24% of execution time). CFG comparison shows identical control flow structures between versions, indicating the performance change likely results from compiler optimization differences rather than algorithmic modifications.

Code Review Insights:
The GitHub PR introduces Windows-specific UTF-8 to UTF-16 file path conversion for mtmd components. Changes are well-isolated with proper error handling and maintain cross-platform compatibility. No impact on core inference performance.
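
As a concrete illustration of that isolation, a cross-platform open wrapper might look like the following (the name `mtmd_fopen` is hypothetical; the PR's actual helper may differ):

```cpp
// Hypothetical wrapper showing the isolation pattern: the Windows branch
// widens the UTF-8 path and mode for _wfopen, other platforms use fopen.
#include <cstdio>
#include <cstring>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

static FILE * mtmd_fopen(const char * path_utf8, const char * mode) {
#ifdef _WIN32
    int n = MultiByteToWideChar(CP_UTF8, 0, path_utf8, -1, nullptr, 0);
    if (n <= 0) {
        return nullptr; // invalid UTF-8: fail rather than open a mangled path
    }
    std::wstring wpath(n, L'\0');
    MultiByteToWideChar(CP_UTF8, 0, path_utf8, -1, &wpath[0], n);
    std::wstring wmode(mode, mode + std::strlen(mode)); // mode is plain ASCII
    return _wfopen(wpath.c_str(), wmode.c_str());
#else
    return std::fopen(path_utf8, mode);
#endif
}
```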

Inference Performance Impact:
Since no core tokenization or inference functions show performance changes, there is no expected impact on tokens-per-second throughput for standard LLM inference workloads. The observed regressions affect only multimodal preprocessing components.

The analysis indicates stable performance for primary LLaMA.cpp functionality with localized, non-critical changes in auxiliary components.

3 similar comments

@DajanaV force-pushed the main branch 15 times, most recently from aa2fc28 to 0ad40ce on November 9, 2025 at 17:06
@loci-dev force-pushed the main branch 30 times, most recently from 7d44551 to 8eaea62 on December 4, 2025 at 15:10