
Add GGUF file support in CLI and printing functions#8

Closed
vm7608 wants to merge 2 commits into alvarobartt:main from vm7608:support-gguf

Conversation

Contributor

@vm7608 vm7608 commented Jan 7, 2026

Summary

  • Added functionality to check for GGUF files in the CLI and print a report using print_report_for_gguf.
  • Updated error handling to include GGUF files in the search criteria.
  • Introduced new helper functions in print.py for formatting and displaying GGUF file reports, including grouping sharded files and adjusting table widths.
  • Updated the _bytes_to_gb function with a use_decimal argument to match the file sizes shown on the Hugging Face Hub.

Example

hf-mem --model-id Qwen/Qwen3-VL-8B-Instruct-GGUF

[image: example report output]

Copilot AI review requested due to automatic review settings January 7, 2026 04:31

Copilot AI left a comment


Pull request overview

This pull request adds support for GGUF model files to the CLI tool, extending its capability beyond safetensors models. The changes introduce new printing functions specifically designed for displaying GGUF file information in a formatted table, along with file grouping logic to handle sharded GGUF files.

Key changes:

  • Added GGUF file detection and reporting in the CLI fallback path when no safetensors files are found
  • Implemented new table formatting functions for GGUF files with dynamic column widths based on filename lengths
  • Enhanced _bytes_to_gb function with a use_decimal parameter to support both binary (GiB) and decimal (GB) conversions for HuggingFace compatibility

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 9 comments.

File Description
src/hf_mem/print.py Added GGUF-specific table printing functions (_print_header_gguf, _print_centered_gguf, _print_divider_gguf, _print_row_gguf), file grouping logic for sharded GGUF files, and the main print_report_for_gguf function; updated _bytes_to_gb to support decimal conversion
src/hf_mem/cli.py Added GGUF file detection in the fallback path, integrated print_report_for_gguf function call, and updated error message to include GGUF files


def _group_gguf_files(gguf_files: Dict[str, int]) -> Dict[str, int]:
    """Group sharded GGUF files by model variant and sum their sizes.

    Files like 'BF16/model-00001-of-00010.gguf' are grouped together.

Copilot AI Jan 7, 2026


The docstring example 'BF16/model-00001-of-00010.gguf' shows that directory prefixes are preserved: the regex only matches and replaces the shard suffix, so 'BF16/model-00001-of-00010.gguf' and 'BF16/model-00002-of-00010.gguf' are grouped under 'BF16/model.gguf', which is correct. It would still be helpful for the docstring to state the resulting base name explicitly (e.g., 'BF16/model-00001-of-00010.gguf' → 'BF16/model.gguf').

Suggested change
Files like 'BF16/model-00001-of-00010.gguf' are grouped together.
Files like 'BF16/model-00001-of-00010.gguf' are grouped together.
For example, 'BF16/model-00001-of-00010.gguf' and
'BF16/model-00002-of-00010.gguf' are grouped under 'BF16/model.gguf'.

    gguf_files: Dict[str, int],
    ignore_table_width: bool = False,
) -> None:
    """Print VRAM report for GGUF models.

Copilot AI Jan 7, 2026


The docstring says "Print VRAM report for GGUF models" but GGUF files are typically stored on disk and loaded into VRAM for inference. The function is actually printing file sizes (disk usage), not VRAM/memory usage. Consider changing "VRAM report" to "file size report" or "disk usage report" to more accurately reflect what the function does.

Suggested change
"""Print VRAM report for GGUF models.
"""Print file size report for GGUF model files.

Comment on lines +213 to +217
max_len = current_len + name_len - BORDERS_AND_PADDING
total_width = max_len + 12
text_len = len(text)
pad_left = (total_width - text_len) // 2
pad_right = total_width - text_len - pad_left

Copilot AI Jan 7, 2026


The width calculation appears inconsistent with other parts of the table. The formula current_len + name_len - BORDERS_AND_PADDING followed by + 12 doesn't align with how _print_header_gguf, _print_divider_gguf, and _print_row_gguf calculate their widths. This could cause misalignment where the centered text width doesn't match the actual table width defined by the header and dividers. Consider verifying the calculation matches the table structure created by the other GGUF printing functions.

Suggested change
max_len = current_len + name_len - BORDERS_AND_PADDING
total_width = max_len + 12
text_len = len(text)
pad_left = (total_width - text_len) // 2
pad_right = total_width - text_len - pad_left
# The centered line should have the same total width as GGUF rows:
# total_width = name_len + current_len + 6 (including outer borders).
# Since this function explicitly prints the two outer vertical borders,
# the inner content width (spaces + text between borders) is:
inner_width = name_len + current_len + 4
text_len = len(text)
# Ensure we don't use negative padding if the text is longer than the inner width.
total_padding = max(0, inner_width - text_len)
pad_left = total_padding // 2
pad_right = total_padding - pad_left

    Single files like 'model-Q4_K_M.gguf' remain as-is.
    """
    grouped: Dict[str, int] = {}
    shard_pattern = re.compile(r"-\d{5}-of-\d{5}\.gguf$")

Copilot AI Jan 7, 2026


The regex pattern assumes sharded GGUF files always follow the format -\d{5}-of-\d{5}\.gguf$ (exactly 5 digits). However, some GGUF models might use different shard numbering formats (e.g., -00001-of-00002.gguf vs -0001-of-0002.gguf or -1-of-2.gguf). Consider using a more flexible pattern like -\d+-of-\d+\.gguf$ to handle various numbering schemes, or document the assumption that only 5-digit formats are supported.

Suggested change
shard_pattern = re.compile(r"-\d{5}-of-\d{5}\.gguf$")
shard_pattern = re.compile(r"-\d+-of-\d+\.gguf$")

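To make the suggestion concrete, a sketch of the grouping loop using the relaxed pattern (the function and variable names here are illustrative; the PR's _group_gguf_files may differ in detail):

```python
import re
from typing import Dict

# Relaxed shard suffix: any digit count, per the review suggestion.
_SHARD_PATTERN = re.compile(r"-\d+-of-\d+\.gguf$")

def group_gguf_files(gguf_files: Dict[str, int]) -> Dict[str, int]:
    """Sum sizes of sharded GGUF files under a common base name."""
    grouped: Dict[str, int] = {}
    for path, size in gguf_files.items():
        # 'BF16/model-00001-of-00010.gguf' -> 'BF16/model.gguf';
        # non-sharded names pass through unchanged.
        base = _SHARD_PATTERN.sub(".gguf", path)
        grouped[base] = grouped.get(base, 0) + size
    return grouped
```

The relaxed `\d+` quantifier handles both the common 5-digit convention and shorter numbering schemes without changing behavior for single-file quantizations.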
        ignore_table_width: Whether to ignore max table width
    """
    grouped_files = _group_gguf_files(gguf_files)


Copilot AI Jan 7, 2026


The function can raise a ValueError if grouped_files is empty (when gguf_files is empty or all files are filtered out). The max() function will fail with an empty sequence. This should be handled before calling max(), either by checking if grouped_files is empty or by validating that gguf_files is not empty earlier in the flow.

Suggested change
    if not grouped_files:
        warnings.warn("No GGUF files found to report on; skipping VRAM report.")
        return

Comment on lines +203 to 207
else:
    raise RuntimeError(
        "NONE OF `model.safetensors`, `model.safetensors.index.json`, `model_index.json`, OR `.gguf` FILES HAVE BEEN FOUND"
    )


Copilot AI Jan 7, 2026


The condition checking for GGUF files could be simplified. The dictionary comprehension already filters out files where size is None or path doesn't end with .gguf, so checking if gguf_files: is sufficient. The explicit else branch with the RuntimeError could be dedented to remove unnecessary nesting.

Suggested change
else:
    raise RuntimeError(
        "NONE OF `model.safetensors`, `model.safetensors.index.json`, `model_index.json`, OR `.gguf` FILES HAVE BEEN FOUND"
    )
raise RuntimeError(
    "NONE OF `model.safetensors`, `model.safetensors.index.json`, `model_index.json`, OR `.gguf` FILES HAVE BEEN FOUND"
)

left, mid, right = BOX["lm"], BOX["mm"], BOX["rm"]

name_col_inner = name_len + 2
data_col_inner = current_len + 1

Copilot AI Jan 7, 2026


Inconsistent width calculation: The function is called with current_len + 1 (lines 311, 318, 320), but inside the function at line 239, it already adds 1 again with data_col_inner = current_len + 1. This means the data column width will be current_len + 2 in dividers, while _print_row_gguf uses just current_len (line 251). This inconsistency will cause table misalignment. Either remove the + 1 from the function calls or remove it from line 239, depending on which width is correct.

Suggested change
data_col_inner = current_len + 1
data_col_inner = current_len

    return
else:
    raise RuntimeError(
        "NONE OF `model.safetensors`, `model.safetensors.index.json`, `model_index.json`, OR `.gguf` FILES HAVE BEEN FOUND

Copilot AI Jan 7, 2026


The error message uses inconsistent formatting for file types. It mentions model.safetensors, model.safetensors.index.json, and model_index.json as specific filenames (in backticks), but then refers to .gguf FILES as a pattern/extension. For consistency, either list specific common GGUF filenames or refer to all items as patterns (e.g., ".safetensors", ".safetensors.index.json", "model_index.json", or "*.gguf").

Suggested change
"NONE OF `model.safetensors`, `model.safetensors.index.json`, `model_index.json`, OR `.gguf` FILES HAVE BEEN FOUND"
"NONE OF `*.safetensors`, `*.safetensors.index.json`, `model_index.json`, OR `*.gguf` FILES HAVE BEEN FOUND"

Comment on lines +196 to +201
print_report_for_gguf(
    model_id=model_id,
    revision=revision,
    gguf_files=gguf_files,
    ignore_table_width=ignore_table_width,
)

Copilot AI Jan 7, 2026


When GGUF files are found and printed, the function returns early (line 202) without handling the json_output flag. This is inconsistent with the safetensors flow which respects the --json-output flag. Users expecting JSON output will get table output instead when using GGUF files. Consider adding JSON output support for GGUF files or documenting that JSON output is not supported for GGUF models.

Suggested change
print_report_for_gguf(
    model_id=model_id,
    revision=revision,
    gguf_files=gguf_files,
    ignore_table_width=ignore_table_width,
)
if json_output:
    out = {
        "model_id": model_id,
        "revision": revision,
        "gguf_files": gguf_files,
    }
    print(json.dumps(out))
else:
    print_report_for_gguf(
        model_id=model_id,
        revision=revision,
        gguf_files=gguf_files,
        ignore_table_width=ignore_table_width,
    )

Owner

@alvarobartt alvarobartt left a comment


Thanks for the PR @vm7608! It's nice that for GGUF we can simply grab the required VRAM from the actual size of the file, but I'd rather format this differently e.g.

| <DEFAULT HEADER> |
| <DIVIDER> |
| <FILENAME> |
| <DTYPE> w/ <PARAMS> | <RAM BAR> |

I'm happy to take on another PR and add you as co-author, see the latest changes on https://github.com/alvarobartt/hf-mem/pulls/10 if you want to take a shot at it, otherwise I'm happy to do it myself!

Thanks again for the PR 🤗

P.S. I also think we could actually benefit from the current print_report function to parse the metadata in a way that's readable by that function, to ensure it follows the same formatting, but happy to hear suggestions!

vm7608 added 2 commits January 7, 2026 22:20
- Added functionality to check for GGUF files in the CLI and print a report using `print_report_for_gguf`.
- Updated error handling to include GGUF files in the search criteria.
- Introduced new helper functions in `print.py` for formatting and displaying GGUF file reports, including grouping sharded files and adjusting table widths.
- Updated function `_bytes_to_gb` with `use_decimal` argument to match with Huggingface file size.
- Updated `_print_header`, `_print_centered`, `_print_divider`, `_format_name`, and `_print_row` functions to include an optional `name_len` parameter for improved flexibility in formatting.
- Removed redundant GGUF-specific print functions, consolidating functionality into existing print methods.
- Adjusted the `print_report_for_gguf` function to utilize the refactored print methods, enhancing code maintainability.
Contributor Author

vm7608 commented Jan 8, 2026

Hi @alvarobartt ,

Thanks a lot for the feedback! I've just pushed a new commit to better leverage the existing print helper functions as you suggested.

However, I decided to keep print_report_for_gguf separate. My reasoning is that unlike Safetensors (where the output breaks down multiple components for the total VRAM), a GGUF file usually corresponds directly to a specific model version. Therefore, I felt that having a dedicated print function would be more appropriate/cleaner for this structure.

I'd be happy to hear any other suggestions you might have! Just a heads-up though, I'm a bit busy at the moment, so I might not be able to push updates immediately or continuously. 😅

Thanks again! 🚀

@alvarobartt
Owner

Hey @vm7608 do you agree we close this PR in favour of #25, with proper attribution to your earlier work (which I think @diegovelilla has already done)? 🤗

@vm7608 vm7608 closed this Feb 22, 2026