
Add GGUF file support in CLI and printing functions#8

Closed
vm7608 wants to merge 2 commits into alvarobartt:main from vm7608:support-gguf

Conversation

Contributor

@vm7608 vm7608 commented Jan 7, 2026

Summary

  • Added functionality to check for GGUF files in the CLI and print a report using print_report_for_gguf.
  • Updated error handling to include GGUF files in the search criteria.
  • Introduced new helper functions in print.py for formatting and displaying GGUF file reports, including grouping sharded files and adjusting table widths.
  • Updated the _bytes_to_gb function with a use_decimal argument to match the file sizes shown on the Hugging Face Hub.

Example

hf-mem --model-id Qwen/Qwen3-VL-8B-Instruct-GGUF

[image: example report output]

Copilot AI review requested due to automatic review settings January 7, 2026 04:31

Copilot AI left a comment


Pull request overview

This pull request adds support for GGUF model files to the CLI tool, extending its capability beyond safetensors models. The changes introduce new printing functions specifically designed for displaying GGUF file information in a formatted table, along with file grouping logic to handle sharded GGUF files.

Key changes:

  • Added GGUF file detection and reporting in the CLI fallback path when no safetensors files are found
  • Implemented new table formatting functions for GGUF files with dynamic column widths based on filename lengths
  • Enhanced _bytes_to_gb function with a use_decimal parameter to support both binary (GiB) and decimal (GB) conversions for HuggingFace compatibility

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 9 comments.

File Description
src/hf_mem/print.py Added GGUF-specific table printing functions (_print_header_gguf, _print_centered_gguf, _print_divider_gguf, _print_row_gguf), file grouping logic for sharded GGUF files, and the main print_report_for_gguf function; updated _bytes_to_gb to support decimal conversion
src/hf_mem/cli.py Added GGUF file detection in the fallback path, integrated print_report_for_gguf function call, and updated error message to include GGUF files


def _group_gguf_files(gguf_files: Dict[str, int]) -> Dict[str, int]:
    """Group sharded GGUF files by model variant and sum their sizes.

    Files like 'BF16/model-00001-of-00010.gguf' are grouped together.

Copilot AI Jan 7, 2026


The docstring example 'BF16/model-00001-of-00010.gguf' shows that directory prefixes are preserved: the regex only matches and replaces the shard suffix, so 'BF16/model-00001-of-00010.gguf' and 'BF16/model-00002-of-00010.gguf' are grouped under 'BF16/model.gguf', which is correct. It would still be helpful for the docstring to state the resulting base name explicitly (e.g., 'BF16/model-00001-of-00010.gguf' → 'BF16/model.gguf').

Suggested change
Files like 'BF16/model-00001-of-00010.gguf' are grouped together.
Files like 'BF16/model-00001-of-00010.gguf' are grouped together.
For example, 'BF16/model-00001-of-00010.gguf' and
'BF16/model-00002-of-00010.gguf' are grouped under 'BF16/model.gguf'.

    gguf_files: Dict[str, int],
    ignore_table_width: bool = False,
) -> None:
    """Print VRAM report for GGUF models.

Copilot AI Jan 7, 2026


The docstring says "Print VRAM report for GGUF models" but GGUF files are typically stored on disk and loaded into VRAM for inference. The function is actually printing file sizes (disk usage), not VRAM/memory usage. Consider changing "VRAM report" to "file size report" or "disk usage report" to more accurately reflect what the function does.

Suggested change
"""Print VRAM report for GGUF models.
"""Print file size report for GGUF model files.

Comment on lines +213 to +217
max_len = current_len + name_len - BORDERS_AND_PADDING
total_width = max_len + 12
text_len = len(text)
pad_left = (total_width - text_len) // 2
pad_right = total_width - text_len - pad_left

Copilot AI Jan 7, 2026


The width calculation appears inconsistent with other parts of the table. The formula current_len + name_len - BORDERS_AND_PADDING followed by + 12 doesn't align with how _print_header_gguf, _print_divider_gguf, and _print_row_gguf calculate their widths. This could cause misalignment where the centered text width doesn't match the actual table width defined by the header and dividers. Consider verifying the calculation matches the table structure created by the other GGUF printing functions.

Suggested change
max_len = current_len + name_len - BORDERS_AND_PADDING
total_width = max_len + 12
text_len = len(text)
pad_left = (total_width - text_len) // 2
pad_right = total_width - text_len - pad_left
# The centered line should have the same total width as GGUF rows:
# total_width = name_len + current_len + 6 (including outer borders).
# Since this function explicitly prints the two outer vertical borders,
# the inner content width (spaces + text between borders) is:
inner_width = name_len + current_len + 4
text_len = len(text)
# Ensure we don't use negative padding if the text is longer than the inner width.
total_padding = max(0, inner_width - text_len)
pad_left = total_padding // 2
pad_right = total_padding - pad_left

    Single files like 'model-Q4_K_M.gguf' remain as-is.
    """
    grouped: Dict[str, int] = {}
    shard_pattern = re.compile(r"-\d{5}-of-\d{5}\.gguf$")

Copilot AI Jan 7, 2026


The regex pattern assumes sharded GGUF files always follow the format -\d{5}-of-\d{5}\.gguf$ (exactly 5 digits). However, some GGUF models might use different shard numbering formats (e.g., -00001-of-00002.gguf vs -0001-of-0002.gguf or -1-of-2.gguf). Consider using a more flexible pattern like -\d+-of-\d+\.gguf$ to handle various numbering schemes, or document the assumption that only 5-digit formats are supported.

Suggested change
shard_pattern = re.compile(r"-\d{5}-of-\d{5}\.gguf$")
shard_pattern = re.compile(r"-\d+-of-\d+\.gguf$")

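To make the suggestion concrete, a sketch of the grouping loop using the relaxed pattern (the function and variable names here are illustrative; the PR's _group_gguf_files may differ in detail):

```python
import re
from typing import Dict

# Relaxed shard suffix: any digit count, per the review suggestion.
_SHARD_PATTERN = re.compile(r"-\d+-of-\d+\.gguf$")

def group_gguf_files(gguf_files: Dict[str, int]) -> Dict[str, int]:
    """Sum sizes of sharded GGUF files under a common base name."""
    grouped: Dict[str, int] = {}
    for path, size in gguf_files.items():
        # 'BF16/model-00001-of-00010.gguf' -> 'BF16/model.gguf';
        # non-sharded names pass through unchanged.
        base = _SHARD_PATTERN.sub(".gguf", path)
        grouped[base] = grouped.get(base, 0) + size
    return grouped
```

The relaxed `\d+` quantifier handles both the common 5-digit convention and shorter numbering schemes without changing behavior for single-file quantizations.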
        ignore_table_width: Whether to ignore max table width
    """
    grouped_files = _group_gguf_files(gguf_files)


Copilot AI Jan 7, 2026


The function can raise a ValueError if grouped_files is empty (when gguf_files is empty or all files are filtered out). The max() function will fail with an empty sequence. This should be handled before calling max(), either by checking if grouped_files is empty or by validating that gguf_files is not empty earlier in the flow.

Suggested change
    if not grouped_files:
        warnings.warn("No GGUF files found to report on; skipping VRAM report.")
        return

Comment on lines +203 to 207
else:
    raise RuntimeError(
        "NONE OF `model.safetensors`, `model.safetensors.index.json`, `model_index.json`, OR `.gguf` FILES HAVE BEEN FOUND"
    )


Copilot AI Jan 7, 2026


The condition checking for GGUF files could be simplified. The dictionary comprehension already filters out files where size is None or path doesn't end with .gguf, so checking if gguf_files: is sufficient. The explicit else branch with the RuntimeError could be dedented to remove unnecessary nesting.

Suggested change
else:
    raise RuntimeError(
        "NONE OF `model.safetensors`, `model.safetensors.index.json`, `model_index.json`, OR `.gguf` FILES HAVE BEEN FOUND"
    )
raise RuntimeError(
    "NONE OF `model.safetensors`, `model.safetensors.index.json`, `model_index.json`, OR `.gguf` FILES HAVE BEEN FOUND"
)

left, mid, right = BOX["lm"], BOX["mm"], BOX["rm"]

name_col_inner = name_len + 2
data_col_inner = current_len + 1

Copilot AI Jan 7, 2026


Inconsistent width calculation: The function is called with current_len + 1 (lines 311, 318, 320), but inside the function at line 239, it already adds 1 again with data_col_inner = current_len + 1. This means the data column width will be current_len + 2 in dividers, while _print_row_gguf uses just current_len (line 251). This inconsistency will cause table misalignment. Either remove the + 1 from the function calls or remove it from line 239, depending on which width is correct.

Suggested change
data_col_inner = current_len + 1
data_col_inner = current_len

    return
else:
    raise RuntimeError(
        "NONE OF `model.safetensors`, `model.safetensors.index.json`, `model_index.json`, OR `.gguf` FILES HAVE BEEN FOUND

Copilot AI Jan 7, 2026


The error message uses inconsistent formatting for file types. It mentions model.safetensors, model.safetensors.index.json, and model_index.json as specific filenames (in backticks), but then refers to .gguf FILES as a pattern/extension. For consistency, either list specific common GGUF filenames or refer to all items as patterns (e.g., ".safetensors", ".safetensors.index.json", "model_index.json", or "*.gguf").

Suggested change
"NONE OF `model.safetensors`, `model.safetensors.index.json`, `model_index.json`, OR `.gguf` FILES HAVE BEEN FOUND"
"NONE OF `*.safetensors`, `*.safetensors.index.json`, `model_index.json`, OR `*.gguf` FILES HAVE BEEN FOUND"

Comment on lines +196 to +201
print_report_for_gguf(
    model_id=model_id,
    revision=revision,
    gguf_files=gguf_files,
    ignore_table_width=ignore_table_width,
)

Copilot AI Jan 7, 2026


When GGUF files are found and printed, the function returns early (line 202) without handling the json_output flag. This is inconsistent with the safetensors flow which respects the --json-output flag. Users expecting JSON output will get table output instead when using GGUF files. Consider adding JSON output support for GGUF files or documenting that JSON output is not supported for GGUF models.

Suggested change
print_report_for_gguf(
    model_id=model_id,
    revision=revision,
    gguf_files=gguf_files,
    ignore_table_width=ignore_table_width,
)
if json_output:
    out = {
        "model_id": model_id,
        "revision": revision,
        "gguf_files": gguf_files,
    }
    print(json.dumps(out))
else:
    print_report_for_gguf(
        model_id=model_id,
        revision=revision,
        gguf_files=gguf_files,
        ignore_table_width=ignore_table_width,
    )

Owner

@alvarobartt alvarobartt left a comment


Thanks for the PR @vm7608! It's nice that for GGUF we can simply grab the required VRAM from the actual size of the file, but I'd rather format this differently e.g.

| <DEFAULT HEADER> |
| <DIVIDER> |
| <FILENAME> |
| <DTYPE> w/ <PARAMS> | <RAM BAR> |

I'm happy to take on another PR and add you as co-author, see the latest changes on https://github.com/alvarobartt/hf-mem/pulls/10 if you want to take a shot at it, otherwise I'm happy to do it myself!

Thanks again for the PR 🤗

P.S. I also think we could actually benefit from the current print_report function to parse the metadata in a way that's readable by that function, to ensure it follows the same formatting, but happy to hear suggestions!

vm7608 added 2 commits January 7, 2026 22:20
- Added functionality to check for GGUF files in the CLI and print a report using `print_report_for_gguf`.
- Updated error handling to include GGUF files in the search criteria.
- Introduced new helper functions in `print.py` for formatting and displaying GGUF file reports, including grouping sharded files and adjusting table widths.
- Updated function `_bytes_to_gb` with `use_decimal` argument to match with Huggingface file size.
- Updated `_print_header`, `_print_centered`, `_print_divider`, `_format_name`, and `_print_row` functions to include an optional `name_len` parameter for improved flexibility in formatting.
- Removed redundant GGUF-specific print functions, consolidating functionality into existing print methods.
- Adjusted the `print_report_for_gguf` function to utilize the refactored print methods, enhancing code maintainability.
Contributor Author

vm7608 commented Jan 8, 2026

Hi @alvarobartt ,

Thanks a lot for the feedback! I've just pushed a new commit to better leverage the existing print helper functions as you suggested.

However, I decided to keep print_report_for_gguf separate. My reasoning is that unlike Safetensors (where the output breaks down multiple components for the total VRAM), a GGUF file usually corresponds directly to a specific model version. Therefore, I felt that having a dedicated print function would be more appropriate/cleaner for this structure.

I'd be happy to hear any other suggestions you might have! Just a heads-up though, I'm a bit busy at the moment, so I might not be able to push updates immediately or continuously. 😅

Thanks again! 🚀

@alvarobartt
Owner

Hey @vm7608 do you agree we close this PR in favour of #25, with proper attribution to your earlier work (which I think @diegovelilla has already done)? 🤗

@vm7608 vm7608 closed this Feb 22, 2026