Add GGUF file support in CLI and printing functions #8

vm7608 wants to merge 2 commits into alvarobartt:main from
Conversation
Pull request overview
This pull request adds support for GGUF model files to the CLI tool, extending its capability beyond safetensors models. The changes introduce new printing functions specifically designed for displaying GGUF file information in a formatted table, along with file grouping logic to handle sharded GGUF files.
Key changes:
- Added GGUF file detection and reporting in the CLI fallback path when no safetensors files are found
- Implemented new table formatting functions for GGUF files with dynamic column widths based on filename lengths
- Enhanced the `_bytes_to_gb` function with a `use_decimal` parameter to support both binary (GiB) and decimal (GB) conversions for Hugging Face compatibility
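A minimal sketch of what such a dual-mode conversion might look like (the `use_decimal` name comes from the PR description; the function body here is an illustrative assumption, not the actual implementation):

```python
def bytes_to_gb(n_bytes: int, use_decimal: bool = False) -> float:
    """Convert a byte count to gigabytes.

    use_decimal=True uses decimal GB (10**9), which matches the file sizes
    shown on the Hugging Face Hub; the default uses binary GiB (2**30),
    the usual convention for memory reporting.
    """
    divisor = 1000**3 if use_decimal else 1024**3
    return n_bytes / divisor

print(bytes_to_gb(1_073_741_824))                    # 1.0 (exactly one GiB)
print(bytes_to_gb(1_000_000_000, use_decimal=True))  # 1.0 (exactly one GB)
```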
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| src/hf_mem/print.py | Added GGUF-specific table printing functions (_print_header_gguf, _print_centered_gguf, _print_divider_gguf, _print_row_gguf), file grouping logic for sharded GGUF files, and the main print_report_for_gguf function; updated _bytes_to_gb to support decimal conversion |
| src/hf_mem/cli.py | Added GGUF file detection in the fallback path, integrated print_report_for_gguf function call, and updated error message to include GGUF files |
```python
def _group_gguf_files(gguf_files: Dict[str, int]) -> Dict[str, int]:
    """Group sharded GGUF files by model variant and sum their sizes.

    Files like 'BF16/model-00001-of-00010.gguf' are grouped together.
```
The docstring example 'BF16/model-00001-of-00010.gguf' makes clear which files are grouped, but it does not show what the resulting base name looks like. Since the regex only matches and replaces the shard suffix, 'BF16/model-00001-of-00010.gguf' and 'BF16/model-00002-of-00010.gguf' are both grouped under 'BF16/model.gguf', while single files like 'model-Q4_K_M.gguf' remain as-is. It would be helpful to add an example of the grouped base name (e.g., 'BF16/model-00001-of-00010.gguf' → 'BF16/model.gguf').
```suggestion
    Files like 'BF16/model-00001-of-00010.gguf' are grouped together.
    For example, 'BF16/model-00001-of-00010.gguf' and
    'BF16/model-00002-of-00010.gguf' are grouped under 'BF16/model.gguf'.
```
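For reference, the grouping behavior discussed here can be sketched as follows (an illustrative reconstruction based on the docstring and the shard regex quoted in this review, not the exact PR code):

```python
import re
from typing import Dict

# Shard suffix pattern as quoted in this review (exactly 5 digits).
SHARD_PATTERN = re.compile(r"-\d{5}-of-\d{5}\.gguf$")

def group_gguf_files(gguf_files: Dict[str, int]) -> Dict[str, int]:
    """Group sharded GGUF files by base name and sum their sizes."""
    grouped: Dict[str, int] = {}
    for path, size in gguf_files.items():
        # Strip the '-00001-of-00010' shard suffix; single files pass through.
        base = SHARD_PATTERN.sub(".gguf", path)
        grouped[base] = grouped.get(base, 0) + size
    return grouped

files = {
    "BF16/model-00001-of-00002.gguf": 100,
    "BF16/model-00002-of-00002.gguf": 50,
    "model-Q4_K_M.gguf": 30,
}
print(group_gguf_files(files))
# {'BF16/model.gguf': 150, 'model-Q4_K_M.gguf': 30}
```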
```python
    gguf_files: Dict[str, int],
    ignore_table_width: bool = False,
) -> None:
    """Print VRAM report for GGUF models.
```
The docstring says "Print VRAM report for GGUF models", but GGUF files are stored on disk and only loaded into VRAM for inference; the function is actually printing file sizes (disk usage), not VRAM usage. Consider changing "VRAM report" to "file size report" or "disk usage report" to more accurately reflect what the function does.
```suggestion
    """Print file size report for GGUF model files.
```
src/hf_mem/print.py (outdated)
```python
    max_len = current_len + name_len - BORDERS_AND_PADDING
    total_width = max_len + 12
    text_len = len(text)
    pad_left = (total_width - text_len) // 2
    pad_right = total_width - text_len - pad_left
```
The width calculation appears inconsistent with other parts of the table. The formula `current_len + name_len - BORDERS_AND_PADDING` followed by `+ 12` doesn't align with how `_print_header_gguf`, `_print_divider_gguf`, and `_print_row_gguf` calculate their widths. This could cause misalignment where the centered text width doesn't match the actual table width defined by the header and dividers. Consider verifying that the calculation matches the table structure created by the other GGUF printing functions.
```suggestion
    # The centered line should have the same total width as GGUF rows:
    # total_width = name_len + current_len + 6 (including outer borders).
    # Since this function explicitly prints the two outer vertical borders,
    # the inner content width (spaces + text between borders) is:
    inner_width = name_len + current_len + 4
    text_len = len(text)
    # Avoid negative padding if the text is longer than the inner width.
    total_padding = max(0, inner_width - text_len)
    pad_left = total_padding // 2
    pad_right = total_padding - pad_left
```
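The padding arithmetic above can be checked in isolation; this standalone sketch (the function name is illustrative, not from the PR) shows that the left/right split always reproduces the requested inner width and never goes negative:

```python
def center_between_borders(text: str, inner_width: int) -> str:
    """Center text between two '|' borders within a fixed inner width."""
    total_padding = max(0, inner_width - len(text))
    pad_left = total_padding // 2
    pad_right = total_padding - pad_left
    return "|" + " " * pad_left + text + " " * pad_right + "|"

line = center_between_borders("model.gguf", 20)
print(repr(line))  # '|     model.gguf     |'
print(len(line))   # 22 == inner_width + 2 borders
# Text longer than the inner width is not truncated, just left unpadded:
print(center_between_borders("a-very-long-filename.gguf", 10))
```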
src/hf_mem/print.py (outdated)
```python
    Single files like 'model-Q4_K_M.gguf' remain as-is.
    """
    grouped: Dict[str, int] = {}
    shard_pattern = re.compile(r"-\d{5}-of-\d{5}\.gguf$")
```
The regex pattern assumes sharded GGUF files always follow the format `-\d{5}-of-\d{5}\.gguf$` (exactly 5 digits). However, some GGUF models might use different shard numbering formats (e.g., `-00001-of-00002.gguf` vs `-0001-of-0002.gguf` or `-1-of-2.gguf`). Consider using a more flexible pattern like `-\d+-of-\d+\.gguf$` to handle various numbering schemes, or document the assumption that only 5-digit formats are supported.
```suggestion
    shard_pattern = re.compile(r"-\d+-of-\d+\.gguf$")
```
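As a quick check of the two patterns (a standalone snippet, not code from the PR), the stricter 5-digit pattern misses shorter shard numbers that the more flexible form catches, while neither matches single-file names:

```python
import re

strict = re.compile(r"-\d{5}-of-\d{5}\.gguf$")  # pattern in the PR
flexible = re.compile(r"-\d+-of-\d+\.gguf$")    # suggested replacement

names = [
    "model-00001-of-00010.gguf",  # 5 digits: both match
    "model-0001-of-0002.gguf",    # 4 digits: only the flexible pattern
    "model-1-of-2.gguf",          # 1 digit: only the flexible pattern
    "model-Q4_K_M.gguf",          # single file: neither matches
]
for name in names:
    print(name, bool(strict.search(name)), bool(flexible.search(name)))
```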
```python
        ignore_table_width: Whether to ignore max table width
    """
    grouped_files = _group_gguf_files(gguf_files)
```
The function can raise a `ValueError` if `grouped_files` is empty (when `gguf_files` is empty or all files are filtered out), since `max()` fails on an empty sequence. This should be handled before calling `max()`, either by checking whether `grouped_files` is empty or by validating that `gguf_files` is non-empty earlier in the flow.
```suggestion
    if not grouped_files:
        warnings.warn("No GGUF files found to report on; skipping VRAM report.")
        return
```
```python
    else:
        raise RuntimeError(
            "NONE OF `model.safetensors`, `model.safetensors.index.json`, `model_index.json`, OR `.gguf` FILES HAVE BEEN FOUND"
        )
```
The condition checking for GGUF files could be simplified. The dictionary comprehension already filters out files where the size is `None` or the path doesn't end with `.gguf`, so checking `if gguf_files:` is sufficient. The explicit `else` branch with the `RuntimeError` could be dedented to remove unnecessary nesting.
```suggestion
    raise RuntimeError(
        "NONE OF `model.safetensors`, `model.safetensors.index.json`, `model_index.json`, OR `.gguf` FILES HAVE BEEN FOUND"
    )
```
src/hf_mem/print.py (outdated)
```python
    left, mid, right = BOX["lm"], BOX["mm"], BOX["rm"]

    name_col_inner = name_len + 2
    data_col_inner = current_len + 1
```
Inconsistent width calculation: the function is called with `current_len + 1` (lines 311, 318, 320), but inside the function at line 239 it adds 1 again with `data_col_inner = current_len + 1`. This means the data column width will be `current_len + 2` in dividers, while `_print_row_gguf` uses just `current_len` (line 251). This inconsistency will cause table misalignment. Either remove the `+ 1` from the function calls or from line 239, depending on which width is correct.
```suggestion
    data_col_inner = current_len
```
```python
        return
    else:
        raise RuntimeError(
            "NONE OF `model.safetensors`, `model.safetensors.index.json`, `model_index.json`, OR `.gguf` FILES HAVE BEEN FOUND"
```
The error message uses inconsistent formatting for file types. It mentions `model.safetensors`, `model.safetensors.index.json`, and `model_index.json` as specific filenames, but then refers to `.gguf` files as an extension. For consistency, either list specific common GGUF filenames or refer to all items as patterns (e.g., `*.safetensors`, `*.safetensors.index.json`, `model_index.json`, or `*.gguf`).
```suggestion
            "NONE OF `*.safetensors`, `*.safetensors.index.json`, `model_index.json`, OR `*.gguf` FILES HAVE BEEN FOUND"
```
```python
            print_report_for_gguf(
                model_id=model_id,
                revision=revision,
                gguf_files=gguf_files,
                ignore_table_width=ignore_table_width,
            )
```
When GGUF files are found and printed, the function returns early (line 202) without handling the `json_output` flag. This is inconsistent with the safetensors flow, which respects the `--json-output` flag: users expecting JSON output will get table output instead when using GGUF files. Consider adding JSON output support for GGUF files or documenting that JSON output is not supported for GGUF models.
```suggestion
            if json_output:
                out = {
                    "model_id": model_id,
                    "revision": revision,
                    "gguf_files": gguf_files,
                }
                print(json.dumps(out))
            else:
                print_report_for_gguf(
                    model_id=model_id,
                    revision=revision,
                    gguf_files=gguf_files,
                    ignore_table_width=ignore_table_width,
                )
```
Thanks for the PR @vm7608! It's nice that for GGUF we can simply grab the required VRAM from the actual size of the file, but I'd rather format this differently e.g.
```
<DEFAULT HEADER>
<DIVIDER>
<FILENAME>
<DTYPE> w/ <PARAMS> | <RAM BAR>
```
I'm happy to take on another PR and add you as co-author, see the latest changes on https://github.com/alvarobartt/hf-mem/pulls/10 if you want to take a shot at it, otherwise I'm happy to do it myself!
Thanks again for the PR 🤗
P.S. I also think we could actually benefit from the current print_report function to parse the metadata in a way that's readable by that function, to ensure it follows the same formatting, but happy to hear suggestions!
**Commit 1**
- Added functionality to check for GGUF files in the CLI and print a report using `print_report_for_gguf`.
- Updated error handling to include GGUF files in the search criteria.
- Introduced new helper functions in `print.py` for formatting and displaying GGUF file reports, including grouping sharded files and adjusting table widths.
- Updated the `_bytes_to_gb` function with a `use_decimal` argument to match Hugging Face file sizes.

**Commit 2**
- Updated the `_print_header`, `_print_centered`, `_print_divider`, `_format_name`, and `_print_row` functions to include an optional `name_len` parameter for improved flexibility in formatting.
- Removed redundant GGUF-specific print functions, consolidating functionality into the existing print methods.
- Adjusted the `print_report_for_gguf` function to use the refactored print methods, improving maintainability.
Hi @alvarobartt, thanks a lot for the feedback! I've just pushed a new commit to better leverage the existing print helper functions as you suggested. However, I decided to keep I'd be happy to hear any other suggestions you might have! Just a heads-up though: I'm a bit busy at the moment, so I might not be able to push updates immediately or continuously. 😅 Thanks again! 🚀
Hey @vm7608, do you agree we close this PR in favour of #25, with proper attribution to your earlier work (which I think @diegovelilla has already done)? 🤗
Summary
- Added functionality to check for GGUF files in the CLI and print a report using `print_report_for_gguf`.
- Introduced new helper functions in `print.py` for formatting and displaying GGUF file reports, including grouping sharded files and adjusting table widths.
- Updated `_bytes_to_gb` with a `use_decimal` argument to match Hugging Face file sizes.

Example

```
hf-mem --model-id Qwen/Qwen3-VL-8B-Instruct-GGUF
```