Add support for GGUF files + KV cache from GGUF metadata #25
alvarobartt merged 24 commits into alvarobartt:main
Conversation
I added this as a mapping since, following the match/case approach, I wouldn't be able to reuse any cases. All conversions have been taken either from the official HF docs or from the type declarations in the official ggml library.
It might be interesting to merge both dtype-to-bytes-per-weight functions, or at least standardize them, since right now one returns int and the other float.
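A unified mapping could return float throughout, so that sub-byte GGUF quantizations fit the same API as the plain dtypes. A minimal sketch (the function name and the selection of entries are hypothetical; the Q-type values follow the ggml block layouts, e.g. Q4_0 stores 32 weights in 18 bytes):

```python
# Bytes per weight, always as float so that sub-byte GGUF quantizations
# (e.g. Q4_0 at 18 bytes per 32-weight block) fit the same API as F32/F16.
BYTES_PER_WEIGHT: dict[str, float] = {
    "F32": 4.0,
    "F16": 2.0,
    "BF16": 2.0,
    "I8": 1.0,
    "Q8_0": 34 / 32,  # 32 x int8 + one f16 scale per block = 1.0625
    "Q4_0": 18 / 32,  # 32 x 4-bit + one f16 scale per block = 0.5625
}

def bytes_per_weight(dtype: str) -> float:
    """Look up the (possibly fractional) bytes per weight for a dtype."""
    try:
        return BYTES_PER_WEIGHT[dtype]
    except KeyError:
        raise ValueError(f"Unknown dtype: {dtype}") from None
```

Callers that previously relied on the int-returning variant would then multiply by the parameter count and round up at the end, rather than per weight.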
Cool @diegovelilla, do you think we could at least temporarily move the changes out?

> **Warning**: not to tackle in this PR, but sharing for visibility on my short-term plans.

In the meantime I might think of a potential refactor to ease things when adding other formats, so that we have a structure in the repository that contains a dedicated file for the CLI, then another for the format-specific logic.
In this last commit I added the following:

I am not quite sure how the "components" layer of the JSON output is supposed to work, since most of the time it seems to default to "Transformer". As of now this code only uses one component, also named "Transformer". Also, the `_read_xxx` functions could be condensed into one read function that takes a type string and the number of bytes, like `_read(raw_metadata, offset, "I", 4)` for reading a uint32. I wasn't sure which would be better, so I went with separate functions; however, this can easily be reduced with that more general function. As of now I have tested with a couple of models and it works. The only things left would be to add the KV cache support and merge it with the existing code.
Force-pushed from 7f2bdb5 to a1c496e.
Looking good, @alvarobartt?
Last up is adding the KV cache printing plus optimizing the fetch of multiple files.
That's great @diegovelilla, looks neat! What do you think if we show the table as a simple per-file listing without the details, and then add an arg to the CLI to opt into them? I'm not a super active GGUF user, so let me know otherwise.
src/hf_mem/cli.py
Outdated
```python
# TODO: `recursive=true` shouldn't really be required unless it's a Diffusers
# model... I don't think this adds extra latency anyway
# NOTE: `recursive=true` is also needed for GGUF file directories where each
# sharded quantization is inside a different folder like: Q2_K/model_Q2_K-0001-of-0048.gguf
```
IMO we can remove both comments, no longer required!
Force-pushed from d3436cb to a746e7f.
Hey @diegovelilla, thanks a lot for the effort! It does make sense, feel free to completely remove it.
Hey @alvarobartt, should we then merge this PR as it is and create a new issue for the remaining work?
alvarobartt
left a comment
Hey again @diegovelilla, apologies for the delay, this is great!
Q: Do you think the `--gguf` flag is required? How common are repositories with both Safetensors and GGUF files? Couldn't we skip the `--gguf` flag in favour of checking which of those files is present in `file_paths` when listing the files in the repository? Then, for GGUF files within a Safetensors repository, I'd just warn the user that if they'd like to run the estimate for those they should provide `--gguf-file`, showing them the possible GGUF files in there. Thoughts?
Thanks again in advance, this feature is going to be much appreciated by the community 🤗
Hey @alvarobartt, now it should work without the `--gguf` flag; the GGUF logic only applies when GGUF files are detected.
In the case of parsing a repo that contains both, a warning is triggered, reminding the user that if they want to estimate any GGUF file, they have to set the `--gguf-file` flag.
Edit: branch rebased over current main (Feb 20th).
Force-pushed from 1f018c6 to af2529c.
README.md
Outdated
> ## GGUF Files
> By enabling the `--gguf` flag, you can estimate memory requirements for *.gguf* files. All files will be listed with their corresponding memory estimations. For a more in-depth report, like the one used for *.safetensors* files (with information regarding weight dtypes), the flag `--gguf-file` can be used to estimate a single GGUF model. For sharded files, the path to any of the individual shards will work.
Given that you recently removed the `--gguf` flag, should we update this blob here?
Yes, it is also missing the screenshot since all have been taken from what I assume is your terminal.
- Added functionality to check for GGUF files in the CLI and print a report using `print_report_for_gguf`.
- Updated error handling to include GGUF files in the search criteria.
- Introduced new helper functions in `print.py` for formatting and displaying GGUF file reports, including grouping sharded files and adjusting table widths.
- Updated `_bytes_to_gb` with a `use_decimal` argument to match the Hugging Face file sizes.
- Updated `_print_header`, `_print_centered`, `_print_divider`, `_format_name`, and `_print_row` to take an optional `name_len` parameter for more flexible formatting.
- Removed redundant GGUF-specific print functions, consolidating functionality into the existing print methods.
- Adjusted `print_report_for_gguf` to use the refactored print methods, improving maintainability.
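The shard grouping mentioned in these commits can be sketched from the naming scheme quoted earlier in the thread (`Q2_K/model_Q2_K-0001-of-0048.gguf`); the function name and regex are hypothetical:

```python
import re
from collections import defaultdict

# Assumed shard naming scheme: <base>-<index>-of-<total>.gguf
_SHARD_RE = re.compile(r"^(.+)-\d+-of-\d+\.gguf$")

def group_shards(paths: list[str]) -> dict[str, list[str]]:
    """Group sharded GGUF paths under a single logical model name.

    Shards collapse to "<base>.gguf"; non-sharded files keep their
    own path as the group key. Shards are sorted so the -0001- file
    comes first.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for path in paths:
        m = _SHARD_RE.match(path)
        key = f"{m.group(1)}.gguf" if m else path
        groups[key].append(path)
    return {key: sorted(members) for key, members in groups.items()}
```

Each group can then be printed as one table row whose size is the sum of its members, which is what a per-file listing with shard grouping needs.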
Force-pushed from a42b0c2 to 9fa7a43.
Hey @alvarobartt, I've already changed the README, rebased over the last changes regarding version printing, and added it to the GGUF logic. It also shouldn't fail the pre-commit checks now (my bad). Just missing the screenshot.
Awesome @diegovelilla! Here you go (I've included the screenshots).
Also, feel free to position the GGUF section in the README.md above the Anthropic Skills entry instead of below 🤗
Should be done, @alvarobartt.
alvarobartt
left a comment
Thanks a lot for the effort and the patience @diegovelilla 🤗
I'll merge as-is, and then likely push a couple more commits on top before releasing, ideally mid next week!







Description
This PR continues on top of the work of @vm7608 in #8. It aims to add support for GGUF files by:

- Adding a `--gguf` flag to separate *.safetensors* from *.gguf* estimations.

(I'm starting it as a draft pull request to show the progress and explain decisions along the way.)
CONTRIBUTING.md.