convert-hf : reduce repeated boilerplate from write_tensors #7031
Conversation
* For example, loading `model-00001-of-00001.safetensors` now works.
* convert-hf : fix stacking MoE expert tensors. `torch.stack` and `torch.cat` don't do the same thing (illustrated below).
* convert-hf : fix Mamba conversion. Tested to work even with a SentencePiece-based tokenizer.
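For reference, a quick standalone illustration of the difference between the two (not code from this PR):

```python
import torch

a = torch.zeros(2, 3)
b = torch.ones(2, 3)

print(torch.stack([a, b]).shape)  # torch.Size([2, 2, 3]): adds a new leading dimension
print(torch.cat([a, b]).shape)    # torch.Size([4, 3]): joins along an existing dimension
```

Stacking is what the MoE expert tensors need here, since the per-expert matrices become one tensor with an extra experts dimension.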
`os.listdir` is said to list files in arbitrary order. Sorting the file names should let "model-00009-of-00042.safetensors" be loaded before "model-00010-of-00042.safetensors".
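A minimal sketch of that fix (helper name and signature are illustrative, not the exact PR code):

```python
import os
from pathlib import Path

def get_model_part_names(dir_model: Path, suffix: str) -> list[str]:
    part_names: list[str] = []
    for filename in os.listdir(dir_model):
        if filename.endswith(suffix):
            part_names.append(filename)
    # os.listdir gives no ordering guarantee; since part numbers are
    # zero-padded, lexicographic sort matches numeric order, so
    # "model-00009-of-00042" comes before "model-00010-of-00042".
    part_names.sort()
    return part_names
```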
```diff
 for tensor in reader.tensors:
     # Dimensions are written in reverse order, so flip them first
-    shape = np.flipud(tensor.shape)
+    shape = np.flipud(tensor.shape).tolist()
```
Any particular reason for coercing shape to list here (and not elsewhere)?
It's passed to `GGUFWriter.add_tensor_info`, which expects a `Sequence[int]` for the shape, and this shape is of type `NDArray[uint32]`, which caused the error:

```
Argument of type "NDArray[uint32]" cannot be assigned to parameter "tensor_shape" of type "Sequence[int]" in function "add_tensor_info"
    "NDArray[uint32]" is incompatible with "Sequence[int]" (reportGeneralTypeIssues)
```
This comes from the `shape` field of a `ReaderTensor`, and it is coerced to list in other places, like in `gguf-dump.py`.
But the `shape` field of `ReaderTensor` is only used in 7 places (including its definition). In other places, the "shape" usually comes directly from either Numpy or PyTorch, which use `tuple[int, ...]` for the shape type, which is compatible with `Sequence[int]`.
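To illustrate the compatibility point (a standalone sketch, not code from the PR):

```python
from typing import Sequence

import numpy as np

def takes_shape(tensor_shape: Sequence[int]) -> int:
    return len(tensor_shape)

shape_np = np.array([32000, 4096], dtype=np.uint32)

takes_shape(shape_np)           # flagged by pyright: NDArray[uint32] is not a Sequence[int]
takes_shape(shape_np.tolist())  # OK: list[int] is a Sequence[int]
takes_shape((32000, 4096))      # OK: tuple[int, ...] is a Sequence[int]
```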
I see. Although `GGUFWriter.add_tensor_info`'s typing is then perhaps not correct, I understand why it's simpler to just make it a list.
`GGUFWriter.add_tensor_info`'s typing seems correct to me; it's used in 2 other places, and both use shapes which are already compatible with `Sequence[int]`.
So using `Sequence[int]` there seems appropriate, as it's the most general type (I think?) which can be indexed (it avoids having to cast `tuple[int, ...]` into `list[int]`, or `list[int]` into `tuple[int, ...]`).
This is how the shape is used in `add_tensor_info`:
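Roughly, the shape is only read by length and by index, which is exactly what `Sequence[int]` guarantees. A paraphrased sketch (not the exact `gguf-py` code):

```python
import struct
from typing import Sequence

def pack_shape(tensor_shape: Sequence[int]) -> bytes:
    # Dimensions are serialized in reverse order, as little-endian uint64
    n_dims = len(tensor_shape)
    data = struct.pack("<I", n_dims)
    for i in range(n_dims):
        data += struct.pack("<Q", tensor_shape[n_dims - 1 - i])
    return data
```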
Sure, all I'm saying is that that also works with `NDArray[uint32]` (even though it's not compatible with `Sequence[int]`).
An attempt to convert MPT fails early with an unhelpful error message.
Could be a bound method being treated as a static method.
Not sure what that means in terms of fixing it "properly". The error disappears when I add an explicit `__init__` method in `MPTModel` which calls `super().__init__()` with the same parameters. Then a different error occurs, which seems less related to Python and more to the actual conversion at hand.
It seems Protocol can't be used as a statically type-checked ABC, because its subclasses also can't be instantiated. (why did it seem to work?) At least there's still a way to throw an error when forgetting to define the `model_arch` property of any registered Model subclasses.
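A hedged sketch of that check (paraphrased, not necessarily the exact PR code):

```python
import gguf

class Model:
    model_arch: gguf.MODEL_ARCH  # every subclass must define this

    def __init_subclass__(cls):
        # Runs at class-definition time, so even unused, never-instantiated
        # subclasses fail fast if they forget to set model_arch.
        if "model_arch" not in cls.__dict__:
            raise TypeError(f"Missing property 'model_arch' for {cls.__name__!r}")

class MPTModel(Model):
    model_arch = gguf.MODEL_ARCH.MPT  # OK

class BrokenModel(Model):  # raises TypeError as soon as the class is defined
    pass
```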
* …tion: there are no abstract methods used anyway, so using `ABC` isn't really necessary.
I can't recommend anything in that regard. But I checked that after your last commit the script works and produces exactly the same GGUF for MPT-7B as the master branch version (and the resulting model works).
```diff
 # we don't need these
-if name.endswith((".attention.masked_bias", ".attention.bias", ".attention.rotary_emb.inv_freq")):
+if name.endswith((".attention.masked_bias", ".attention.bias", ".rotary_emb.inv_freq")):
```
If you want to blacklist tensors on a per-model basis you can use
https://github.com/ggerganov/llama.cpp/blob/6ecf3189e00a1e8e737a78b6d10e1d7006e050a2/gguf-py/gguf/constants.py#L732-L743
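The mapping there looks roughly like this (paraphrased from `gguf-py/gguf/constants.py`; only the `LLAMA` entry is shown):

```python
# excerpt sketch: inside gguf-py/gguf/constants.py, where the MODEL_ARCH and
# MODEL_TENSOR enums are defined
MODEL_TENSOR_SKIP: dict[MODEL_ARCH, list[MODEL_TENSOR]] = {
    MODEL_ARCH.LLAMA: [
        MODEL_TENSOR.ROPE_FREQS,
        MODEL_TENSOR.ATTN_ROT_EMBD,
    ],
    # (a few other architectures have entries here)
}
```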
Nice idea! I didn't know about `MODEL_TENSOR_SKIP`. Right now it's only used for `MODEL_ARCH.LLAMA` in `convert.py`. The exclusion lists for the other architectures were never really tested (e.g. `MODEL_ARCH.PLAMO` and `MODEL_ARCH.GROK` both have `MODEL_TENSOR.ATTN_ROT_EMBD` in `MODEL_TENSORS`, but they are not in `MODEL_TENSOR_SKIP`, since they rely on matching the name like you pointed out).
I'm not sure about using this; `modify_tensors` can already be used to skip tensors, and I think it's cleaner this way for architecture-specific use, while common skips (as it is) belong in the base `Model.write_tensors` (the line on which this comment thread was started). `BertModel` has a nice example of an architecture-specific exclusion:
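Roughly (paraphrased from this PR's `convert-hf-to-gguf.py`):

```python
class BertModel(Model):
    # ...

    def modify_tensors(self, data_torch, name, bid):
        # we are only using BERT for embeddings so we don't need the pooling layer
        if name in ("embeddings.position_ids", "pooler.dense.weight", "pooler.dense.bias"):
            return []  # skip these tensors entirely

        return [(self.map_tensor_name(name), data_torch)]
```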
It was already done similarly in `write_tensors` (I tried to keep most of the model-specific code from `write_tensors` unchanged when moving it into `modify_tensors`). Otherwise, this means tensor types that won't be serialized should still get a serialized name, even if they're never used. BERT's pooling layers don't have an enum, and they probably should not have one if it's only to be carried around while remaining unused, like `LLM_TENSOR_ATTN_ROT_EMBD` in the llama.cpp file. (If they do get used, I agree they should get added, but they aren't for now.)
https://github.com/ggerganov/llama.cpp/blob/a2ac89d6efb41b535778bfeaecaae8fe295b6ed3/llama.cpp#L488
So instead, maybe it would be worth considering removing MODEL_TENSOR.ROPE_FREQS and MODEL_TENSOR.ATTN_ROT_EMBD? But this is likely out of the scope of this PR.
Force-pushed from 66ecf6c to 98f2d0e
@compilade BTW, I added your typing changes to #6778 (and a progress bar, inspired by your other PR), so it will be a simpler rebase for whichever gets merged last. :)
(For anyone following only this and not #7075.) #7075 is getting closer to ready, and I'm considering unifying both PRs to make the branches easier to manage. To keep the naming less confusing, I think it would be best to change the base branch of #7075 to `master`. However, doing this won't move over the review comments, so I'm wondering if that would still be fine? (@CISC, @Galunid)
Fine by me, it was more for my benefit than this PR anyway. :)
Sure, we can do that
Closing in favor of #7075 |
A lot of the lines in `convert-hf-to-gguf.py` are repeated from `Model.write_tensors`. With some refactoring, it is possible to remove 600+ lines while mostly keeping the original behavior.
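In rough terms, the shared writing loop moves into the base class, and each architecture only overrides a small hook. A simplified, runnable sketch with hypothetical stand-in helpers (not the exact PR code):

```python
from typing import Iterable

import torch

class Model:
    def map_tensor_name(self, name: str) -> str:
        # stand-in for the real HF-name -> GGUF-name mapping
        return name

    def get_tensors(self) -> Iterable[tuple[str, torch.Tensor]]:
        # stand-in for iterating over the model's safetensors/pytorch parts
        yield "wte.weight", torch.zeros(4, 4)

    def modify_tensors(self, data_torch: torch.Tensor, name: str) -> Iterable[tuple[str, torch.Tensor]]:
        # default: rename only; subclasses permute, stack experts, skip, etc.
        return [(self.map_tensor_name(name), data_torch)]

    def write_tensors(self) -> None:
        # the single shared loop, instead of one copy per architecture
        for name, data_torch in self.get_tensors():
            for new_name, data in self.modify_tensors(data_torch, name):
                print(new_name, tuple(data.shape))  # stand-in for gguf_writer.add_tensor
```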
Important
Continued at #7075. The testing section below is still valid for both PRs.
Changes Summary
- Added a `pyrightconfig.json` to let it load the `gguf` library correctly
- Bumped the `gguf` Python package requirement to `~=0.2.0` from `~=0.1.98`
- Changed `encoding="utf8"` to `encoding="utf-8"`, for consistency, even if they are the same in Python.
- Made `Model` ~~a `Protocol`~~ a plain class instead of an `ABC` (Abstract Base Class)
    - `ABC` doesn't seem necessary anymore
    - Subclasses no longer need their own `__init__`; they use `Model.__init__`
    - This allows setting the `model_arch` field in subclasses exactly as it's already done, but without making pyright output type errors about `Literal[gguf.MODEL_ARCH.SOMETHING]` not compatible with type `property`
- Error out when forgetting to define the `model_arch` field, by checking for this in `__init_subclass__` (so it's checked even for unused subclasses of `Model`)
- Added a `modify_tensors` callback which is used in the model classes instead of overriding `write_tensors`.
    - It operates on PyTorch tensors (`data_torch`), which should make lazy computation easier to implement in the future (instead of also having to wrap complex `numpy` operations)
- When `f16` output is chosen, it's possible to selectively override the type of output tensors by overriding `extra_f32_tensors` and/or `extra_f16_tensors` in the architecture-specific model classes
- Support for single-part models with a `model-00001-of-00001.safetensors` part (previously, single-part models were expected to name their model file `model.safetensors`)
    - Useful for the `v0-mamba-*` models in https://huggingface.co/delphi-suite
- ps. added `einops` to the `llama-python-extra` Nix packages, since it's needed by `InternLM2Model` in `convert-hf-to-gguf.py`

Testing
To make sure I didn't break previous work, I need help with testing conversion of existing architectures; I'm sure some of you already have the original model files downloaded, so testing conversion would be greatly appreciated.
```sh
$ python3 convert-hf-to-gguf.py --outtype f16 --outfile ./path/to/output.f16.gguf ./path/to/original/model/
```

If your username is mentioned below, it's because I think you might have the relevant model nearby, and/or I'd like you to check whether or not I broke your previous work, because I did modify quite a bit of model-specific conversion code.
- `BloomModel` (@xingchensong)
- `MPTModel` (@jploski)
- `BaichuanModel` (@foldl)
- `FalconModel` (@ggerganov)
- `RefactModel` (@ds5t5)
- `PersimmonModel` (@ggerganov): also broken on `master`. This model arch might need to be re-implemented from scratch since the graph code is quite complicated
- `StableLMModel` with StableLM-2-12B (@ashishdatta)
- `LlamaModel` with MoE (@slaren)
- `LlamaModel` without MoE (@ggerganov)
- `convert.py` with Llama models using a SentencePiece tokenizer
- `GrokModel` (@arki05)
- `DbrxModel` (@phymbert)
- `Qwen2MoeModel` (@ggerganov)
- `GPT2Model` (@ggerganov)
- `InternLM2Model` (@SolenoidWGT)
- `GemmaModel` (@ggerganov)
- `MambaModel` (@compilade)

It would be nice to automate such testing in the future, though I'm not sure how feasible it is. It could require finding the smallest model for each architecture, though unfortunately some architectures have only been used for very large models.
Also, @mofosyne, note that this will cause (some) conflicts with #6511, though they shouldn't be too hard to resolve (since most of them will be caused by removed code). I would be fine with your PR getting merged first to handle conflicts here, though I don't know if you'd prefer to do it on your side instead.
Also pinging @cebtenzzre because of #5825, which was another effort at simplifying `convert-hf-to-gguf.py`, so you should be familiar with the code.

Out of scope
Anything related to BPE pre-tokenizer errors was not caused by this PR and won't be fixed here to reduce potential conflicts.