Fix parameter loading in PLaMo2 #16075
Merged
Commits (9):
- 433782b mitmul: Fix to use hidden_size_per_head
- 4fe5c96 mitmul: Fix num heads
- b164aa1 mitmul: Fix array
- 7f5d8a9 mitmul: Fix loading weights
- 10af12c mitmul: Support old GGUF converted by the previous version of llama.cpp
- 0e8aff1 mitmul: Update src/llama-model.cpp
- 90fbf6a mitmul: Merge remote-tracking branch 'upstream/master' into mitmul/fix-plamo2
- 07b55f4 mitmul: Move shared parameter definitions to the outside of loop
- 1be2787 mitmul: Not calculating n_embd_head_k,v by n_embd / n_head
Conversations:
Already done here: llama.cpp/src/llama-model.cpp, lines 555 to 580 at 10af12c.
@CISC Ah, I applied your suggestion but noticed that the problem comes from the following two lines:

llama.cpp/src/llama-model.cpp, line 560 at 10af12c
llama.cpp/src/llama-model.cpp, line 563 at 10af12c

The main purpose of this PR is to support variants of the PLaMo2 model created by pruning larger models, which have an n_embd_head_k larger than n_embd / n_head. So let me roll back these changes to support such cases in those PLaMo2 variants.
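To make the mismatch concrete, here is a minimal standalone sketch; the code is purely illustrative (not llama.cpp's actual loading logic), and the 2048 / 32 / 128 numbers match the pfnet/plamo-2.1-2b-cpt config quoted further down in this thread:

```cpp
// Minimal sketch (not llama.cpp code): deriving the head size as
// n_embd / n_head does not match the real per-head size of pruned
// PLaMo2 variants.
#include <cstdint>
#include <cstdio>

int main() {
    const uint32_t n_embd               = 2048; // hidden_size
    const uint32_t n_head               = 32;   // head count quoted for the attention layers
    const uint32_t hidden_size_per_head = 128;  // per-head size from config.json

    const uint32_t derived = n_embd / n_head;   // 64

    std::printf("derived = %u, hidden_size_per_head = %u\n",
                (unsigned) derived, (unsigned) hidden_size_per_head);
    return 0;
}
```

This prints derived = 64, hidden_size_per_head = 128, which is exactly the larger-than-derived case described above.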
But they would have to have the metadata present to work then, so no issue.
@mitmul gentle ping
@CISC Sorry for my late reaction. Um, could you explain what you meant by this?

For example, pfnet/plamo-2.1-2b-cpt has:
- hidden_size = 2048 (for n_embd)
- hidden_size_per_head = 128 (for n_embd_head_k and n_embd_head_v)
- num_key_value_heads = 32 (for n_head_arr in il % 2 = 0 layers, i.e. attention layers, not Mamba layers)

so that n_embd_head_k/v != n_embd / n_head.

Then, with the current load_hparams() it goes through the else block here:

llama.cpp/src/llama-model.cpp, lines 577 to 581 at 132d673

because hparams.n_head() returns 0, which comes from the first element of n_head_arr, indicating a Mamba layer.

So I thought we need to call the following functions to set hparams.n_embd_head_k/v with the right values that come from hidden_size_per_head in config.json:

llama.cpp/src/llama-model.cpp, lines 1082 to 1083 at 1be2787

That's why I sent this PR, but if I have misunderstood something around model loading, please let me know; I'd appreciate any advice on how to load such variants as pfnet/plamo-2.1-2b-cpt.
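To summarize how I read the control flow, here is a small self-contained sketch. The struct, the helper, the else-branch behaviour, and the generic `<arch>.attention.key_length` / `<arch>.attention.value_length` key names are assumptions made for illustration, not llama.cpp's actual types or code:

```cpp
// Standalone illustration of the loading issue described above; the types,
// helper, and control flow are stand-ins, not llama.cpp's actual API.
#include <cstdint>
#include <cstdio>
#include <optional>
#include <vector>

struct sketch_hparams {
    uint32_t n_embd = 2048;                 // hidden_size
    std::vector<uint32_t> n_head_arr;       // per-layer head counts
    uint32_t n_embd_head_k = 0;
    uint32_t n_embd_head_v = 0;

    // Mirrors the idea of hparams.n_head() with the default layer index: it
    // reports the first entry, which is 0 here because that layer is a Mamba
    // (recurrent) layer, not an attention layer.
    uint32_t n_head(size_t il = 0) const { return n_head_arr.at(il); }
};

int main() {
    sketch_hparams hp;
    hp.n_head_arr = {0, 32, 0, 32};         // Mamba layers report 0 heads

    // Optional per-head sizes from GGUF metadata, written by the converter
    // from hidden_size_per_head in config.json.
    std::optional<uint32_t> key_length   = 128;
    std::optional<uint32_t> value_length = 128;

    if (hp.n_head() > 0) {
        // Generic path: derive the head sizes, letting metadata override them.
        hp.n_embd_head_k = key_length.value_or(hp.n_embd / hp.n_head());
        hp.n_embd_head_v = value_length.value_or(hp.n_embd / hp.n_head());
    } else {
        // PLaMo2 lands here because the first layer is a Mamba layer, so the
        // per-head sizes stay 0 unless they are set explicitly afterwards.
        hp.n_embd_head_k = 0;
        hp.n_embd_head_v = 0;
    }

    // This PR's fix, paraphrased: read the explicit per-head metadata for
    // PLaMo2 so that n_embd_head_k/v end up as 128 rather than 0.
    if (key_length)   { hp.n_embd_head_k = *key_length; }
    if (value_length) { hp.n_embd_head_v = *value_length; }

    std::printf("n_embd_head_k = %u, n_embd_head_v = %u\n",
                (unsigned) hp.n_embd_head_k, (unsigned) hp.n_embd_head_v);
    return 0;
}
```

Compiled with any C++17 compiler, this prints n_embd_head_k = 128, n_embd_head_v = 128, which is the behaviour this PR aims for on such variants.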
Aha, that makes sense then, thank you for the explanation. :)