granite embedding small support (ModernBert arch) #15641
Status: Open. ryan-mangeno wants to merge 68 commits into ggml-org:master from ryan-mangeno:modern-bert-support.
Commits (68)
All 68 commits are by ryan-mangeno.

6151592  constants and tensor mappings for modern bert support, model not supp…
6643c5a  conversion now working, hf -> gguf
ac67fc6  working on support, now working on building graph
cc40378  some cleanup
41b6864  cleanup
cc3d7ab  continuing
4ceb828  correct tensor shape for qkv
18c0c23  fixed tensor mappings and working on building graph
bffe3c9  tensor debugging now works -> (llama-eval-callback), instead of simul…
8f32843  cleanup
9805635  cleanup
40249dd  cleanup
853f344  more cleanup
2a1c750  ubatch issues, the assert for checking equal seqs in llama-graph.cpp …
c73eb68  added cls token per previous modern bert attempt, still working on ch…
ca353d3  fixed pre tokenizer and still working through previous pr
6d86944  working through previous attempt, implemented more accurate conversion…
39c0291  fixed pre tokenizer
e101005  working on swa with local and global alternating attention
044bc7d  some cleanup and now fails on build attn
e296a0b  starting to work, and some cleanup, currently failing on last layer c…
2bacfb0  alternating rope implemented and modern bert graph build succeeds
4e7c879  fixed assert for equal ubatch seq
20d448a  cleanup
db4f565  added mask check in vocab
da0604a  fixed alternating rope, the hparams.rope_freq_base_train and hparams.…
43a2980  reuse variable
e368442  fixed merge conflicts and added print debug check for swa type
7036cc8  removed repeat
2522ce8  merge fixes
e043815  Merge branch 'master' into modern-bert-support
35667f2  Merge branch 'master' into modern-bert-support
3cdd650  standard swa method can be used instead of a new enum being LLAMA_SWA…
86adde6  merge
46f2182  merge
33eed31  correct swa layer indexing, is supposed to be 0, 3, 6 ... instead of …
61a0b03  more modular hparam setting
3bbf671  replaced attn out norm with ffn_norm and cosine similarity between hf…
f362878  merge
3976d77  Update gguf-py/gguf/tensor_mapping.py
ff9f8c2  Update convert_hf_to_gguf_update.py
97e1de4  Update src/llama-model.cpp
4187cf5  Update src/llama-vocab.cpp
e3ac2ae  Update src/llama-model.cpp
72f1f51  Update gguf-py/gguf/tensor_mapping.py
952c302  Update convert_hf_to_gguf.py
2ea2862  Update gguf-py/gguf/tensor_mapping.py
da3a1c9  Update gguf-py/gguf/tensor_mapping.py
89431b6  Update convert_hf_to_gguf.py
43332bf  Update gguf-py/gguf/tensor_mapping.py
b442b43  Update gguf-py/gguf/tensor_mapping.py
94e7ece  Update gguf-py/gguf/tensor_mapping.py
30fe2a7  Update gguf-py/gguf/tensor_mapping.py
c386eb0  Update gguf-py/gguf/tensor_mapping.py
727008f  Update gguf-py/gguf/tensor_mapping.py
93c1744  Update src/llama-graph.cpp
7b956a3  Update src/llama-arch.cpp
9b0f38b  Update src/llama-model.cpp
c9fa285  Update src/llama-model.cpp
e1abf73  Update src/llama-model.cpp
edbe4d2  Update src/llama-model.cpp
1f54cf4  Update src/llama-model.cpp
0082680  removed redundant hparam set
9715c2a  enums for model sizes
1d01245  conversion for modern-bert model supported rather than just granite-s…
a6306ce  Update src/llama-model.cpp
3581b68  Update src/llama-model.cpp
7c15ba5  fixed ordering of enum for freq_base_swa
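Several commits above concern ModernBERT's alternating attention: most layers use sliding-window attention (SWA), layers 0, 3, 6, ... use global attention, and the two groups use different RoPE frequency bases (hence the new rope_freq_base_swa hyperparameter). The following is a minimal standalone sketch of that layer-selection logic only, not the PR's code; the layer count, pattern, and frequency values are illustrative assumptions.

```cpp
// Hedged sketch: illustrates the alternating global / sliding-window layer
// pattern described in the commit messages. Names and values are stand-ins,
// not the PR's actual hparams.
#include <cstdio>

struct hparams_sketch {
    unsigned n_layer            = 22;        // hypothetical layer count
    unsigned swa_pattern        = 3;         // global attention every 3rd layer
    float    rope_freq_base     = 160000.0f; // hypothetical global-attention RoPE base
    float    rope_freq_base_swa = 10000.0f;  // hypothetical sliding-window RoPE base
};

// Per the commit notes, global-attention layers sit at indices 0, 3, 6, ...
static bool is_global_layer(const hparams_sketch & hp, unsigned il) {
    return il % hp.swa_pattern == 0;
}

// Assumption: global layers use the base frequency, SWA layers the SWA one.
static float rope_base_for_layer(const hparams_sketch & hp, unsigned il) {
    return is_global_layer(hp, il) ? hp.rope_freq_base : hp.rope_freq_base_swa;
}

int main() {
    hparams_sketch hp;
    for (unsigned il = 0; il < hp.n_layer; ++il) {
        std::printf("layer %2u: %-14s rope_freq_base = %.0f\n",
                    il,
                    is_global_layer(hp, il) ? "global" : "sliding-window",
                    rope_base_for_layer(hp, il));
    }
    return 0;
}
```

Run as-is, this prints which layers would be global versus sliding-window and which RoPE base each would use, matching the 0, 3, 6, ... indexing mentioned in the commit messages.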
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Diff excerpt for the llm_arch and llm_kv enums (each hunk adds one line):

@@ -22,6 +22,7 @@ enum llm_arch {
     LLM_ARCH_STARCODER,
     LLM_ARCH_REFACT,
     LLM_ARCH_BERT,
+    LLM_ARCH_MODERN_BERT,
     LLM_ARCH_NOMIC_BERT,
     LLM_ARCH_NOMIC_BERT_MOE,
     LLM_ARCH_NEO_BERT,

@@ -188,6 +189,7 @@ enum llm_kv {
     LLM_KV_ROPE_DIMENSION_COUNT,
     LLM_KV_ROPE_DIMENSION_SECTIONS,
     LLM_KV_ROPE_FREQ_BASE,
+    LLM_KV_ROPE_FREQ_BASE_SWA,
     LLM_KV_ROPE_SCALE_LINEAR,
     LLM_KV_ROPE_SCALING_TYPE,
     LLM_KV_ROPE_SCALING_FACTOR,

Review comment on the LLM_KV_ROPE_FREQ_BASE_SWA line: "NIT: Seems like this should be one line up so it's next to …"
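For context on the two additions above: llama.cpp pairs architecture enums with per-architecture GGUF key templates. The sketch below illustrates that pairing with stand-in types; the "modern-bert" architecture name and the "%s.rope.freq_base_swa" key string are assumptions modeled on the existing "%s.rope.freq_base" convention, not taken from the PR.

```cpp
// Hedged sketch, not the PR's code: shows the general shape of mapping a new
// architecture enum and a new per-architecture GGUF key to concrete key names.
#include <cstdio>
#include <map>
#include <string>

enum llm_arch_sketch { ARCH_BERT, ARCH_MODERN_BERT };
enum llm_kv_sketch   { KV_ROPE_FREQ_BASE, KV_ROPE_FREQ_BASE_SWA };

// Architecture name used as the "%s" prefix in per-arch keys.
static const std::map<llm_arch_sketch, std::string> ARCH_NAMES = {
    { ARCH_BERT,        "bert"        },
    { ARCH_MODERN_BERT, "modern-bert" },  // assumed name, see the PR for the real one
};

// KV key templates; the SWA variant mirrors the non-SWA one.
static const std::map<llm_kv_sketch, std::string> KV_NAMES = {
    { KV_ROPE_FREQ_BASE,     "%s.rope.freq_base"     },
    { KV_ROPE_FREQ_BASE_SWA, "%s.rope.freq_base_swa" },  // assumed key string
};

// Expand a key template for a concrete architecture.
static std::string kv_name(llm_arch_sketch arch, llm_kv_sketch kv) {
    char buf[128];
    std::snprintf(buf, sizeof(buf), KV_NAMES.at(kv).c_str(), ARCH_NAMES.at(arch).c_str());
    return buf;
}

int main() {
    std::printf("%s\n", kv_name(ARCH_MODERN_BERT, KV_ROPE_FREQ_BASE).c_str());
    std::printf("%s\n", kv_name(ARCH_MODERN_BERT, KV_ROPE_FREQ_BASE_SWA).c_str());
    return 0;
}
```

Compiled and run, this prints "modern-bert.rope.freq_base" and "modern-bert.rope.freq_base_swa", i.e. the general shape of the keys a converted GGUF would carry under the assumed naming.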
Review comment: You forgot to commit the mapping?

Author's reply: Originally I had added support for granite small embedding, and it was using the ModernBert arch under the hood.