Commit 8f1d81a
llama : support RWKV v6 models (ggml-org#8980)
* convert_hf_to_gguf: Add support for RWKV v6
Signed-off-by: Molly Sophia <[email protected]>
* Add RWKV tokenization
* Fix build
Signed-off-by: Molly Sophia <[email protected]>
* Do not use special tokens when matching in RWKV tokenizer
* Fix model loading
* Add (broken) placeholder graph builder for RWKV
* Add workaround for kv cache
* Add logits conversion to rwkv5
* Add rwkv5 layer norms
* Add time mix KVRG & correct merge mistake
* Add remaining time mix parameters
* Add time mix output loading
* Add placeholder llm_build_time_mix
* Fix build
Signed-off-by: Molly Sophia <[email protected]>
* Load more tensors for rwkv v6
Signed-off-by: Molly Sophia <[email protected]>
* Fix rwkv tokenizer
Signed-off-by: Molly Sophia <[email protected]>
* ggml: Add unary operator Exp
Signed-off-by: Molly Sophia <[email protected]>
* RWKV v6 graph building
Signed-off-by: Molly Sophia <[email protected]>
* Add ``rescale_every_n_layers`` parameter
Signed-off-by: Molly Sophia <[email protected]>
* Add ``wkv.head_size`` key for RWKV
so it doesn't reuse the Mamba SSM parameters
Signed-off-by: Molly Sophia <[email protected]>
* Fix offloading layers to CUDA
Signed-off-by: Molly Sophia <[email protected]>
* Fix parallel inferencing for RWKV
Signed-off-by: Molly Sophia <[email protected]>
* Remove trailing whitespace
Signed-off-by: Molly Sophia <[email protected]>
* build_rwkv: Avoid using in-place operations
Signed-off-by: Molly Sophia <[email protected]>
* convert_hf_to_gguf: rwkv: Avoid using ``eval``
Signed-off-by: Molly Sophia <[email protected]>
* convert_hf_to_gguf: rwkv tokenizer: Don't escape sequences manually
Signed-off-by: Molly Sophia <[email protected]>
* Update convert_hf_to_gguf.py
Co-authored-by: compilade <[email protected]>
* ggml: Add backward computation for unary op ``exp``
Signed-off-by: Molly Sophia <[email protected]>
* Update convert_hf_to_gguf.py
Co-authored-by: compilade <[email protected]>
* Update convert_hf_to_gguf.py
Co-authored-by: compilade <[email protected]>
* Use MODEL_ARCH.RWKV6 instead of MODEL_ARCH.RWKV
Signed-off-by: Molly Sophia <[email protected]>
* build_rwkv6: Simplify graph
Signed-off-by: Molly Sophia <[email protected]>
* llama: rwkv6: Detect model.type
Signed-off-by: Molly Sophia <[email protected]>
* llama: rwkv6: Fix tensor loading for 7B/14B models
Signed-off-by: Molly Sophia <[email protected]>
* llama: rwkv6: Fix group_norm assertion failure with Metal
Signed-off-by: Molly Sophia <[email protected]>
* llama: rwkv6: Clean up
Signed-off-by: Molly Sophia <[email protected]>
* llama: rwkv6: Add quantization tensor exclusion
Signed-off-by: Molly Sophia <[email protected]>
* llama: rwkv6: Use the new advanced batch splits
Signed-off-by: Molly Sophia <[email protected]>
* Update src/llama.cpp
Co-authored-by: compilade <[email protected]>
* llama: rwkv6: Use ``ggml_norm`` instead of ``ggml_group_norm``
Co-authored-by: compilade <[email protected]>
* llama: rwkv6: Apply code style and misc changes
Signed-off-by: Molly Sophia <[email protected]>
* converter: Use class name ``Rwkv6Model``
Signed-off-by: Molly Sophia <[email protected]>
* llama: rwkv6: Make use of key ``feed_forward_length``
Signed-off-by: Molly Sophia <[email protected]>
* llama: rwkv6: Add kv ``time_mix_extra_dim`` and ``time_decay_extra_dim``
Signed-off-by: Molly Sophia <[email protected]>
* converter: Match ``new_name`` instead of ``name`` for float32 explicit tensors
Signed-off-by: Molly Sophia <[email protected]>
* llama: rwkv6: Keep ``time_mix_w1/w2`` as F32
Signed-off-by: Molly Sophia <[email protected]>
* llama: rwkv6: Remove unused nodes
Signed-off-by: Molly Sophia <[email protected]>
* llama: rwkv6: Apply code format changes
Signed-off-by: Molly Sophia <[email protected]>
* llama: rwkv6: Add LoRA support for some tensors
Currently supported: att.key/receptance/value/gate/output, ffn.receptance/key/value, and head.weight
Signed-off-by: Molly Sophia <[email protected]>
* rwkv : speed up tokenization using a trie
* minor : style + indentation
* llama: rwkv6: Avoid division by zero
Co-authored-by: compilade <[email protected]>
* ggml: rwkv_wkv: Avoid copying the state
Signed-off-by: Molly Sophia <[email protected]>
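The "ggml: Add unary operator Exp" and "Add backward computation for unary op ``exp``" entries rest on the identity d/dx e^x = e^x: the gradient flowing into the input is the upstream gradient multiplied by the forward output. A minimal Python sketch of that rule, assuming plain lists in place of ggml tensors (`exp_forward` and `exp_backward` are hypothetical names, not ggml functions), checked against a central finite difference:

```python
import math

def exp_forward(xs):
    """Element-wise e^x: the forward pass of a unary exp op."""
    return [math.exp(x) for x in xs]

def exp_backward(ys, grad_out):
    """Backward pass: d(e^x)/dx = e^x, so the forward output ys is reused."""
    return [g * y for g, y in zip(grad_out, ys)]

def finite_difference(f, x, h=1e-6):
    """Central-difference estimate of f'(x), used only as a sanity check."""
    return (f(x + h) - f(x - h)) / (2 * h)

if __name__ == "__main__":
    xs = [-1.0, 0.0, 0.5, 2.0]
    ys = exp_forward(xs)
    grads = exp_backward(ys, [1.0] * len(xs))
    for x, g in zip(xs, grads):
        assert abs(g - finite_difference(math.exp, x)) < 1e-5
    print(grads)
```

Reusing the forward output means the backward pass needs no second call to exp.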
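The "ggml: rwkv_wkv: Avoid copying the state" entry concerns the WKV operator that carries RWKV v6's recurrent state. For one head with an N x N state S, each token computes y[j] = sum_i r[i] * (u[i] * k[i] * v[j] + S[i][j]) and then updates S[i][j] = w[i] * S[i][j] + k[i] * v[j], where u is the per-channel "time first" bonus and w is the already-exponentiated, data-dependent decay. The sketch below restates that recurrence for a single head in plain Python, assuming lists for tensors; `wkv6_head` is a hypothetical name, and updating `state` in place only mirrors the intent of avoiding a copy. It is not the ggml kernel.

```python
def wkv6_head(r, k, v, w, u, state):
    """Run the RWKV v6 WKV recurrence for one head (illustrative sketch).

    r, k, v, w: lists of T vectors, each of length N (the head size).
    u: length-N bonus applied to the current token only.
    state: N x N list of lists, updated in place; state[i][j] pairs
           key/receptance channel i with value channel j.
    Returns the T output vectors of length N.
    """
    n = len(u)
    outputs = []
    for r_t, k_t, v_t, w_t in zip(r, k, v, w):
        y = [0.0] * n
        for i in range(n):
            for j in range(n):
                kv = k_t[i] * v_t[j]
                # The current token contributes via the bonus u; past tokens via the state.
                y[j] += r_t[i] * (u[i] * kv + state[i][j])
                # Decay the old state entry and add this token's outer-product term.
                state[i][j] = w_t[i] * state[i][j] + kv
        outputs.append(y)
    return outputs

if __name__ == "__main__":
    state = [[0.0, 0.0], [0.0, 0.0]]
    out = wkv6_head(
        r=[[1.0, 0.0], [1.0, 0.0]],
        k=[[1.0, 0.0], [0.0, 0.0]],
        v=[[2.0, 3.0], [0.0, 0.0]],
        w=[[0.5, 0.5], [0.5, 0.5]],
        u=[1.0, 1.0],
        state=state,
    )
    print(out, state)
```

In the usage example the second token contributes nothing new, so its output comes entirely from the state written by the first token, which is then halved by the decay.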
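The "Use ``ggml_norm`` instead of ``ggml_group_norm``" entry works because a group norm with one group per head is the same as splitting each token's vector into n_head rows of head_size elements, normalizing each row to zero mean and unit variance, then applying the per-channel weight and bias. A pure-Python sketch of that equivalence, assuming one group per head; the function names and the `eps` default are illustrative, not values taken from the commit:

```python
import math

def normalize_rows(rows, eps):
    """Normalize each row to zero mean and unit variance, with no affine step."""
    out = []
    for row in rows:
        mean = sum(row) / len(row)
        var = sum((x - mean) ** 2 for x in row) / len(row)
        inv = 1.0 / math.sqrt(var + eps)
        out.append([(x - mean) * inv for x in row])
    return out

def group_norm_via_rows(x, n_head, weight, bias, eps=1e-5):
    """Group norm over one token's vector x (n_head groups) built from a per-row norm."""
    assert len(x) % n_head == 0, "vector length must be divisible by n_head"
    head_size = len(x) // n_head
    # "Reshape" to [n_head, head_size] so each head becomes one row.
    rows = [x[h * head_size:(h + 1) * head_size] for h in range(n_head)]
    # Normalize per row, then flatten back to the original layout.
    flat = [value for row in normalize_rows(rows, eps) for value in row]
    # Per-channel affine transform over the full vector.
    return [value * g + b for value, g, b in zip(flat, weight, bias)]

if __name__ == "__main__":
    print(group_norm_via_rows([1.0, 3.0, 10.0, 20.0], 2, [1.0] * 4, [0.0] * 4, eps=0.0))
```

Each head is normalized independently, so the two heads above map to the same values even though their scales differ by an order of magnitude.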
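The "rwkv : speed up tokenization using a trie" entry can be illustrated with a greedy longest-match tokenizer over a byte-level vocabulary. It walks a trie from each position and keeps the longest token seen, which avoids testing every vocabulary entry at every offset. This sketch assumes greedy longest-match semantics and a `dict` of bytes to ids. `GreedyTrieTokenizer` is a hypothetical class, not llama.cpp's tokenizer.

```python
class TrieNode:
    __slots__ = ("children", "token_id")

    def __init__(self):
        self.children = {}   # next byte value -> TrieNode
        self.token_id = None  # set when a vocabulary token ends at this node

class GreedyTrieTokenizer:
    """Greedy longest-match tokenizer over a byte-level vocabulary (illustrative sketch)."""

    def __init__(self, vocab):
        # vocab: dict mapping token bytes -> token id
        self.root = TrieNode()
        for token, token_id in vocab.items():
            node = self.root
            for byte in token:
                node = node.children.setdefault(byte, TrieNode())
            node.token_id = token_id

    def encode(self, data):
        ids = []
        pos = 0
        while pos < len(data):
            node = self.root
            best_id, best_end = None, pos
            i = pos
            # Follow the trie as far as the input allows, remembering the longest token.
            while i < len(data) and data[i] in node.children:
                node = node.children[data[i]]
                i += 1
                if node.token_id is not None:
                    best_id, best_end = node.token_id, i
            if best_id is None:
                raise ValueError(f"no token matches byte {data[pos]!r} at offset {pos}")
            ids.append(best_id)
            pos = best_end
        return ids

if __name__ == "__main__":
    tok = GreedyTrieTokenizer({b"a": 1, b"b": 2, b"ab": 3, b"abc": 4, b"c": 5})
    print(tok.encode(b"abcab"))
```

If the vocabulary contains every single byte, the `ValueError` branch is unreachable, because a one-byte match always exists.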
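The "Add ``rescale_every_n_layers`` parameter" entry names the parameter but does not describe it. One common scheme in RWKV inference code, assumed here rather than taken from this commit, halves the residual stream after every n-th block to avoid fp16 overflow. It also divides each block's output projection by 2^(layer // n), so the model computes the same function. Because each block and the final output begin with a scale-invariant norm, the two runs agree. A toy sketch checks this; `run` and `normalize` are hypothetical names, and `normalize` stands in for LayerNorm.

```python
import math

def normalize(x):
    """Scale-invariant stand-in for a LayerNorm: returns x / ||x||."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x]

def run(x, gains, rescale_every=0):
    """Toy residual stack: x <- x + gain_l * roll(normalize(x)).

    With rescale_every > 0 (an assumed scheme), block il's output gain is divided
    by 2**(il // rescale_every) and the residual stream is halved after every
    rescale_every-th block.
    """
    for il, gain in enumerate(gains):
        if rescale_every > 0:
            gain = gain / 2 ** (il // rescale_every)
        n = normalize(x)
        update = n[1:] + n[:1]  # roll so the update changes direction, not just scale
        x = [v + gain * u for v, u in zip(x, update)]
        if rescale_every > 0 and (il + 1) % rescale_every == 0:
            x = [v * 0.5 for v in x]
    return normalize(x)  # final scale-invariant norm, like an output LayerNorm

if __name__ == "__main__":
    x0 = [1.0, 2.0, -0.5]
    gains = [0.3, 1.2, 0.7, 2.0, 0.1, 0.9, 1.5]
    print(run(x0, gains), run(x0, gains, rescale_every=2))
```

Scaling by powers of two is exact in binary floating point, so the two results should agree to within rounding (in this toy, typically bit-for-bit).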
---------
Signed-off-by: Molly Sophia <[email protected]>
Co-authored-by: Layl Bongers <[email protected]>
Co-authored-by: compilade <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>

1 parent: a47667c
9 files changed (+1266, -103 lines) under ggml (include, src), gguf-py/gguf, include, src