Commit 00781fc

aluu317 and willmj authored
docs: Update model architecture in README (foundation-model-stack#550)
* Update model architecture in README
* Update HF links for models
* Update comment for granite 4.0 support
* Update formatting for table
* Update model archs

Signed-off-by: Angel Luu <[email protected]>
Co-authored-by: Will Johnson <[email protected]>
1 parent b3744ca commit 00781fc

File tree: 1 file changed (+23, -21 lines)


README.md

Lines changed: 23 additions & 21 deletions
@@ -328,35 +328,37 @@ Please refer to [this document](docs/offline-data-preprocessing.md) for details
 
 Model Name & Size | Model Architecture | Full Finetuning | Low Rank Adaptation (i.e. LoRA) | qLoRA(quantized LoRA) |
 -------------------- | ---------------- | --------------- | ------------------------------- | --------------------- |
-Granite PowerLM 3B | GraniteForCausalLM | ✅* | ✅* | * |
-Granite 3.1 1B | GraniteForCausalLM | ✔️* | ✔️* | ✔️* |
-Granite 3.1 2B | GraniteForCausalLM | ✔️* | ✔️* | ✔️* |
-Granite 3.1 3B | GraniteForCausalLM | ✔️* | ✔️* | ✔️* |
-Granite 3.1 8B | GraniteForCausalLM | ✔️* | ✔️* | ✔️* |
-Granite 3.0 2B | GraniteForCausalLM | ✔️* | ✔️* | ✔️* |
-Granite 3.0 8B | GraniteForCausalLM | ✅* | ✅* | ✔️ |
-GraniteMoE 1B | GraniteMoeForCausalLM | ✅ | ✅** | ? |
-GraniteMoE 3B | GraniteMoeForCausalLM | ✅ | ✅** | ? |
-Granite 3B | LlamawithCausalLM | ✅ | ✔️ | ✔️ |
-Granite 8B | LlamawithCausalLM | ✅ | ✅ | ✅ |
+[Granite 4.0 Tiny Preview](https://huggingface.co/ibm-granite/granite-4.0-tiny-preview) | GraniteMoeHybridForCausalLM | ✅**** | ✅**** | ? |
+[Granite PowerLM 3B](https://huggingface.co/ibm-research/PowerLM-3b) | GraniteForCausalLM | ✅* | ✅* | * |
+[Granite 3.1 1B](https://huggingface.co/ibm-granite/granite-3.1-1b-a400m-base) | GraniteForCausalLM | ✔️* | ✔️* | ✔️* |
+[Granite 3.1 2B](https://huggingface.co/ibm-granite/granite-3.1-2b-base) | GraniteForCausalLM | ✔️* | ✔️* | ✔️* |
+[Granite 3.1 8B](https://huggingface.co/ibm-granite/granite-3.1-8b-base) | GraniteForCausalLM | ✔️* | ✔️* | ✔️* |
+[Granite 3.0 2B](https://huggingface.co/ibm-granite/granite-3.0-2b-base) | GraniteForCausalLM | ✔️* | ✔️* | ✔️* |
+[Granite 3.0 8B](https://huggingface.co/ibm-granite/granite-3.0-8b-base) | GraniteForCausalLM | ✅* | ✅* | ✔️ |
+[GraniteMoE 1B](https://huggingface.co/ibm-granite/granite-3.0-1b-a400m-base) | GraniteMoeForCausalLM | ✅ | ✅** | ? |
+[GraniteMoE 3B](https://huggingface.co/ibm-granite/granite-3.0-3b-a800m-base) | GraniteMoeForCausalLM | ✅ | ✅** | ? |
+[Granite 3B Code](https://huggingface.co/ibm-granite/granite-3b-code-base-2k) | LlamaForCausalLM | ✅ | ✔️ | ✔️ |
+[Granite 8B Code](https://huggingface.co/ibm-granite/granite-8b-code-base-4k) | LlamaForCausalLM | ✅ | ✅ | ✅ |
 Granite 13B | GPTBigCodeForCausalLM | ✅ | ✅ | ✔️ |
 Granite 20B | GPTBigCodeForCausalLM | ✅ | ✔️ | ✔️ |
-Granite 34B | GPTBigCodeForCausalLM | 🚫 | ✅ | ✅ |
-Llama3.1-8B | LLaMA 3.1 | ✅*** | ✔️ | ✔️ |
-Llama3.1-70B(same architecture as llama3) | LLaMA 3.1 | 🚫 - same as Llama3-70B | ✔️ | ✔️ |
-Llama3.1-405B | LLaMA 3.1 | 🚫 | 🚫 | ✅ |
-Llama3-8B | LLaMA 3 | ✅ | ✅ | ✔️ |
-Llama3-70B | LLaMA 3 | 🚫 | ✅ | ✅ |
+[Granite 34B Code](https://huggingface.co/ibm-granite/granite-34b-code-instruct-8k) | GPTBigCodeForCausalLM | 🚫 | ✅ | ✅ |
+[Llama3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B) | LlamaForCausalLM | ✅*** | ✔️ | ✔️ |
+[Llama3.1-70B](https://huggingface.co/meta-llama/Llama-3.1-70B)(same architecture as llama3) | LlamaForCausalLM | 🚫 - same as Llama3-70B | ✔️ | ✔️ |
+[Llama3.1-405B](https://huggingface.co/meta-llama/Llama-3.1-405B) | LlamaForCausalLM | 🚫 | 🚫 | ✅ |
+[Llama3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) | LlamaForCausalLM | ✅ | ✅ | ✔️ |
+[Llama3-70B](https://huggingface.co/meta-llama/Meta-Llama-3-70B) | LlamaForCausalLM | 🚫 | ✅ | ✅ |
 aLLaM-13b | LlamaForCausalLM | ✅ | ✅ | ✅ |
-Mixtral 8x7B | Mixtral | ✅ | ✅ | ✅ |
-Mistral-7b | Mistral | ✅ | ✅ | ✅ |
-Mistral large | Mistral | 🚫 | 🚫 | 🚫 |
+[Mixtral 8x7B](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1) | MixtralForCausalLM | ✅ | ✅ | ✅ |
+[Mistral-7b](https://huggingface.co/mistralai/Mistral-7B-v0.1) | MistralForCausalLM | ✅ | ✅ | ✅ |
+Mistral large | MistralForCausalLM | 🚫 | 🚫 | 🚫 |
 
 (*) - Supported with `fms-hf-tuning` v2.4.0 or later.
 
 (**) - Supported for q,k,v,o layers. `all-linear` target modules does not infer on vLLM yet.
 
-(***) - Supported from platform up to 8k context length - same architecture as llama3-8b
+(***) - Supported from platform up to 8k context length - same architecture as llama3-8b.
+
+(****) - Experimentally supported. Dependent on stable transformers version with PR [#37658](https://github.com/huggingface/transformers/pull/37658) and accelerate >= 1.3.0.
 
 ## Training

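Footnote (**) in the table above restricts LoRA to the q, k, v, o attention projection layers. As an illustrative sketch only (the argument names mirror the Hugging Face-style tuning configs fms-hf-tuning accepts, but may differ across versions, and the model and data paths are placeholders), a LoRA run over one of the listed Granite models might be parameterized like this:

```python
# Hypothetical argument set for a LoRA fine-tuning run; names and paths
# are illustrative, not a verified fms-hf-tuning invocation.
lora_args = {
    "model_name_or_path": "ibm-granite/granite-3.1-2b-base",
    "training_data_path": "path/to/train.jsonl",   # placeholder
    "output_dir": "outputs/granite-lora",          # placeholder
    "num_train_epochs": 1,
    "per_device_train_batch_size": 4,
    "learning_rate": 1e-4,
    "peft_method": "lora",
    "r": 8,
    "lora_alpha": 16,
    "lora_dropout": 0.05,
    # Per footnote (**): target only the attention projections, not
    # "all-linear", so the resulting adapter still serves on vLLM.
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
}
```

Keeping `target_modules` to the four attention projections is the design choice the footnote calls out: `all-linear` adapters are trainable but do not yet infer on vLLM.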
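Footnote (****) gates Granite 4.0 Tiny Preview support on accelerate >= 1.3.0. A minimal, hypothetical helper for that kind of version floor (not part of fms-hf-tuning; it compares dotted release numbers only and ignores pre-release and local suffixes):

```python
def meets_minimum(installed: str, required: str) -> bool:
    """Return True if `installed` satisfies the `required` minimum version.

    Compares dotted release numbers numerically; pre-release/local
    suffixes (e.g. "1.3.0.dev0", "1.3.0+cu121") are ignored.
    """
    def release(version: str) -> list[int]:
        numbers = []
        for part in version.split("+")[0].split("."):
            digits = "".join(ch for ch in part if ch.isdigit())
            if not digits:
                break
            numbers.append(int(digits))
        return numbers

    a, b = release(installed), release(required)
    width = max(len(a), len(b))
    # Pad with zeros so "1.3" compares equal to "1.3.0".
    a += [0] * (width - len(a))
    b += [0] * (width - len(b))
    return a >= b

# Example: checking an installed accelerate against the (****) floor.
print(meets_minimum("1.10.2", "1.3.0"))  # → True
```

In practice the installed version would come from `importlib.metadata.version("accelerate")`; the literal strings here keep the sketch self-contained.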