Please refer to [this document](docs/offline-data-preprocessing.md) for details.

Model Name & Size | Model Architecture | Full Finetuning | Low Rank Adaptation (i.e. LoRA) | qLoRA (quantized LoRA) |
-------------------- | ---------------- | --------------- | ------------------------------- | --------------------- |
[Granite 4.0 Tiny Preview](https://huggingface.co/ibm-granite/granite-4.0-tiny-preview) | GraniteMoeHybridForCausalLM | ✅**** | ✅**** | ? |
[Granite PowerLM 3B](https://huggingface.co/ibm-research/PowerLM-3b) | GraniteForCausalLM | ✅* | ✅* | ✅* |
[Granite 3.1 1B](https://huggingface.co/ibm-granite/granite-3.1-1b-a400m-base) | GraniteForCausalLM | ✔️* | ✔️* | ✔️* |
[Granite 3.1 2B](https://huggingface.co/ibm-granite/granite-3.1-2b-base) | GraniteForCausalLM | ✔️* | ✔️* | ✔️* |
[Granite 3.1 8B](https://huggingface.co/ibm-granite/granite-3.1-8b-base) | GraniteForCausalLM | ✔️* | ✔️* | ✔️* |
[Granite 3.0 2B](https://huggingface.co/ibm-granite/granite-3.0-2b-base) | GraniteForCausalLM | ✔️* | ✔️* | ✔️* |
[Granite 3.0 8B](https://huggingface.co/ibm-granite/granite-3.0-8b-base) | GraniteForCausalLM | ✅* | ✅* | ✔️ |
[GraniteMoE 1B](https://huggingface.co/ibm-granite/granite-3.0-1b-a400m-base) | GraniteMoeForCausalLM | ✅ | ✅** | ? |
[GraniteMoE 3B](https://huggingface.co/ibm-granite/granite-3.0-3b-a800m-base) | GraniteMoeForCausalLM | ✅ | ✅** | ? |
[Granite 3B Code](https://huggingface.co/ibm-granite/granite-3b-code-base-2k) | LlamaForCausalLM | ✅ | ✔️ | ✔️ |
[Granite 8B Code](https://huggingface.co/ibm-granite/granite-8b-code-base-4k) | LlamaForCausalLM | ✅ | ✅ | ✅ |
Granite 13B | GPTBigCodeForCausalLM | ✅ | ✅ | ✔️ |
Granite 20B | GPTBigCodeForCausalLM | ✅ | ✔️ | ✔️ |
[Granite 34B Code](https://huggingface.co/ibm-granite/granite-34b-code-instruct-8k) | GPTBigCodeForCausalLM | 🚫 | ✅ | ✅ |
[Llama3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B) | LlamaForCausalLM | ✅*** | ✔️ | ✔️ |
[Llama3.1-70B](https://huggingface.co/meta-llama/Llama-3.1-70B) (same architecture as Llama3) | LlamaForCausalLM | 🚫 - same as Llama3-70B | ✔️ | ✔️ |
[Llama3.1-405B](https://huggingface.co/meta-llama/Llama-3.1-405B) | LlamaForCausalLM | 🚫 | 🚫 | ✅ |
[Llama3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) | LlamaForCausalLM | ✅ | ✅ | ✔️ |
[Llama3-70B](https://huggingface.co/meta-llama/Meta-Llama-3-70B) | LlamaForCausalLM | 🚫 | ✅ | ✅ |
aLLaM-13b | LlamaForCausalLM | ✅ | ✅ | ✅ |
[Mixtral 8x7B](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1) | MixtralForCausalLM | ✅ | ✅ | ✅ |
[Mistral-7b](https://huggingface.co/mistralai/Mistral-7B-v0.1) | MistralForCausalLM | ✅ | ✅ | ✅ |
Mistral large | MistralForCausalLM | 🚫 | 🚫 | 🚫 |

(*) - Supported with `fms-hf-tuning` v2.4.0 or later.

(**) - Supported for q, k, v, o layers. `all-linear` target modules do not yet run inference on vLLM (see the qLoRA sketch below).

(***) - Supported from the platform up to 8k context length (same architecture as Llama3-8B).

(****) - Experimentally supported. Depends on a stable transformers version that includes PR [#37658](https://github.com/huggingface/transformers/pull/37658) and accelerate >= 1.3.0.
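
Which table row applies to a given checkpoint is determined by its architecture class, which Hugging Face checkpoints record in their config. As a quick way to check (a minimal sketch; the model id is only an example):

```python
# Read the architecture class from a checkpoint's config to find its row
# in the table above. The model id here is an example, not a requirement.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("ibm-granite/granite-3.1-2b-base")
print(config.architectures)  # e.g. ["GraniteForCausalLM"]
```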
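
As a rough illustration of what the LoRA and qLoRA columns (and note (**)) refer to, the sketch below builds a qLoRA setup with Hugging Face `transformers` and `peft` directly. This is not this repo's tuning entry point, and the model id, rank, and other hyperparameters are illustrative assumptions only.

```python
# Minimal qLoRA sketch: load the base model 4-bit quantized, then attach
# LoRA adapters on the attention q, k, v, o projections (see note (**)).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# qLoRA = LoRA on top of a 4-bit quantized base model.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "ibm-granite/granite-3.1-2b-base",  # example id from the table above
    quantization_config=bnb_config,
    device_map="auto",
)

# Restrict adapters to the q, k, v, o layers rather than "all-linear",
# since note (**) flags all-linear adapters as not yet served by vLLM.
lora_config = LoraConfig(
    r=8,                    # illustrative rank, not a recommended value
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```

Dropping the `quantization_config` argument gives the plain LoRA case; full finetuning uses no adapters at all.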
## Training