Please refer to [this document](docs/offline-data-preprocessing.md) for details.

Model Name & Size | Model Architecture | Full Finetuning | Low Rank Adaptation (i.e. LoRA) | qLoRA (quantized LoRA) |
-------------------- | ---------------- | --------------- | ------------------------------- | --------------------- |
[Granite 4.0 Tiny Preview](https://huggingface.co/ibm-granite/granite-4.0-tiny-preview) | GraniteMoeHybridForCausalLM | ✅**** | ✅**** | ? |
[Granite PowerLM 3B](https://huggingface.co/ibm-research/PowerLM-3b) | GraniteForCausalLM | ✅* | ✅* | ✅* |
[Granite 3.1 1B](https://huggingface.co/ibm-granite/granite-3.1-1b-a400m-base) | GraniteForCausalLM | ✔️* | ✔️* | ✔️* |
[Granite 3.1 2B](https://huggingface.co/ibm-granite/granite-3.1-2b-base) | GraniteForCausalLM | ✔️* | ✔️* | ✔️* |
[Granite 3.1 8B](https://huggingface.co/ibm-granite/granite-3.1-8b-base) | GraniteForCausalLM | ✔️* | ✔️* | ✔️* |
[Granite 3.0 2B](https://huggingface.co/ibm-granite/granite-3.0-2b-base) | GraniteForCausalLM | ✔️* | ✔️* | ✔️* |
[Granite 3.0 8B](https://huggingface.co/ibm-granite/granite-3.0-8b-base) | GraniteForCausalLM | ✅* | ✅* | ✔️ |
[GraniteMoE 1B](https://huggingface.co/ibm-granite/granite-3.0-1b-a400m-base) | GraniteMoeForCausalLM | ✅ | ✅** | ? |
[GraniteMoE 3B](https://huggingface.co/ibm-granite/granite-3.0-3b-a800m-base) | GraniteMoeForCausalLM | ✅ | ✅** | ? |
[Granite 3B Code](https://huggingface.co/ibm-granite/granite-3b-code-base-2k) | LlamaForCausalLM | ✅ | ✔️ | ✔️ |
[Granite 8B Code](https://huggingface.co/ibm-granite/granite-8b-code-base-4k) | LlamaForCausalLM | ✅ | ✅ | ✅ |
Granite 13B | GPTBigCodeForCausalLM | ✅ | ✅ | ✔️ |
Granite 20B | GPTBigCodeForCausalLM | ✅ | ✔️ | ✔️ |
[Granite 34B Code](https://huggingface.co/ibm-granite/granite-34b-code-instruct-8k) | GPTBigCodeForCausalLM | 🚫 | ✅ | ✅ |
[Llama3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B) | LlamaForCausalLM | ✅*** | ✔️ | ✔️ |
[Llama3.1-70B](https://huggingface.co/meta-llama/Llama-3.1-70B) (same architecture as Llama3) | LlamaForCausalLM | 🚫 - same as Llama3-70B | ✔️ | ✔️ |
[Llama3.1-405B](https://huggingface.co/meta-llama/Llama-3.1-405B) | LlamaForCausalLM | 🚫 | 🚫 | ✅ |
[Llama3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) | LlamaForCausalLM | ✅ | ✅ | ✔️ |
[Llama3-70B](https://huggingface.co/meta-llama/Meta-Llama-3-70B) | LlamaForCausalLM | 🚫 | ✅ | ✅ |
aLLaM-13b | LlamaForCausalLM | ✅ | ✅ | ✅ |
[Mixtral 8x7B](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1) | MixtralForCausalLM | ✅ | ✅ | ✅ |
[Mistral-7b](https://huggingface.co/mistralai/Mistral-7B-v0.1) | MistralForCausalLM | ✅ | ✅ | ✅ |
Mistral large | MistralForCausalLM | 🚫 | 🚫 | 🚫 |

(*) - Supported with `fms-hf-tuning` v2.4.0 or later.

(**) - Supported for q, k, v, o layers. `all-linear` target modules do not yet run inference on vLLM (see the qLoRA sketch below).

(***) - Supported from the platform up to 8k context length (same architecture as Llama3-8B).

(****) - Experimentally supported. Depends on a stable transformers version that includes PR [#37658](https://github.com/huggingface/transformers/pull/37658) and accelerate >= 1.3.0.
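
Which table row applies to a given checkpoint is determined by its architecture class, which Hugging Face checkpoints record in their config. As a quick way to check (a minimal sketch; the model id is only an example):

```python
# Read the architecture class from a checkpoint's config to find its row
# in the table above. The model id here is an example, not a requirement.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("ibm-granite/granite-3.1-2b-base")
print(config.architectures)  # e.g. ["GraniteForCausalLM"]
```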
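
As a rough illustration of what the LoRA and qLoRA columns (and note (**)) refer to, the sketch below builds a qLoRA setup with Hugging Face `transformers` and `peft` directly. This is not this repo's tuning entry point, and the model id, rank, and other hyperparameters are illustrative assumptions only.

```python
# Minimal qLoRA sketch: load the base model 4-bit quantized, then attach
# LoRA adapters on the attention q, k, v, o projections (see note (**)).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# qLoRA = LoRA on top of a 4-bit quantized base model.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "ibm-granite/granite-3.1-2b-base",  # example id from the table above
    quantization_config=bnb_config,
    device_map="auto",
)

# Restrict adapters to the q, k, v, o layers rather than "all-linear",
# since note (**) flags all-linear adapters as not yet served by vLLM.
lora_config = LoraConfig(
    r=8,                    # illustrative rank, not a recommended value
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```

Dropping the `quantization_config` argument gives the plain LoRA case; full finetuning uses no adapters at all.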
## Training