
Commit 1e82e02

Merge pull request #391 from foundation-model-stack/v2.1.2-rc1
chore(release): merge set of changes for v2.1.2
2 parents e2ac091 + 6e8139c commit 1e82e02

File tree

2 files changed (+42, -2 lines)


README.md

Lines changed: 41 additions & 1 deletion
@@ -132,7 +132,47 @@ Example: Train.jsonl
## Supported Models

-Current supported and tested models are `Llama3` (8B configuration has been tested) and `GPTBigCode`.
+- For each tuning technique, we run testing on a single large model of each architecture type and claim support for the smaller models. For example, with the QLoRA technique we tested on granite-34b GPTBigCode and claim support for granite-20b-multilingual.
+- LoRA layers supported: all the linear layers of a model, plus the output `lm_head` layer. Users can specify layers as a list or use `all-linear` as a shortcut (see the sketch after this diff). Layers are specific to a model architecture and can be specified as noted [here](https://github.com/foundation-model-stack/fms-hf-tuning?tab=readme-ov-file#lora-tuning-example).
+- Legend:
+  - ✅ Ready and available
+  - ✔️ Ready and available - compatible architecture (* see first bullet point above)
+  - 🚫 Not supported
+  - ? May be supported, but not tested
+
+Model Name & Size | Model Architecture | Full Finetuning | Low Rank Adaptation (i.e. LoRA) | qLoRA (quantized LoRA) |
+------------------ | ------------------ | --------------- | ------------------------------- | ---------------------- |
+Granite PowerLM 3B | GraniteForCausalLM | ✅* | ✅* | ✅* |
+Granite 3.0 2B | GraniteForCausalLM | ✔️* | ✔️* | ✔️* |
+Granite 3.0 8B | GraniteForCausalLM | ✅* | ✅* | ✔️ |
+GraniteMoE 1B | GraniteMoeForCausalLM | ✅ | ✅** | ? |
+GraniteMoE 3B | GraniteMoeForCausalLM | ✅ | ✅** | ? |
+Granite 3B | LlamaForCausalLM | ✅ | ✔️ | ✔️ |
+Granite 8B | LlamaForCausalLM | ✅ | ✅ | ✅ |
+Granite 13B | GPTBigCodeForCausalLM | ✅ | ✅ | ✔️ |
+Granite 20B | GPTBigCodeForCausalLM | ✅ | ✔️ | ✔️ |
+Granite 34B | GPTBigCodeForCausalLM | 🚫 | ✅ | ✅ |
+Llama3.1-8B | LLaMA 3.1 | ✅*** | ✔️ | ✔️ |
+Llama3.1-70B (same architecture as Llama3) | LLaMA 3.1 | 🚫 - same as Llama3-70B | ✔️ | ✔️ |
+Llama3.1-405B | LLaMA 3.1 | 🚫 | 🚫 | ✅ |
+Llama3-8B | LLaMA 3 | ✅ | ✅ | ✔️ |
+Llama3-70B | LLaMA 3 | 🚫 | ✅ | ✅ |
+aLLaM-13b | LlamaForCausalLM | ✅ | ✅ | ✅ |
+Mixtral 8x7B | Mixtral | ✅ | ✅ | ✅ |
+Mistral-7b | Mistral | ✅ | ✅ | ✅ |
+Mistral large | Mistral | 🚫 | 🚫 | 🚫 |
+
+(*) - Supported with `fms-hf-tuning` v2.0.1 or later
+(**) - Supported for q, k, v, o layers. `all-linear` target modules do not yet support inference on vLLM.
+(***) - Supported from the platform up to 8k context length; same architecture as Llama3-8B.

## Training
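The two ways of picking LoRA target layers described in the bullet above can be made concrete with a minimal sketch. This uses the Hugging Face `peft` `LoraConfig` that `fms-hf-tuning` builds on, not code from this repository; the `q_proj`-style module names are assumptions that hold for Llama-style architectures and may differ for other model families.

```python
# Minimal sketch of the two LoRA target-layer options noted above, using
# peft's LoraConfig. Module names like "q_proj" assume a Llama-style
# architecture; inspect your model's named_modules() for the real names.
from peft import LoraConfig

# Option 1: an explicit list of layers, e.g. the q, k, v, o attention
# projections (the subset noted for GraniteMoE in footnote ** above).
lora_explicit = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# Option 2: the "all-linear" shortcut, which targets every linear layer
# (requires a recent peft release; per footnote **, adapters trained this
# way do not yet support inference on vLLM).
lora_all_linear = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules="all-linear",
)
```

In `fms-hf-tuning` itself the same choice is expressed through its LoRA tuning config, as described in the README section linked above; the snippet only illustrates the two shapes the setting can take.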

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -28,7 +28,7 @@ classifiers=[
dependencies = [
    "numpy>=1.26.4,<2.0",
    "accelerate>=0.20.3,!=0.34,<1.1",
-   "transformers>4.41,<4.46",
+   "transformers>=4.45,<4.46",
    "torch>=2.2.0,<2.5",
    "sentencepiece>=0.1.99,<0.3",
    "tokenizers>=0.13.3,<1.0",

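Since the only change here tightens the `transformers` pin from `>4.41,<4.46` to `>=4.45,<4.46`, an environment can be checked against the new pin with a standard-library snippet like the sketch below. This is illustrative and not part of the repository; it assumes a typical `major.minor.patch` version string for the `transformers` distribution.

```python
# Sanity-check the installed transformers against the new pin ">=4.45,<4.46".
# Standard library only; assumes a "major.minor.patch" version string.
from importlib.metadata import version

v = version("transformers")
major, minor = (int(x) for x in v.split(".")[:2])
# For this particular pin, ">=4.45,<4.46" means exactly the 4.45.x series.
assert (major, minor) == (4, 45), f"transformers {v} does not satisfy >=4.45,<4.46"
print(f"transformers {v} satisfies >=4.45,<4.46")
```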