
Commit 793d04e

Update README.md
Co-authored-by: Praveen Jayachandran <[email protected]>
Signed-off-by: Dushyant Behl <[email protected]>
1 parent 77a95b6 commit 793d04e

File tree

5 files changed: +81 -71 lines changed


README.md

Lines changed: 10 additions & 67 deletions
@@ -21,7 +21,7 @@ This repo provides basic tuning scripts with support for specific models. The re
 
 ## Installation
 
-Refer our [Installation](./docs/installations.md) guide for details on how to install the library.
+Refer to our [Installation](./docs/installation.md) guide for details on how to install the library.
 
 ## Tuning Techniques:
 
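For quick reference, the base install that the linked guide covers is the published package itself; optional extras (such as `flash-attn` or the tracker backends shown later in this commit) follow the usual pip extras syntax. A minimal sketch, with the Installation guide remaining the authoritative source:

```bash
# Base install of the library from PyPI; add extras as needed,
# e.g. "fms-hf-tuning[flash-attn]" as shown elsewhere in this commit.
pip install fms-hf-tuning
```
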
@@ -31,13 +31,12 @@ Please refer to our [tuning techniques document](./docs/tuning-techniques.md) fo
 * [GPTQ-LoRA](./docs/tuning-techniques.md#gptq-lora-with-autogptq-tuning-example)
 * [Full Fine Tuning](./docs/tuning-techniques.md#fine-tuning)
 * [Use FMS Acceleration](./docs/tuning-techniques.md#fms-acceleration)
-* [Extended Pre-Training](./docs/tuning-techniques.md#extended-pre-training)
+* [Extended Pre-Training](./docs/tuning-techniques.md#extended-pre-training)
 
 ## Training and Training Parameters:
 
-Please refer our [document](./docs/training.md) to see how to start [Single GPU](./docs/training.md#single-gpu) or [Multi-GPU](./docs/training.md#multiple-gpus-with-fsdp) runs with fms-hf-tuning.
-
-You can also refer the same [document](./docs/training.md#tips-on-parameters-to-set) on how to use various training arguments.
+* Please refer to our document on [training](./docs/training.md) to see how to start [Single GPU](./docs/training.md#single-gpu) or [Multi-GPU](./docs/training.md#multiple-gpus-with-fsdp) runs with fms-hf-tuning.
+* You can also refer to a different [section](./docs/training.md#tips-on-parameters-to-set) of the same document for tips on setting various training arguments.
 
 ### *Debug recommendation:*
 While training, if you encounter flash-attn errors such as `undefined symbol`, you can follow the below steps for clean installation of flash binaries. This may occur when having multiple environments sharing the pip cache directory or torch version is updated.
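For reference, the clean installation of flash binaries that the debug recommendation points to generally amounts to the following sketch; the authoritative steps are the ones in the README hunk below, and `pip cache purge` is the standard pip command for clearing a shared cache:

```bash
# Remove any stale flash-attn build, clear the shared pip cache, then reinstall
# the library's flash-attn extra against the currently installed torch.
pip uninstall -y flash-attn
pip cache purge
pip install "fms-hf-tuning[flash-attn]"
```
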
@@ -50,71 +49,15 @@ pip install fms-hf-tuning[flash-attn]
 
 ## Supported Models
 
-- For each tuning technique, we run testing on a single large model of each architecture type and claim support for the smaller models. For example, with QLoRA technique, we tested on granite-34b GPTBigCode and claim support for granite-20b-multilingual.
+- While we expect most Hugging Face decoder models to work, we have primarily tested fine-tuning for the below families of models.
+  * [IBM Granite](https://huggingface.co/ibm-granite)
+  * [Meta Llama](https://huggingface.co/meta-llama)
+  * [Mistral AI](https://huggingface.co/mistralai)
+  * [OpenAI GPT-OSS](https://huggingface.co/collections/openai/gpt-oss-68911959590a1634ba11c7a4)
 
 - LoRA Layers supported : All the linear layers of a model + output `lm_head` layer. Users can specify layers as a list or use `all-linear` as a shortcut. Layers are specific to a model architecture and can be specified as noted [here](https://github.com/foundation-model-stack/fms-hf-tuning?tab=readme-ov-file#lora-tuning-example)
 
-- Legend:
-
-✅ Ready and available
-
-✔️ Ready and available - compatible architecture (*see first bullet point above)
-
-🚫 Not supported
-
-? May be supported, but not tested
-
-Model Name & Size | Model Architecture | Full Finetuning | Low Rank Adaptation (i.e. LoRA) | qLoRA(quantized LoRA) |
--------------------- | ---------------- | --------------- | ------------------------------- | --------------------- |
-[Granite 4.0 Tiny Preview](https://huggingface.co/ibm-granite/granite-4.0-tiny-preview) | GraniteMoeHybridForCausalLM | ✅ | ✅ | ? |
-[Granite PowerLM 3B](https://huggingface.co/ibm-research/PowerLM-3b) | GraniteForCausalLM | ✅* | ✅* | ✅* |
-[Granite 3.1 1B](https://huggingface.co/ibm-granite/granite-3.1-1b-a400m-base) | GraniteForCausalLM | ✔️* | ✔️* | ✔️* |
-[Granite 3.1 2B](https://huggingface.co/ibm-granite/granite-3.1-2b-base) | GraniteForCausalLM | ✔️* | ✔️* | ✔️* |
-[Granite 3.1 8B](https://huggingface.co/ibm-granite/granite-3.1-8b-base) | GraniteForCausalLM | ✔️* | ✔️* | ✔️* |
-[Granite 3.0 2B](https://huggingface.co/ibm-granite/granite-3.0-2b-base) | GraniteForCausalLM | ✔️* | ✔️* | ✔️* |
-[Granite 3.0 8B](https://huggingface.co/ibm-granite/granite-3.0-8b-base) | GraniteForCausalLM | ✅* | ✅* | ✔️ |
-[GraniteMoE 1B](https://huggingface.co/ibm-granite/granite-3.0-1b-a400m-base) | GraniteMoeForCausalLM | ✅ | ✅** | ? |
-[GraniteMoE 3B](https://huggingface.co/ibm-granite/granite-3.0-3b-a800m-base) | GraniteMoeForCausalLM | ✅ | ✅** | ? |
-[Granite 3B Code](https://huggingface.co/ibm-granite/granite-3b-code-base-2k) | LlamaForCausalLM | ✅ | ✔️ | ✔️ |
-[Granite 8B Code](https://huggingface.co/ibm-granite/granite-8b-code-base-4k) | LlamaForCausalLM | ✅ | ✅ | ✅ |
-Granite 13B | GPTBigCodeForCausalLM | ✅ | ✅ | ✔️ |
-Granite 20B | GPTBigCodeForCausalLM | ✅ | ✔️ | ✔️ |
-[Granite 34B Code](https://huggingface.co/ibm-granite/granite-34b-code-instruct-8k) | GPTBigCodeForCausalLM | 🚫 | ✅ | ✅ |
-[Llama3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B) | LlamaForCausalLM | ✅*** | ✔️ | ✔️ |
-[Llama3.1-70B](https://huggingface.co/meta-llama/Llama-3.1-70B)(same architecture as llama3) | LlamaForCausalLM | 🚫 - same as Llama3-70B | ✔️ | ✔️ |
-[Llama3.1-405B](https://huggingface.co/meta-llama/Llama-3.1-405B) | LlamaForCausalLM | 🚫 | 🚫 | ✅ |
-[Llama3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) | LlamaForCausalLM | ✅ | ✅ | ✔️ |
-[Llama3-70B](https://huggingface.co/meta-llama/Meta-Llama-3-70B) | LlamaForCausalLM | 🚫 | ✅ | ✅ |
-aLLaM-13b | LlamaForCausalLM | ✅ | ✅ | ✅ |
-[Mixtral 8x7B](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1) | MixtralForCausalLM | ✅ | ✅ | ✅ |
-[Mistral-7B](https://huggingface.co/mistralai/Mistral-7B-v0.1) | MistralForCausalLM | ✅ | ✅ | ✅ |
-Mistral large | MistralForCausalLM | 🚫 | 🚫 | 🚫 |
-[GPT-OSS-20B](https://huggingface.co/openai/gpt-oss-20b) | GptOssForCausalLM | ✅ | ✅ | ? |
-[GPT-OSS-120B](https://huggingface.co/openai/gpt-oss-120b) | GptOssForCausalLM | ✅ | ✅ | ? |
-
-(*) - Supported with `fms-hf-tuning` v2.4.0 or later.
-
-(**) - Supported for q,k,v,o layers . `all-linear` target modules does not infer on vLLM yet.
-
-(***) - Supported from platform up to 8k context length - same architecture as llama3-8b.
-
-### Supported vision model
-
-We also support full fine-tuning and LoRA tuning for vision language models - `Granite 3.2 Vision`, `Llama 3.2 Vision`, and `LLaVa-Next` from `v2.8.1` onwards.
-For information on supported dataset formats and how to tune a vision-language model, please see [this document](./vision-language-model-tuning.md).
-
-Model Name & Size | Model Architecture | LoRA Tuning | Full Finetuning |
--------------------- | ---------------- | --------------- | --------------- |
-Llama 3.2-11B Vision | MllamaForConditionalGeneration | ✅ | ✅ |
-Llama 3.2-90B Vision | MllamaForConditionalGeneration | ✔️ | ✔️ |
-Granite 3.2-2B Vision | LlavaNextForConditionalGeneration | ✅ | ✅ |
-Llava Mistral 1.6-7B | LlavaNextForConditionalGeneration | ✅ | ✅ |
-Llava 1.6-34B | LlavaNextForConditionalGeneration | ✔️ | ✔️ |
-Llava 1.5-7B | LlavaForConditionalGeneration | ✅ | ✅ |
-Llava 1.5-13B | LlavaForConditionalGeneration | ✔️ | ✔️ |
-
-**Note**:
-* vLLM currently does not support inference with LoRA-tuned vision models. To use a tuned LoRA adapter of vision model, please merge it with the base model before running vLLM inference.
+An extended list of tested models is maintained in the [supported models](./docs/supported-models.md) document, though it may contain outdated information.
 
 ## Data Support
 Users can pass training data as either a single file or a Hugging Face dataset ID using the `--training_data_path` argument along with other arguments required for various [use cases](./docs/advanced-data-preprocessing.md#use-cases-supported-via-command-line-argument-training_data_path). If user choose to pass a file, it can be in any of the [supported formats](#supported-data-formats). Alternatively, you can use our powerful [data preprocessing backend](./docs/advanced-data-preprocessing.md) to preprocess datasets on the fly.
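To tie the Data Support paragraph to the training pointers above, a single-GPU run is typically launched along the following lines. This is a sketch rather than a command copied from the docs: `--training_data_path` is the argument named above, while the `tuning/sft_trainer.py` entry point, the remaining Hugging Face-style flags, and all paths are assumptions; the [training document](./docs/training.md) is the authoritative reference.

```bash
# Hypothetical single-GPU fine-tuning run from a checkout of the repository.
# Model ID, data path, output directory and hyperparameters are placeholders.
python tuning/sft_trainer.py \
  --model_name_or_path ibm-granite/granite-3.1-2b-base \
  --training_data_path ./data/train.jsonl \
  --output_dir ./output \
  --num_train_epochs 1 \
  --per_device_train_batch_size 4

# Multi-GPU (FSDP) runs wrap the same script with `accelerate launch`,
# as described in the Multi-GPU section of the training document.
```
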

docs/experiment-tracking.md

Lines changed: 0 additions & 3 deletions
@@ -18,9 +18,6 @@ sft_trainer.train(train_args=training_args,...)
 
 For each of the requested trackers the code expects you to pass a config to the `sft_trainer.train` function which can be specified through `tracker_conifgs` argument [here](https://github.com/foundation-model-stack/fms-hf-tuning/blob/a9b8ec8d1d50211873e63fa4641054f704be8712/tuning/sft_trainer.py#L78) details of which are present below.
 
-
-
-
 ## Tracker Configurations
 
 ## File Logging Tracker
Lines changed: 8 additions & 1 deletion
@@ -45,11 +45,18 @@ Experiment tracking in fms-hf-tuning allows users to track their experiments wit
 
 The code supports currently these trackers out of the box,
 * `FileLoggingTracker` : A built in tracker which supports logging training loss to a file.
+    - Since this is built in, there is no need to install anything.
 * `Aimstack` : A popular opensource tracker which can be used to track any metrics or metadata from the experiments.
+    - Install by running
+      `pip install fms-hf-tuning[aim]`
 * `MLflow Tracking` : Another popular opensource tracker which stores metrics, metadata or even artifacts from experiments.
+    - Install by running
+      `pip install fms-hf-tuning[mlflow]`
 * `Clearml Tracking` : Another opensource tracker which stores metrics, metadata or even artifacts from experiments.
+    - Install by running
+      `pip install fms-hf-tuning[clearml]`
 
-Further details on enabling and using the trackers mentioned above can be found [here](./experiment-tracking.md).
+Note: All trackers accept arguments and can be customized via command line arguments, which are described in our document on [experiment tracking](./experiment-tracking.md); refer to that document for further details on enabling and using the trackers.
 
 ## Training Mamba Models
 
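Since the per-tracker installs above are ordinary pip extras, several trackers can also be installed in one command if needed; this is a minor convenience rather than something the docs spell out, and the extras names are the ones listed above:

```bash
# Install the library together with the Aimstack and MLflow tracker backends.
pip install "fms-hf-tuning[aim,mlflow]"
```
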
docs/supported-models.md

Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
+# Supported models list
+
+- Legend:
+
+✅ Ready and available
+
+✔️ Ready and available - compatible architecture (*see first bullet point above)
+
+🚫 Not supported
+
+? May be supported, but not tested
+
+Model Name & Size | Model Architecture | Full Finetuning | Low Rank Adaptation (i.e. LoRA) | qLoRA(quantized LoRA) |
+-------------------- | ---------------- | --------------- | ------------------------------- | --------------------- |
+[Granite 4.0 Tiny Preview](https://huggingface.co/ibm-granite/granite-4.0-tiny-preview) | GraniteMoeHybridForCausalLM | ✅ | ✅ | ? |
+[Granite PowerLM 3B](https://huggingface.co/ibm-research/PowerLM-3b) | GraniteForCausalLM | ✅ | ✅ | ✅ |
+[Granite 3.1 1B](https://huggingface.co/ibm-granite/granite-3.1-1b-a400m-base) | GraniteForCausalLM | ✔️ | ✔️ | ✔️ |
+[Granite 3.1 2B](https://huggingface.co/ibm-granite/granite-3.1-2b-base) | GraniteForCausalLM | ✔️ | ✔️ | ✔️ |
+[Granite 3.1 8B](https://huggingface.co/ibm-granite/granite-3.1-8b-base) | GraniteForCausalLM | ✔️ | ✔️ | ✔️ |
+[Granite 3.0 2B](https://huggingface.co/ibm-granite/granite-3.0-2b-base) | GraniteForCausalLM | ✔️ | ✔️ | ✔️ |
+[Granite 3.0 8B](https://huggingface.co/ibm-granite/granite-3.0-8b-base) | GraniteForCausalLM | ✅ | ✅ | ✔️ |
+[GraniteMoE 1B](https://huggingface.co/ibm-granite/granite-3.0-1b-a400m-base) | GraniteMoeForCausalLM | ✅ | ✅* | ? |
+[GraniteMoE 3B](https://huggingface.co/ibm-granite/granite-3.0-3b-a800m-base) | GraniteMoeForCausalLM | ✅ | ✅* | ? |
+[Granite 3B Code](https://huggingface.co/ibm-granite/granite-3b-code-base-2k) | LlamaForCausalLM | ✅ | ✔️ | ✔️ |
+[Granite 8B Code](https://huggingface.co/ibm-granite/granite-8b-code-base-4k) | LlamaForCausalLM | ✅ | ✅ | ✅ |
+Granite 13B | GPTBigCodeForCausalLM | ✅ | ✅ | ✔️ |
+Granite 20B | GPTBigCodeForCausalLM | ✅ | ✔️ | ✔️ |
+[Granite 34B Code](https://huggingface.co/ibm-granite/granite-34b-code-instruct-8k) | GPTBigCodeForCausalLM | 🚫 | ✅ | ✅ |
+[Llama3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B) | LlamaForCausalLM | ✅** | ✔️ | ✔️ |
+[Llama3.1-70B](https://huggingface.co/meta-llama/Llama-3.1-70B)(same architecture as llama3) | LlamaForCausalLM | 🚫 - same as Llama3-70B | ✔️ | ✔️ |
+[Llama3.1-405B](https://huggingface.co/meta-llama/Llama-3.1-405B) | LlamaForCausalLM | 🚫 | 🚫 | ✅ |
+[Llama3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) | LlamaForCausalLM | ✅ | ✅ | ✔️ |
+[Llama3-70B](https://huggingface.co/meta-llama/Meta-Llama-3-70B) | LlamaForCausalLM | 🚫 | ✅ | ✅ |
+aLLaM-13b | LlamaForCausalLM | ✅ | ✅ | ✅ |
+[Mixtral 8x7B](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1) | MixtralForCausalLM | ✅ | ✅ | ✅ |
+[Mistral-7B](https://huggingface.co/mistralai/Mistral-7B-v0.1) | MistralForCausalLM | ✅ | ✅ | ✅ |
+Mistral large | MistralForCausalLM | 🚫 | 🚫 | 🚫 |
+[GPT-OSS-20B](https://huggingface.co/openai/gpt-oss-20b) | GptOssForCausalLM | ✅ | ✅ | ? |
+[GPT-OSS-120B](https://huggingface.co/openai/gpt-oss-120b) | GptOssForCausalLM | ✅ | ✅ | ? |
+
+(*) - Supported for q,k,v,o layers . `all-linear` target modules does not infer on vLLM yet.
+
+(**) - Supported from platform up to 8k context length - same architecture as llama3-8b.
+
+### Supported vision model
+
+We also support full fine-tuning and LoRA tuning for vision language models - `Granite 3.2 Vision`, `Llama 3.2 Vision`, and `LLaVa-Next` from `v2.8.1` onwards.
+For information on supported dataset formats and how to tune a vision-language model, please see [this document](./vision-language-model-tuning.md).
+
+Model Name & Size | Model Architecture | LoRA Tuning | Full Finetuning |
+-------------------- | ---------------- | --------------- | --------------- |
+Llama 3.2-11B Vision | MllamaForConditionalGeneration | ✅ | ✅ |
+Llama 3.2-90B Vision | MllamaForConditionalGeneration | ✔️ | ✔️ |
+Granite 3.2-2B Vision | LlavaNextForConditionalGeneration | ✅ | ✅ |
+Llava Mistral 1.6-7B | LlavaNextForConditionalGeneration | ✅ | ✅ |
+Llava 1.6-34B | LlavaNextForConditionalGeneration | ✔️ | ✔️ |
+Llava 1.5-7B | LlavaForConditionalGeneration | ✅ | ✅ |
+Llava 1.5-13B | LlavaForConditionalGeneration | ✔️ | ✔️ |
+
+**Note**:
+* vLLM currently does not support inference with LoRA-tuned vision models. To use a tuned LoRA adapter of vision model, please merge it with the base model before running vLLM inference.

docs/training.md

Lines changed: 2 additions & 0 deletions
@@ -13,6 +13,8 @@
 - [Resuming tuning from checkpoints](#resuming-tuning-from-checkpoints)
 - [Setting Gradient Checkpointing](#setting-gradient-checkpointing)
 - [Training MXFP4 quantized with fms-hf-tuning](#training-mxfp4-quantized-models)
+
+
 ## Single GPU
 
 Below example runs fine tuning with the given datasets and model:
