Commit ae511a0

Update README.md
Co-authored-by: Praveen Jayachandran <[email protected]>
Signed-off-by: Dushyant Behl <[email protected]>
1 parent 77a95b6 commit ae511a0

4 files changed: +26 -23 lines changed


README.md

Lines changed: 16 additions & 19 deletions
```diff
@@ -21,7 +21,7 @@ This repo provides basic tuning scripts with support for specific models. The re
 
 ## Installation
 
-Refer our [Installation](./docs/installations.md) guide for details on how to install the library.
+Refer our [Installation](./docs/installation.md) guide for details on how to install the library.
 
 ## Tuning Techniques:
 
@@ -31,13 +31,12 @@ Please refer to our [tuning techniques document](./docs/tuning-techniques.md) fo
 * [GPTQ-LoRA](./docs/tuning-techniques.md#gptq-lora-with-autogptq-tuning-example)
 * [Full Fine Tuning](./docs/tuning-techniques.md#fine-tuning)
 * [Use FMS Acceleration](./docs/tuning-techniques.md#fms-acceleration)
-* [Extended Pre-Training](./docs/tuning-techniques.md#extended-pre-training)
+* [Extended Pre-Training](./docs/tuning-techniques.md#extended-pre-training)
 
 ## Training and Training Parameters:
 
-Please refer our [document](./docs/training.md) to see how to start [Single GPU](./docs/training.md#single-gpu) or [Multi-GPU](./docs/training.md#multiple-gpus-with-fsdp) runs with fms-hf-tuning.
-
-You can also refer the same [document](./docs/training.md#tips-on-parameters-to-set) on how to use various training arguments.
+* Please refer our document on [training](./docs/training.md) to see how to start [Single GPU](./docs/training.md#single-gpu) or [Multi-GPU](./docs/training.md#multiple-gpus-with-fsdp) runs with fms-hf-tuning.
+* You can also refer the same a different [section](./docs/training.md#tips-on-parameters-to-set) of the same document on tips to set various training arguments.
 
 ### *Debug recommendation:*
 While training, if you encounter flash-attn errors such as `undefined symbol`, you can follow the below steps for clean installation of flash binaries. This may occur when having multiple environments sharing the pip cache directory or torch version is updated.
@@ -50,7 +49,7 @@ pip install fms-hf-tuning[flash-attn]
 
 ## Supported Models
 
-- For each tuning technique, we run testing on a single large model of each architecture type and claim support for the smaller models. For example, with QLoRA technique, we tested on granite-34b GPTBigCode and claim support for granite-20b-multilingual.
+- While we expect most Hugging Face decoder models to work, we have primarily tested fine-tuning for Granite, Llama, Mistral and GPT-OSS family of models.
 
 - LoRA Layers supported : All the linear layers of a model + output `lm_head` layer. Users can specify layers as a list or use `all-linear` as a shortcut. Layers are specific to a model architecture and can be specified as noted [here](https://github.com/foundation-model-stack/fms-hf-tuning?tab=readme-ov-file#lora-tuning-example)
```
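For orientation on the `all-linear` shortcut mentioned in the hunk above, the sketch below shows how LoRA target modules are commonly expressed with the Hugging Face `peft` library; it is illustrative only, and the way fms-hf-tuning builds its own LoRA configuration may differ.

```python
# Sketch only: the `all-linear` shortcut versus an explicit layer list, shown
# with the Hugging Face `peft` package. fms-hf-tuning wires LoRA settings
# through its own config objects, so treat these names as assumptions.
from peft import LoraConfig

# Target every linear layer of the model via the shortcut.
lora_all_linear = LoraConfig(r=8, lora_alpha=16, target_modules="all-linear")

# Or list the attention projection layers explicitly (layer names are
# architecture specific; these are the usual Llama/Granite projections).
lora_qkvo = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```

Per the footnote further down in this diff, adapters trained with `all-linear` target modules do not yet infer on vLLM, so an explicit q,k,v,o list may be preferable when vLLM serving is the goal.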

```diff
@@ -67,20 +66,20 @@ pip install fms-hf-tuning[flash-attn]
 Model Name & Size | Model Architecture | Full Finetuning | Low Rank Adaptation (i.e. LoRA) | qLoRA(quantized LoRA) |
 -------------------- | ---------------- | --------------- | ------------------------------- | --------------------- |
 [Granite 4.0 Tiny Preview](https://huggingface.co/ibm-granite/granite-4.0-tiny-preview) | GraniteMoeHybridForCausalLM | ✅ | ✅ | ? |
-[Granite PowerLM 3B](https://huggingface.co/ibm-research/PowerLM-3b) | GraniteForCausalLM | ✅* | ✅* | ✅* |
-[Granite 3.1 1B](https://huggingface.co/ibm-granite/granite-3.1-1b-a400m-base) | GraniteForCausalLM | ✔️* | ✔️* | ✔️* |
-[Granite 3.1 2B](https://huggingface.co/ibm-granite/granite-3.1-2b-base) | GraniteForCausalLM | ✔️* | ✔️* | ✔️* |
-[Granite 3.1 8B](https://huggingface.co/ibm-granite/granite-3.1-8b-base) | GraniteForCausalLM | ✔️* | ✔️* | ✔️* |
-[Granite 3.0 2B](https://huggingface.co/ibm-granite/granite-3.0-2b-base) | GraniteForCausalLM | ✔️* | ✔️* | ✔️* |
-[Granite 3.0 8B](https://huggingface.co/ibm-granite/granite-3.0-8b-base) | GraniteForCausalLM | ✅* | ✅* | ✔️ |
-[GraniteMoE 1B](https://huggingface.co/ibm-granite/granite-3.0-1b-a400m-base) | GraniteMoeForCausalLM | ✅ | ✅** | ? |
-[GraniteMoE 3B](https://huggingface.co/ibm-granite/granite-3.0-3b-a800m-base) | GraniteMoeForCausalLM | ✅ | ✅** | ? |
+[Granite PowerLM 3B](https://huggingface.co/ibm-research/PowerLM-3b) | GraniteForCausalLM | ✅ | ✅ | ✅ |
+[Granite 3.1 1B](https://huggingface.co/ibm-granite/granite-3.1-1b-a400m-base) | GraniteForCausalLM | ✔️ | ✔️ | ✔️ |
+[Granite 3.1 2B](https://huggingface.co/ibm-granite/granite-3.1-2b-base) | GraniteForCausalLM | ✔️ | ✔️ | ✔️ |
+[Granite 3.1 8B](https://huggingface.co/ibm-granite/granite-3.1-8b-base) | GraniteForCausalLM | ✔️ | ✔️ | ✔️ |
+[Granite 3.0 2B](https://huggingface.co/ibm-granite/granite-3.0-2b-base) | GraniteForCausalLM | ✔️ | ✔️ | ✔️ |
+[Granite 3.0 8B](https://huggingface.co/ibm-granite/granite-3.0-8b-base) | GraniteForCausalLM | ✅ | ✅ | ✔️ |
+[GraniteMoE 1B](https://huggingface.co/ibm-granite/granite-3.0-1b-a400m-base) | GraniteMoeForCausalLM | ✅ | ✅* | ? |
+[GraniteMoE 3B](https://huggingface.co/ibm-granite/granite-3.0-3b-a800m-base) | GraniteMoeForCausalLM | ✅ | ✅* | ? |
 [Granite 3B Code](https://huggingface.co/ibm-granite/granite-3b-code-base-2k) | LlamaForCausalLM | ✅ | ✔️ | ✔️ |
 [Granite 8B Code](https://huggingface.co/ibm-granite/granite-8b-code-base-4k) | LlamaForCausalLM | ✅ | ✅ | ✅ |
 Granite 13B | GPTBigCodeForCausalLM | ✅ | ✅ | ✔️ |
 Granite 20B | GPTBigCodeForCausalLM | ✅ | ✔️ | ✔️ |
 [Granite 34B Code](https://huggingface.co/ibm-granite/granite-34b-code-instruct-8k) | GPTBigCodeForCausalLM | 🚫 | ✅ | ✅ |
-[Llama3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B) | LlamaForCausalLM | ✅*** | ✔️ | ✔️ |
+[Llama3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B) | LlamaForCausalLM | ✅** | ✔️ | ✔️ |
 [Llama3.1-70B](https://huggingface.co/meta-llama/Llama-3.1-70B)(same architecture as llama3) | LlamaForCausalLM | 🚫 - same as Llama3-70B | ✔️ | ✔️ |
 [Llama3.1-405B](https://huggingface.co/meta-llama/Llama-3.1-405B) | LlamaForCausalLM | 🚫 | 🚫 | ✅ |
 [Llama3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) | LlamaForCausalLM | ✅ | ✅ | ✔️ |
@@ -92,11 +91,9 @@ Mistral large | MistralForCausalLM | 🚫 | 🚫 |
 [GPT-OSS-20B](https://huggingface.co/openai/gpt-oss-20b) | GptOssForCausalLM | ✅ | ✅ | ? |
 [GPT-OSS-120B](https://huggingface.co/openai/gpt-oss-120b) | GptOssForCausalLM | ✅ | ✅ | ? |
 
-(*) - Supported with `fms-hf-tuning` v2.4.0 or later.
-
-(**) - Supported for q,k,v,o layers . `all-linear` target modules does not infer on vLLM yet.
+(*) - Supported for q,k,v,o layers . `all-linear` target modules does not infer on vLLM yet.
 
-(***) - Supported from platform up to 8k context length - same architecture as llama3-8b.
+(**) - Supported from platform up to 8k context length - same architecture as llama3-8b.
 
 ### Supported vision model
 
```
docs/experiment-tracking.md

Lines changed: 0 additions & 3 deletions
```diff
@@ -18,9 +18,6 @@ sft_trainer.train(train_args=training_args,...)
 
 For each of the requested trackers the code expects you to pass a config to the `sft_trainer.train` function which can be specified through `tracker_conifgs` argument [here](https://github.com/foundation-model-stack/fms-hf-tuning/blob/a9b8ec8d1d50211873e63fa4641054f704be8712/tuning/sft_trainer.py#L78) details of which are present below.
 
-
-
-
 ## Tracker Configurations
 
 ## File Logging Tracker
```
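As a rough sketch of the config hand-off described in the context line above: the keyword is spelled `tracker_conifgs` in that sentence, which looks like a typo for `tracker_configs`; the spelling, the config shape, and the field names below are all assumptions for illustration, so check `tuning/sft_trainer.py` in your installed version for the exact types.

```python
# Sketch only, under assumed names: passing a tracker configuration into
# sft_trainer.train. The real argument expects the project's own config
# object(s); the dict and field names here are hypothetical placeholders.
from transformers import TrainingArguments  # assumed stand-in for the project's training args
from tuning import sft_trainer

training_args = TrainingArguments(output_dir="./out")  # minimal placeholder arguments

file_logger_settings = {"training_logs_filename": "training_logs.jsonl"}  # hypothetical fields

sft_trainer.train(
    train_args=training_args,              # keyword shown in the snippet above
    tracker_configs=file_logger_settings,  # assumed spelling of the tracker config argument
)
```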
Lines changed: 8 additions & 1 deletion
```diff
@@ -45,11 +45,18 @@ Experiment tracking in fms-hf-tuning allows users to track their experiments wit
 
 The code supports currently these trackers out of the box,
 * `FileLoggingTracker` : A built in tracker which supports logging training loss to a file.
+  - Since this is builin no need to install anything.
 * `Aimstack` : A popular opensource tracker which can be used to track any metrics or metadata from the experiments.
+  - Install by running
+    `pip install fms-hf-tuning[aim]`
 * `MLflow Tracking` : Another popular opensource tracker which stores metrics, metadata or even artifacts from experiments.
+  - Install by running
+    `pip install fms-hf-tuning[mlflow]`
 * `Clearml Tracking` : Another opensource tracker which stores metrics, metadata or even artifacts from experiments.
+  - Install by running
+    `pip install fms-hf-tuning[clearml]`
 
-Further details on enabling and using the trackers mentioned above can be found [here](./experiment-tracking.md).
+Note. All trackers expect some arguments or can be customized by passing command line arguments which are described in our document on [experiment tracking](./experiment-tracking.md). For further details on enabling and using the trackers use the experiment tracking document.
 
 ## Training Mamba Models
 
```
docs/training.md

Lines changed: 2 additions & 0 deletions
```diff
@@ -13,6 +13,8 @@
 - [Resuming tuning from checkpoints](#resuming-tuning-from-checkpoints)
 - [Setting Gradient Checkpointing](#setting-gradient-checkpointing)
 - [Training MXFP4 quantized with fms-hf-tuning](#training-mxfp4-quantized-models)
+
+
 ## Single GPU
 
 Below example runs fine tuning with the given datasets and model:
```
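The example referenced by that last context line lives outside this hunk. As a hypothetical illustration of a single-GPU run through the Python API: the `sft_trainer.train` call and `train_args` keyword appear in the docs above, while the module layout, dataclass names, and fields below are assumptions, and the project's training document remains authoritative.

```python
# Sketch only: a minimal single-GPU fine-tuning run via the Python API.
# Dataclass names and fields are assumed for illustration and may not match
# the installed version of fms-hf-tuning.
from tuning import sft_trainer
from tuning.config import configs  # assumed module layout

model_args = configs.ModelArguments(
    model_name_or_path="ibm-granite/granite-3.1-2b-base",
)
data_args = configs.DataArguments(
    training_data_path="path/to/train.jsonl",  # hypothetical dataset path
)
train_args = configs.TrainingArguments(
    output_dir="./granite-ft-output",
    num_train_epochs=1,
    per_device_train_batch_size=4,
    learning_rate=1e-5,
)

sft_trainer.train(model_args=model_args, data_args=data_args, train_args=train_args)
```

Multi-GPU runs are typically launched the same way under `accelerate launch` with an FSDP configuration, as covered by the Multi-GPU section linked from the README diff above.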
