Commit ae511a0

Update README.md
Co-authored-by: Praveen Jayachandran <[email protected]>
Signed-off-by: Dushyant Behl <[email protected]>
1 parent 77a95b6 commit ae511a0

4 files changed: +26 -23 lines changed


README.md

Lines changed: 16 additions & 19 deletions
```diff
@@ -21,7 +21,7 @@ This repo provides basic tuning scripts with support for specific models. The re
 
 ## Installation
 
-Refer our [Installation](./docs/installations.md) guide for details on how to install the library.
+Refer our [Installation](./docs/installation.md) guide for details on how to install the library.
 
 ## Tuning Techniques:
 
@@ -31,13 +31,12 @@ Please refer to our [tuning techniques document](./docs/tuning-techniques.md) fo
 * [GPTQ-LoRA](./docs/tuning-techniques.md#gptq-lora-with-autogptq-tuning-example)
 * [Full Fine Tuning](./docs/tuning-techniques.md#fine-tuning)
 * [Use FMS Acceleration](./docs/tuning-techniques.md#fms-acceleration)
-* [Extended Pre-Training](./docs/tuning-techniques.md#extended-pre-training)
+* [Extended Pre-Training](./docs/tuning-techniques.md#extended-pre-training)
 
 ## Training and Training Parameters:
 
-Please refer our [document](./docs/training.md) to see how to start [Single GPU](./docs/training.md#single-gpu) or [Multi-GPU](./docs/training.md#multiple-gpus-with-fsdp) runs with fms-hf-tuning.
-
-You can also refer the same [document](./docs/training.md#tips-on-parameters-to-set) on how to use various training arguments.
+* Please refer our document on [training](./docs/training.md) to see how to start [Single GPU](./docs/training.md#single-gpu) or [Multi-GPU](./docs/training.md#multiple-gpus-with-fsdp) runs with fms-hf-tuning.
+* You can also refer the same a different [section](./docs/training.md#tips-on-parameters-to-set) of the same document on tips to set various training arguments.
 
 ### *Debug recommendation:*
 While training, if you encounter flash-attn errors such as `undefined symbol`, you can follow the below steps for clean installation of flash binaries. This may occur when having multiple environments sharing the pip cache directory or torch version is updated.
@@ -50,7 +49,7 @@ pip install fms-hf-tuning[flash-attn]
 
 ## Supported Models
 
-- For each tuning technique, we run testing on a single large model of each architecture type and claim support for the smaller models. For example, with QLoRA technique, we tested on granite-34b GPTBigCode and claim support for granite-20b-multilingual.
+- While we expect most Hugging Face decoder models to work, we have primarily tested fine-tuning for Granite, Llama, Mistral and GPT-OSS family of models.
 
 - LoRA Layers supported : All the linear layers of a model + output `lm_head` layer. Users can specify layers as a list or use `all-linear` as a shortcut. Layers are specific to a model architecture and can be specified as noted [here](https://github.com/foundation-model-stack/fms-hf-tuning?tab=readme-ov-file#lora-tuning-example)
```
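For orientation on the `all-linear` shortcut mentioned in the hunk above, the sketch below shows how LoRA target modules are commonly expressed with the Hugging Face `peft` library; it is illustrative only, and the way fms-hf-tuning builds its own LoRA configuration may differ.

```python
# Sketch only: the `all-linear` shortcut versus an explicit layer list, shown
# with the Hugging Face `peft` package. fms-hf-tuning wires LoRA settings
# through its own config objects, so treat these names as assumptions.
from peft import LoraConfig

# Target every linear layer of the model via the shortcut.
lora_all_linear = LoraConfig(r=8, lora_alpha=16, target_modules="all-linear")

# Or list the attention projection layers explicitly (layer names are
# architecture specific; these are the usual Llama/Granite projections).
lora_qkvo = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```

Per the footnote further down in this diff, adapters trained with `all-linear` target modules do not yet infer on vLLM, so an explicit q,k,v,o list may be preferable when vLLM serving is the goal.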

```diff
@@ -67,20 +66,20 @@ pip install fms-hf-tuning[flash-attn]
 Model Name & Size | Model Architecture | Full Finetuning | Low Rank Adaptation (i.e. LoRA) | qLoRA(quantized LoRA) |
 -------------------- | ---------------- | --------------- | ------------------------------- | --------------------- |
 [Granite 4.0 Tiny Preview](https://huggingface.co/ibm-granite/granite-4.0-tiny-preview) | GraniteMoeHybridForCausalLM | ✅ | ✅ | ? |
-[Granite PowerLM 3B](https://huggingface.co/ibm-research/PowerLM-3b) | GraniteForCausalLM | ✅* | ✅* | ✅* |
-[Granite 3.1 1B](https://huggingface.co/ibm-granite/granite-3.1-1b-a400m-base) | GraniteForCausalLM | ✔️* | ✔️* | ✔️* |
-[Granite 3.1 2B](https://huggingface.co/ibm-granite/granite-3.1-2b-base) | GraniteForCausalLM | ✔️* | ✔️* | ✔️* |
-[Granite 3.1 8B](https://huggingface.co/ibm-granite/granite-3.1-8b-base) | GraniteForCausalLM | ✔️* | ✔️* | ✔️* |
-[Granite 3.0 2B](https://huggingface.co/ibm-granite/granite-3.0-2b-base) | GraniteForCausalLM | ✔️* | ✔️* | ✔️* |
-[Granite 3.0 8B](https://huggingface.co/ibm-granite/granite-3.0-8b-base) | GraniteForCausalLM | ✅* | ✅* | ✔️ |
-[GraniteMoE 1B](https://huggingface.co/ibm-granite/granite-3.0-1b-a400m-base) | GraniteMoeForCausalLM | ✅ | ✅** | ? |
-[GraniteMoE 3B](https://huggingface.co/ibm-granite/granite-3.0-3b-a800m-base) | GraniteMoeForCausalLM | ✅ | ✅** | ? |
+[Granite PowerLM 3B](https://huggingface.co/ibm-research/PowerLM-3b) | GraniteForCausalLM | ✅ | ✅ | ✅ |
+[Granite 3.1 1B](https://huggingface.co/ibm-granite/granite-3.1-1b-a400m-base) | GraniteForCausalLM | ✔️ | ✔️ | ✔️ |
+[Granite 3.1 2B](https://huggingface.co/ibm-granite/granite-3.1-2b-base) | GraniteForCausalLM | ✔️ | ✔️ | ✔️ |
+[Granite 3.1 8B](https://huggingface.co/ibm-granite/granite-3.1-8b-base) | GraniteForCausalLM | ✔️ | ✔️ | ✔️ |
+[Granite 3.0 2B](https://huggingface.co/ibm-granite/granite-3.0-2b-base) | GraniteForCausalLM | ✔️ | ✔️ | ✔️ |
+[Granite 3.0 8B](https://huggingface.co/ibm-granite/granite-3.0-8b-base) | GraniteForCausalLM | ✅ | ✅ | ✔️ |
+[GraniteMoE 1B](https://huggingface.co/ibm-granite/granite-3.0-1b-a400m-base) | GraniteMoeForCausalLM | ✅ | ✅* | ? |
+[GraniteMoE 3B](https://huggingface.co/ibm-granite/granite-3.0-3b-a800m-base) | GraniteMoeForCausalLM | ✅ | ✅* | ? |
 [Granite 3B Code](https://huggingface.co/ibm-granite/granite-3b-code-base-2k) | LlamaForCausalLM | ✅ | ✔️ | ✔️ |
 [Granite 8B Code](https://huggingface.co/ibm-granite/granite-8b-code-base-4k) | LlamaForCausalLM | ✅ | ✅ | ✅ |
 Granite 13B | GPTBigCodeForCausalLM | ✅ | ✅ | ✔️ |
 Granite 20B | GPTBigCodeForCausalLM | ✅ | ✔️ | ✔️ |
 [Granite 34B Code](https://huggingface.co/ibm-granite/granite-34b-code-instruct-8k) | GPTBigCodeForCausalLM | 🚫 | ✅ | ✅ |
-[Llama3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B) | LlamaForCausalLM | ✅*** | ✔️ | ✔️ |
+[Llama3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B) | LlamaForCausalLM | ✅** | ✔️ | ✔️ |
 [Llama3.1-70B](https://huggingface.co/meta-llama/Llama-3.1-70B)(same architecture as llama3) | LlamaForCausalLM | 🚫 - same as Llama3-70B | ✔️ | ✔️ |
 [Llama3.1-405B](https://huggingface.co/meta-llama/Llama-3.1-405B) | LlamaForCausalLM | 🚫 | 🚫 | ✅ |
 [Llama3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) | LlamaForCausalLM | ✅ | ✅ | ✔️ |
@@ -92,11 +91,9 @@ Mistral large | MistralForCausalLM | 🚫 | 🚫 |
 [GPT-OSS-20B](https://huggingface.co/openai/gpt-oss-20b) | GptOssForCausalLM | ✅ | ✅ | ? |
 [GPT-OSS-120B](https://huggingface.co/openai/gpt-oss-120b) | GptOssForCausalLM | ✅ | ✅ | ? |
 
-(*) - Supported with `fms-hf-tuning` v2.4.0 or later.
-
-(**) - Supported for q,k,v,o layers . `all-linear` target modules does not infer on vLLM yet.
+(*) - Supported for q,k,v,o layers . `all-linear` target modules does not infer on vLLM yet.
 
-(***) - Supported from platform up to 8k context length - same architecture as llama3-8b.
+(**) - Supported from platform up to 8k context length - same architecture as llama3-8b.
 
 ### Supported vision model
 
```
docs/experiment-tracking.md

Lines changed: 0 additions & 3 deletions
```diff
@@ -18,9 +18,6 @@ sft_trainer.train(train_args=training_args,...)
 
 For each of the requested trackers the code expects you to pass a config to the `sft_trainer.train` function which can be specified through `tracker_conifgs` argument [here](https://github.com/foundation-model-stack/fms-hf-tuning/blob/a9b8ec8d1d50211873e63fa4641054f704be8712/tuning/sft_trainer.py#L78) details of which are present below.
 
-
-
-
 ## Tracker Configurations
 
 ## File Logging Tracker
```
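As a rough sketch of the config hand-off described in the context line above: the keyword is spelled `tracker_conifgs` in that sentence, which looks like a typo for `tracker_configs`; the spelling, the config shape, and the field names below are all assumptions for illustration, so check `tuning/sft_trainer.py` in your installed version for the exact types.

```python
# Sketch only, under assumed names: passing a tracker configuration into
# sft_trainer.train. The real argument expects the project's own config
# object(s); the dict and field names here are hypothetical placeholders.
from transformers import TrainingArguments  # assumed stand-in for the project's training args
from tuning import sft_trainer

training_args = TrainingArguments(output_dir="./out")  # minimal placeholder arguments

file_logger_settings = {"training_logs_filename": "training_logs.jsonl"}  # hypothetical fields

sft_trainer.train(
    train_args=training_args,              # keyword shown in the snippet above
    tracker_configs=file_logger_settings,  # assumed spelling of the tracker config argument
)
```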
Lines changed: 8 additions & 1 deletion
```diff
@@ -45,11 +45,18 @@ Experiment tracking in fms-hf-tuning allows users to track their experiments wit
 
 The code supports currently these trackers out of the box,
 * `FileLoggingTracker` : A built in tracker which supports logging training loss to a file.
+  - Since this is builin no need to install anything.
 * `Aimstack` : A popular opensource tracker which can be used to track any metrics or metadata from the experiments.
+  - Install by running
+    `pip install fms-hf-tuning[aim]`
 * `MLflow Tracking` : Another popular opensource tracker which stores metrics, metadata or even artifacts from experiments.
+  - Install by running
+    `pip install fms-hf-tuning[mlflow]`
 * `Clearml Tracking` : Another opensource tracker which stores metrics, metadata or even artifacts from experiments.
+  - Install by running
+    `pip install fms-hf-tuning[clearml]`
 
-Further details on enabling and using the trackers mentioned above can be found [here](./experiment-tracking.md).
+Note. All trackers expect some arguments or can be customized by passing command line arguments which are described in our document on [experiment tracking](./experiment-tracking.md). For further details on enabling and using the trackers use the experiment tracking document.
 
 ## Training Mamba Models
 
```
docs/training.md

Lines changed: 2 additions & 0 deletions
```diff
@@ -13,6 +13,8 @@
 - [Resuming tuning from checkpoints](#resuming-tuning-from-checkpoints)
 - [Setting Gradient Checkpointing](#setting-gradient-checkpointing)
 - [Training MXFP4 quantized with fms-hf-tuning](#training-mxfp4-quantized-models)
+
+
 ## Single GPU
 
 Below example runs fine tuning with the given datasets and model:
```
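The example referenced by that last context line lives outside this hunk. As a hypothetical illustration of a single-GPU run through the Python API: the `sft_trainer.train` call and `train_args` keyword appear in the docs above, while the module layout, dataclass names, and fields below are assumptions, and the project's training document remains authoritative.

```python
# Sketch only: a minimal single-GPU fine-tuning run via the Python API.
# Dataclass names and fields are assumed for illustration and may not match
# the installed version of fms-hf-tuning.
from tuning import sft_trainer
from tuning.config import configs  # assumed module layout

model_args = configs.ModelArguments(
    model_name_or_path="ibm-granite/granite-3.1-2b-base",
)
data_args = configs.DataArguments(
    training_data_path="path/to/train.jsonl",  # hypothetical dataset path
)
train_args = configs.TrainingArguments(
    output_dir="./granite-ft-output",
    num_train_epochs=1,
    per_device_train_batch_size=4,
    learning_rate=1e-5,
)

sft_trainer.train(model_args=model_args, data_args=data_args, train_args=train_args)
```

Multi-GPU runs are typically launched the same way under `accelerate launch` with an FSDP configuration, as covered by the Multi-GPU section linked from the README diff above.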
