
Commit 76bd76d

Merge pull request #442 from foundation-model-stack/v2.4.0-rc2
chore(release): merge set of changes for v2.4.0
2 parents: 3ec30a0 + 75a5ff6 · commit 76bd76d

17 files changed (+459, -72 lines)

.github/workflows/image.yaml

Lines changed: 1 addition & 2 deletions
```diff
@@ -15,9 +15,8 @@ jobs:
           sudo swapoff -a
           sudo rm -f /swapfile
           sudo apt clean
-          docker rmi $(docker image ls -aq)
+          if [ "$(docker image ls -q)" ]; then docker rmi $(docker image ls -aq); fi
           df -h
       - name: Build image
         run: |
           docker build -t fms-hf-tuning:dev . -f build/Dockerfile
-
```
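
For readers following the logic outside the workflow, here is a minimal Python sketch of the same guard; the `docker` CLI on the runner is an assumption and the snippet is not part of the commit itself:

```python
import subprocess

# Hedged sketch of the guarded cleanup above: list image IDs first and only
# call `docker rmi` when something is returned, so the cleanup step no longer
# fails on a runner that has no local images.
image_ids = subprocess.run(
    ["docker", "image", "ls", "-aq"], capture_output=True, text=True, check=True
).stdout.split()
if image_ids:
    subprocess.run(["docker", "rmi", *image_ids], check=True)
else:
    print("no local images to remove")
```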

README.md

Lines changed: 23 additions & 4 deletions
```diff
@@ -1,7 +1,7 @@
 # FMS HF Tuning
 
 - [Installation](#installation)
-- [Data format](#data-format)
+- [Data format support](#data-support)
 - [Supported Models](#supported-models)
 - [Training](#training)
   - [Single GPU](#single-gpu)
```

```diff
@@ -62,13 +62,13 @@ pip install fms-hf-tuning[aim]
 For more details on how to enable and use the trackers, please see [the experiment tracking section below](#experiment-tracking).
 
 ## Data Support
-Users can pass training data in a single file using the `--training_data_path` argument along with other arguments required for various [use cases](#use-cases-supported-with-training_data_path-argument) (see details below) and the file can be in any of the [supported formats](#supported-data-formats). Alternatively, you can use our powerful [data preprocessing backend](./docs/advanced-data-preprocessing.md) to preprocess datasets on the fly.
+Users can pass training data as either a single file or a Hugging Face dataset ID using the `--training_data_path` argument along with other arguments required for various [use cases](#use-cases-supported-with-training_data_path-argument) (see details below). If users choose to pass a file, it can be in any of the [supported formats](#supported-data-formats). Alternatively, you can use our powerful [data preprocessing backend](./docs/advanced-data-preprocessing.md) to preprocess datasets on the fly.
 
 
 Below, we list the data use cases supported via the `--training_data_path` argument. For details of our advanced data preprocessing, see [Advanced Data Preprocessing](./docs/advanced-data-preprocessing.md).
 
 ## Supported Data Formats
-We support the following data formats via `--training_data_path` argument
+We support the following file formats via the `--training_data_path` argument:
 
 Data Format | Tested Support
 ------------|---------------
@@ -77,6 +77,8 @@ JSONL | ✅
 PARQUET | ✅
 ARROW | ✅
 
+As noted above, we also support passing a Hugging Face dataset ID directly via the `--training_data_path` argument.
+
 ## Use cases supported with `training_data_path` argument
 
 ### 1. Data formats with a single sequence and a specified response_template to use for masking on completion.
```
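
As a rough illustration of the two accepted inputs, here is a hedged sketch using `transformers.HfArgumentParser` with a hypothetical, trimmed-down stand-in for the tuning data arguments (the real dataclass in fms-hf-tuning carries more fields, and the dataset ID below is only an example):

```python
from dataclasses import dataclass, field
from typing import Optional

import transformers


# Hypothetical, trimmed-down stand-in for the tuning data arguments, used only
# to show that --training_data_path takes either a local file or a HF dataset ID.
@dataclass
class DataArgsSketch:
    training_data_path: Optional[str] = field(default=None)
    response_template: Optional[str] = field(default=None)
    dataset_text_field: Optional[str] = field(default=None)


parser = transformers.HfArgumentParser(dataclass_types=DataArgsSketch)

# A local file in one of the supported formats (e.g. JSON/JSONL/Parquet/Arrow) ...
(file_args,) = parser.parse_args_into_dataclasses(
    ["--training_data_path", "twitter_complaints_small.json"]
)

# ... or a Hugging Face dataset ID passed through the same argument.
(hub_args,) = parser.parse_args_into_dataclasses(
    ["--training_data_path", "ought/raft"]
)
print(file_args.training_data_path, hub_args.training_data_path)
```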
```diff
@@ -198,6 +200,10 @@ For advanced data preprocessing support including mixing and custom preprocessin
 Model Name & Size | Model Architecture | Full Finetuning | Low Rank Adaptation (i.e. LoRA) | qLoRA(quantized LoRA) |
 -------------------- | ---------------- | --------------- | ------------------------------- | --------------------- |
 Granite PowerLM 3B | GraniteForCausalLM | ✅* | ✅* | ✅* |
+Granite 3.1 1B | GraniteForCausalLM | ✔️* | ✔️* | ✔️* |
+Granite 3.1 2B | GraniteForCausalLM | ✔️* | ✔️* | ✔️* |
+Granite 3.1 3B | GraniteForCausalLM | ✔️* | ✔️* | ✔️* |
+Granite 3.1 8B | GraniteForCausalLM | ✔️* | ✔️* | ✔️* |
 Granite 3.0 2B | GraniteForCausalLM | ✔️* | ✔️* | ✔️* |
 Granite 3.0 8B | GraniteForCausalLM | ✅* | ✅* | ✔️ |
 GraniteMoE 1B | GraniteMoeForCausalLM | ✅ | ✅** | ? |
@@ -217,7 +223,7 @@ Mixtral 8x7B | Mixtral | ✅ | ✅ | ✅ |
 Mistral-7b | Mistral | ✅ | ✅ | ✅ |
 Mistral large | Mistral | 🚫 | 🚫 | 🚫 |
 
-(*) - Supported with `fms-hf-tuning` v2.0.1 or later
+(*) - Supported with `fms-hf-tuning` v2.4.0 or later.
 
 (**) - Supported for q,k,v,o layers. `all-linear` target modules does not infer on vLLM yet.
 
```

````diff
@@ -742,6 +748,8 @@ The list of configurations for various `fms_acceleration` plugins:
 - [attention_and_distributed_packing](./tuning/config/acceleration_configs/attention_and_distributed_packing.py):
   - `--padding_free`: technique to process multiple examples in a single batch without adding padding tokens that waste compute.
   - `--multipack`: technique for *multi-gpu training* to balance out the number of tokens processed on each device, to minimize waiting time.
+- [fast_moe_config](./tuning/config/acceleration_configs/fast_moe.py) (experimental):
+  - `--fast_moe`: trains MoE models in parallel, increasing throughput and decreasing memory usage.
 
 Notes:
 * `quantized_lora_config` requires that it be used along with LoRA tuning technique. See [LoRA tuning section](https://github.com/foundation-model-stack/fms-hf-tuning/tree/main?tab=readme-ov-file#lora-tuning-example) on the LoRA parameters to pass.
@@ -760,6 +768,17 @@ Notes:
 * Notes on Multipack
   - works only for *multi-gpu*.
   - currently only includes the version of *multipack* optimized for linear attention implementations like *flash-attn*.
+* Notes on Fast MoE
+  - `--fast_moe` is an integer value that configures the amount of expert parallel sharding (`ep_degree`).
+  - `world_size` must be divisible by the `ep_degree`.
+  - Running fast moe modifies the model's state dict, which must be post-processed using [checkpoint utils](https://github.com/foundation-model-stack/fms-acceleration/blob/main/plugins/accelerated-moe/src/fms_acceleration_moe/utils/checkpoint_utils.py) before running inference (HF, vLLM, etc.).
+  - The typical use case for this script is to run:
+    ```
+    python -m fms_acceleration_moe.utils.checkpoint_utils \
+      <checkpoint file> \
+      <output file> \
+      <original model>
+    ```
 
 Note: To pass the above flags via a JSON config, each of the flags expects the value to be a mixed type list, so the values must be a list. For example:
 ```json
````
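
The hunk above stops at the opening of the README's JSON example, so here is a hedged Python sketch of the point being made: when these acceleration flags are supplied through a JSON config rather than the CLI, each value must be wrapped in a list, so `--fast_moe 1` becomes `"fast_moe": [1]`. The key name mirrors the CLI flag; treat the file name and the exact surrounding schema as assumptions.

```python
import json

# Sketch only: write an acceleration config where each flag's value is a list,
# as the README note above requires; e.g. `--fast_moe 1` -> "fast_moe": [1].
acceleration_args = {"fast_moe": [1]}

with open("fast_moe_config.json", "w", encoding="utf-8") as f:
    json.dump(acceleration_args, f, indent=2)
```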

pyproject.toml

Lines changed: 1 addition & 1 deletion
```diff
@@ -45,7 +45,7 @@ dev = ["wheel>=0.42.0,<1.0", "packaging>=23.2,<25", "ninja>=1.11.1.1,<2.0", "sci
 flash-attn = ["flash-attn>=2.5.3,<3.0"]
 aim = ["aim>=3.19.0,<4.0"]
 mlflow = ["mlflow"]
-fms-accel = ["fms-acceleration>=0.1"]
+fms-accel = ["fms-acceleration>=0.6"]
 gptq-dev = ["auto_gptq>0.4.2", "optimum>=1.15.0"]
 
```
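A small, hedged sketch of how one might confirm the bumped dependency floor locally before relying on the new accelerated-MoE path (the package name is taken from the `fms-accel` extra above; nothing here is part of the commit):

```python
from importlib.metadata import PackageNotFoundError, version

# The fms-accel extra now requires fms-acceleration>=0.6; report what is installed.
try:
    print("fms-acceleration:", version("fms-acceleration"))
except PackageNotFoundError:
    print("fms-acceleration is not installed; install the fms-accel extra to get >=0.6")
```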

tests/acceleration/test_acceleration_dataclasses.py

Lines changed: 8 additions & 0 deletions
```diff
@@ -28,6 +28,7 @@
     MultiPack,
     PaddingFree,
 )
+from tuning.config.acceleration_configs.fast_moe import FastMoe, FastMoeConfig
 from tuning.config.acceleration_configs.fused_ops_and_kernels import (
     FastKernelsConfig,
     FusedLoraConfig,
@@ -88,6 +89,13 @@ def test_dataclass_parse_successfully():
     )
     assert isinstance(cfg.multipack, MultiPack)
 
+    # 5. Specifying "--fast_moe" will parse a FastMoe class
+    parser = transformers.HfArgumentParser(dataclass_types=FastMoeConfig)
+    (cfg,) = parser.parse_args_into_dataclasses(
+        ["--fast_moe", "1"],
+    )
+    assert isinstance(cfg.fast_moe, FastMoe)
+
 
 def test_two_dataclasses_parse_successfully_together():
     """Ensure that the two dataclasses can parse arguments successfully
```

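The new test only checks the parsed type. As a sketch of how the value flows through, assuming (per the README note above) that the integer passed to `--fast_moe` maps directly onto `ep_degree`:

```python
import transformers

from tuning.config.acceleration_configs.fast_moe import FastMoe, FastMoeConfig

# Sketch: parse `--fast_moe 2` the same way the test does and read back the
# expert-parallel sharding degree. The direct mapping to ep_degree is an
# assumption based on the README note, not something the test itself asserts.
parser = transformers.HfArgumentParser(dataclass_types=FastMoeConfig)
(cfg,) = parser.parse_args_into_dataclasses(["--fast_moe", "2"])
assert isinstance(cfg.fast_moe, FastMoe)
print(cfg.fast_moe.ep_degree)  # expected 2 under the assumption above
```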
tests/acceleration/test_acceleration_framework.py

Lines changed: 157 additions & 3 deletions
```diff
@@ -43,6 +43,7 @@
     MultiPack,
     PaddingFree,
 )
+from tuning.config.acceleration_configs.fast_moe import FastMoe, FastMoeConfig
 from tuning.config.acceleration_configs.fused_ops_and_kernels import (
     FastKernelsConfig,
     FusedLoraConfig,
@@ -56,7 +57,8 @@
 # for some reason the CI will raise an import error if we try to import
 # these from tests.artifacts.testdata
 TWITTER_COMPLAINTS_JSON_FORMAT = os.path.join(
-    os.path.dirname(__file__), "../artifacts/testdata/twitter_complaints_json.json"
+    os.path.dirname(__file__),
+    "../artifacts/testdata/json/twitter_complaints_small.json",
 )
 TWITTER_COMPLAINTS_TOKENIZED = os.path.join(
     os.path.dirname(__file__),
@@ -87,6 +89,10 @@
     # Third Party
     from fms_acceleration_aadp import PaddingFreeAccelerationPlugin
 
+if is_fms_accelerate_available(plugins="moe"):
+    # Third Party
+    from fms_acceleration_moe import ScatterMoEAccelerationPlugin
+
 
 # There are more extensive unit tests in the
 # https://github.com/foundation-model-stack/fms-acceleration
@@ -360,7 +366,7 @@ def test_framework_raises_due_to_invalid_arguments(
     acceleration_configs_map,
     ids=["bitsandbytes", "auto_gptq"],
 )
-def test_framework_intialized_properly_peft(
+def test_framework_initialized_properly_peft(
     quantized_lora_config, model_name_or_path, mock_and_spy
 ):
     """Ensure that specifying a properly configured acceleration dataclass
@@ -412,7 +418,7 @@
         "and foak plugins"
     ),
 )
-def test_framework_intialized_properly_foak():
+def test_framework_initialized_properly_foak():
     """Ensure that specifying a properly configured acceleration dataclass
     properly activates the framework plugin and runs the train successfully.
     """
```

```diff
@@ -477,6 +483,60 @@ def test_framework_intialized_properly_foak():
         assert spy2["get_ready_for_train_calls"] == 1
 
 
+@pytest.mark.skipif(
+    not is_fms_accelerate_available(plugins="moe"),
+    reason="Only runs if fms-accelerate is installed along with accelerated-moe plugin",
+)
+def test_framework_initialized_properly_moe():
+    """Ensure that specifying a properly configured acceleration dataclass
+    properly activates the framework plugin and runs the train successfully.
+    """
+
+    with tempfile.TemporaryDirectory() as tempdir:
+
+        model_args = copy.deepcopy(MODEL_ARGS)
+        model_args.model_name_or_path = "Isotonic/TinyMixtral-4x248M-MoE"
+        model_args.torch_dtype = torch.bfloat16
+        train_args = copy.deepcopy(TRAIN_ARGS)
+        train_args.output_dir = tempdir
+        train_args.save_strategy = "no"
+        train_args.bf16 = True
+        data_args = copy.deepcopy(DATA_ARGS)
+        data_args.training_data_path = TWITTER_COMPLAINTS_JSON_FORMAT
+        data_args.response_template = "\n\n### Label:"
+        data_args.dataset_text_field = "output"
+
+        # initialize a config
+        moe_config = FastMoeConfig(fast_moe=FastMoe(ep_degree=1))
+
+        # create mocked plugin class for spying
+        MockedPlugin1, spy = create_mock_plugin_class_and_spy(
+            "FastMoeMock", ScatterMoEAccelerationPlugin
+        )
+
+        # 1. mock a plugin class
+        # 2. register the mocked plugins
+        # 3. call sft_trainer.train
+        with build_framework_and_maybe_instantiate(
+            [
+                (["training.moe.scattermoe"], MockedPlugin1),
+            ],
+            instantiate=False,
+        ):
+            with instantiate_model_patcher():
+                sft_trainer.train(
+                    model_args,
+                    data_args,
+                    train_args,
+                    fast_moe_config=moe_config,
+                )
+
+        # spy inside the train to ensure that the moe plugin is called
+        assert spy["model_loader_calls"] == 1
+        assert spy["augmentation_calls"] == 0
+        assert spy["get_ready_for_train_calls"] == 1
+
+
 @pytest.mark.skipif(
     not is_fms_accelerate_available(plugins="aadp"),
     reason="Only runs if fms-accelerate is installed along with \
```

```diff
@@ -661,6 +721,100 @@ def test_error_raised_with_fused_lora_enabled_without_quantized_argument():
         )
 
 
+@pytest.mark.skipif(
+    not is_fms_accelerate_available(plugins="moe"),
+    reason="Only runs if fms-accelerate is installed along with accelerated-moe plugin",
+)
+def test_error_raised_with_undividable_fastmoe_argument():
+    """
+    Ensure error is thrown when `--fast_moe` is passed and world_size
+    is not divisible by ep_degree
+    """
+    with pytest.raises(
+        AssertionError, match="world size \\(1\\) not divisible by ep_size \\(3\\)"
+    ):
+        with tempfile.TemporaryDirectory() as tempdir:
+
+            model_args = copy.deepcopy(MODEL_ARGS)
+            model_args.model_name_or_path = "Isotonic/TinyMixtral-4x248M-MoE"
+            model_args.torch_dtype = torch.bfloat16
+            train_args = copy.deepcopy(TRAIN_ARGS)
+            train_args.output_dir = tempdir
+            train_args.save_strategy = "no"
+            train_args.bf16 = True
+            data_args = copy.deepcopy(DATA_ARGS)
+            data_args.training_data_path = TWITTER_COMPLAINTS_JSON_FORMAT
+            data_args.response_template = "\n\n### Label:"
+            data_args.dataset_text_field = "output"
+
+            # initialize a config
+            moe_config = FastMoeConfig(fast_moe=FastMoe(ep_degree=3))
+
+            # 1. mock a plugin class
+            # 2. register the mocked plugins
+            # 3. call sft_trainer.train
+            with build_framework_and_maybe_instantiate(
+                [
+                    (["training.moe.scattermoe"], ScatterMoEAccelerationPlugin),
+                ],
+                instantiate=False,
+            ):
+                with instantiate_model_patcher():
+                    sft_trainer.train(
+                        model_args,
+                        data_args,
+                        train_args,
+                        fast_moe_config=moe_config,
+                    )
+
+
+@pytest.mark.skipif(
+    not is_fms_accelerate_available(plugins="moe"),
+    reason="Only runs if fms-accelerate is installed along with accelerated-moe plugin",
+)
+def test_error_raised_fast_moe_with_non_moe_model():
+    """
+    Ensure error is thrown when `--fast_moe` is passed and model is not MoE
+    """
+    with pytest.raises(
+        AttributeError,
+        match="'LlamaConfig' object has no attribute 'num_local_experts'",
+    ):
+        with tempfile.TemporaryDirectory() as tempdir:
+
+            model_args = copy.deepcopy(MODEL_ARGS)
+            model_args.model_name_or_path = "TinyLlama/TinyLlama-1.1B-Chat-v0.3"
+            model_args.torch_dtype = torch.bfloat16
+            train_args = copy.deepcopy(TRAIN_ARGS)
+            train_args.output_dir = tempdir
+            train_args.save_strategy = "no"
+            train_args.bf16 = True
+            data_args = copy.deepcopy(DATA_ARGS)
+            data_args.training_data_path = TWITTER_COMPLAINTS_JSON_FORMAT
+            data_args.response_template = "\n\n### Label:"
+            data_args.dataset_text_field = "output"
+
+            # initialize a config
+            moe_config = FastMoeConfig(fast_moe=FastMoe(ep_degree=1))
+
+            # 1. mock a plugin class
+            # 2. register the mocked plugins
+            # 3. call sft_trainer.train
+            with build_framework_and_maybe_instantiate(
+                [
+                    (["training.moe.scattermoe"], ScatterMoEAccelerationPlugin),
+                ],
+                instantiate=False,
+            ):
+                with instantiate_model_patcher():
+                    sft_trainer.train(
+                        model_args,
+                        data_args,
+                        train_args,
+                        fast_moe_config=moe_config,
+                    )
+
+
 @pytest.mark.skipif(
     not is_fms_accelerate_available(plugins="foak"),
     reason="Only runs if fms-accelerate is installed along with \
```
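
As a compact summary of the two failure modes these tests pin down, here is a hedged, illustrative helper (not part of the plugin); the error messages echo the tests' `match` patterns:

```python
# Illustrative helper only: expert parallelism needs world_size % ep_degree == 0,
# and the ScatterMoE path expects an MoE config exposing num_local_experts.
def check_fast_moe_preconditions(world_size, ep_degree, num_local_experts=None):
    assert world_size % ep_degree == 0, (
        f"world size ({world_size}) not divisible by ep_size ({ep_degree})"
    )
    if num_local_experts is None:
        raise AttributeError("model config has no attribute 'num_local_experts'")


check_fast_moe_preconditions(world_size=8, ep_degree=2, num_local_experts=8)  # passes
```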
