
Commit 9d5cbba

Merge pull request #2559 from AI-Hypercomputer:hengtaoguo-doc
PiperOrigin-RevId: 825683633
2 parents af0560c + 622c0e6 commit 9d5cbba


1 file changed, +15 -12 lines changed

docs/guides/multimodal.md

Lines changed: 15 additions & 12 deletions
````diff
@@ -7,7 +7,7 @@ This document provides a guide to use the multimodal functionalities in MaxText
 - **Multimodal Decode**: Inference with text+images as input.
 - **Supervised Fine-Tuning (SFT)**: Apply SFT to the model using a visual-question-answering dataset.

-The following table provides a list of models and modalities we currently support:
+We also provide a [colab](https://github.com/AI-Hypercomputer/maxtext/blob/main/src/MaxText/examples/multimodal_gemma3_demo.ipynb) for multimodal features demonstration. The following table provides a list of models and modalities we currently support:
 | Models | Input Modalities | Output Modalities |
 | :---- | :---- | :---- |
 | - Gemma3-4B/12B/27B<br>- Llama4-Scout/Maverick | Text, images | Text |
````
````diff
@@ -113,22 +113,25 @@ Here, we use [ChartQA](https://huggingface.co/datasets/HuggingFaceM4/ChartQA) as


 ```shell
-python -m MaxText.sft_trainer MaxText/configs/sft-vision-chartqa.yml \
-run_name=$idx \
+python -m MaxText.sft_trainer \
+$MAXTEXT_REPO_ROOT/configs/sft-vision-chartqa.yml \
+run_name="chartqa-sft" \
 model_name=gemma3-4b \
-tokenizer_path="google/gemma-3-4b-pt" \
+tokenizer_path="google/gemma-3-4b-it" \
+hf_access_token=$HF_ACCESS_TOKEN \
+load_parameters_path=$UNSCANNED_CKPT_PATH \
+base_output_directory=$BASE_OUTPUT_DIRECTORY \
 per_device_batch_size=1 \
+steps=$STEPS \
 max_prefill_predict_length=1024 \
 max_target_length=2048 \
-steps=200 \
-scan_layers=false \
-async_checkpointing=False \
+checkpoint_period=1000 \
+scan_layers=False \
+async_checkpointing=True \
+enable_checkpointing=True \
 attention=dot_product \
-dataset_type=hf hf_path=parquet hf_access_token=$HF_ACCESS_TOKEN \
-hf_train_files=gs://aireenmei-multipod/dataset/hf/chartqa/train-* \
-base_output_directory=$BASE_OUTPUT_DIRECTORY \
-load_parameters_path=$UNSCANNED_CKPT_PATH \
-dtype=bfloat16 weight_dtype=bfloat16 sharding_tolerance=0.05
+max_num_images_per_example=1 \
+dataset_type=hf profiler=xplane
 ```

 ## Other Recommendations
````
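The updated SFT command no longer hard-codes `steps=200`; together with `load_parameters_path`, `base_output_directory`, and `hf_access_token`, it now reads `$MAXTEXT_REPO_ROOT`, `$HF_ACCESS_TOKEN`, `$UNSCANNED_CKPT_PATH`, `$BASE_OUTPUT_DIRECTORY`, and `$STEPS` from the environment, and the diff does not define any of them. Below is a minimal preflight sketch, assuming bash. The function name `check_sft_env` is hypothetical and not part of MaxText. The rule that `STEPS` must be a positive integer is an assumption, not a documented MaxText constraint.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helper (not part of MaxText): verify that every
# environment variable referenced by the updated SFT command is set and
# non-empty before launching MaxText.sft_trainer.

check_sft_env() {
  local required_vars=(
    MAXTEXT_REPO_ROOT
    HF_ACCESS_TOKEN
    UNSCANNED_CKPT_PATH
    BASE_OUTPUT_DIRECTORY
    STEPS
  )
  local missing=()
  local var
  for var in "${required_vars[@]}"; do
    # ${!var} is bash indirect expansion: the value of the variable whose
    # name is stored in $var (empty if that variable is unset).
    if [[ -z "${!var}" ]]; then
      missing+=("$var")
    fi
  done

  if (( ${#missing[@]} > 0 )); then
    echo "Missing required variables: ${missing[*]}" >&2
    return 1
  fi

  # Assumption: STEPS is passed as steps=$STEPS, so require a positive integer.
  if ! [[ "$STEPS" =~ ^[1-9][0-9]*$ ]]; then
    echo "STEPS must be a positive integer, got: $STEPS" >&2
    return 1
  fi
  return 0
}
```

Chaining the check with `&&` in front of the `python -m MaxText.sft_trainer` command means the trainer starts only when every variable is set.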
