feat: Support multimodal models and datasets in FATE-LLM
Description
This PR adds support for multimodal tasks (e.g., image-text) by extending dataset processing and model loading. Users can train or fine-tune multimodal models such as CLIP or LLaVA within the FATE-LLM framework using the existing runners.
Key Changes
Added a new MultimodalDataset class.
Integrated AutoProcessor and PIL to handle image loading and processing.
The dataset now automatically processes image columns (converting paths to RGB images) and tokenizes text columns using the model's processor, returning the required pixel_values and input_ids.
Added a new HFAutoModel class.
Unlike the existing HFAutoModelForCausalLM, which is restricted to causal language models, this new class uses AutoModel.from_pretrained. This is needed to load multimodal models that do not fit the standard CausalLM architecture.
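For reviewers, the behavior described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: MultimodalDatasetSketch, load_rgb, load_multimodal_model, and the image_column / text_column / image_loader parameters are hypothetical names chosen for the example. The only library calls assumed are PIL's Image.open(...).convert("RGB") and transformers' AutoModel.from_pretrained, both imported lazily so the sketch loads without those packages installed.

```python
from typing import Any, Callable, Dict, Sequence


def load_rgb(path: str) -> Any:
    """Open an image file and convert it to RGB (requires Pillow)."""
    from PIL import Image  # lazy import: only needed when real files are read

    with Image.open(path) as img:
        return img.convert("RGB")


def load_multimodal_model(name_or_path: str) -> Any:
    """Load a model via AutoModel rather than AutoModelForCausalLM."""
    from transformers import AutoModel  # lazy import: requires transformers

    return AutoModel.from_pretrained(name_or_path)


class MultimodalDatasetSketch:
    """Hypothetical stand-in for MultimodalDataset; not the PR's actual class."""

    def __init__(
        self,
        rows: Sequence[Dict[str, str]],
        processor: Callable[..., Any],
        image_column: str = "image",   # assumed column name
        text_column: str = "text",     # assumed column name
        image_loader: Callable[[str], Any] = load_rgb,
    ) -> None:
        self.rows = rows
        self.processor = processor
        self.image_column = image_column
        self.text_column = text_column
        self.image_loader = image_loader

    def __len__(self) -> int:
        return len(self.rows)

    def __getitem__(self, idx: int) -> Dict[str, Any]:
        row = self.rows[idx]
        # Turn the stored path into an RGB image.
        image = self.image_loader(row[self.image_column])
        # Pass text and image to the processor in one call; for image-text
        # processors the result is expected to hold input_ids and pixel_values.
        encoded = self.processor(text=row[self.text_column], images=image)
        return dict(encoded)
```

In practice the processor would presumably come from AutoProcessor.from_pretrained for the chosen checkpoint; it is injected here so the flow can be exercised with a stub and without downloading a model.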
Motivation
To expand FATE-LLM's capabilities beyond text-only tasks and support the growing demand for federated learning on multimodal data.
How to Use
Users can now configure their jobs to use multimodal models as follows:
Model Config:
Dataset Config:
Checklist
Code follows the project's coding style.
New classes (MultimodalDataset, HFAutoModel) are implemented.
Tested with a sample multimodal task (e.g., image captioning).