Federated operator additions for multimodal models #140

imxuebi · 2025-12-29T12:30:17Z

feat: Support multimodal models and datasets in FATE-LLM

Description
This PR adds support for multimodal tasks (e.g., image-text) by extending the dataset processing and model loading capabilities. Users can train or fine-tune multimodal models (such as CLIP or LLaVA) within the FATE-LLM framework using the existing runners.

Key Changes

Dataset Support (hf_dataset.py)
Added a new MultimodalDataset class.
Integrated AutoProcessor and PIL to handle image loading and processing.
The dataset now automatically processes image columns (converting paths to RGB images) and tokenizes text columns using the model's processor, returning the required pixel_values and input_ids.
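The per-item flow described above can be sketched roughly as follows. This is not the code from hf_dataset.py: the class name MultimodalDatasetSketch, the column names "image" and "text", and the injectable image_loader argument are all assumptions made for illustration. The processor is assumed to be any callable that accepts text= and images= keywords, which is the calling convention of processors such as CLIPProcessor.

```python
from typing import Any, Callable, Dict, List


def load_rgb_image(path: str) -> Any:
    """Open an image file and convert it to RGB (requires Pillow)."""
    from PIL import Image  # imported lazily so the sketch imports without Pillow

    with Image.open(path) as img:
        return img.convert("RGB")


class MultimodalDatasetSketch:
    """Hypothetical stand-in for the PR's MultimodalDataset, not its real code."""

    def __init__(
        self,
        records: List[Dict[str, str]],
        processor: Callable[..., Dict[str, Any]],
        image_column: str = "image",  # assumed column name
        text_column: str = "text",    # assumed column name
        image_loader: Callable[[str], Any] = load_rgb_image,
    ) -> None:
        self.records = records
        self.processor = processor
        self.image_column = image_column
        self.text_column = text_column
        self.image_loader = image_loader

    def __len__(self) -> int:
        return len(self.records)

    def __getitem__(self, idx: int) -> Dict[str, Any]:
        record = self.records[idx]
        # Turn the stored path into an RGB image.
        image = self.image_loader(record[self.image_column])
        # Let the processor tokenize the text and preprocess the image together.
        features = self.processor(text=record[self.text_column], images=image)
        return dict(features)
```

In practice the processor would typically come from AutoProcessor.from_pretrained for the chosen checkpoint, and the returned mapping would then carry keys such as pixel_values and input_ids. Any batching or tensor conversion is left out of this sketch.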
Model Support (hf_model.py)
Added a new HFAutoModel class.
Unlike the existing HFAutoModelForCausalLM, which is restricted to causal language models, this new class uses AutoModel.from_pretrained. This is essential for loading multimodal models that do not fit the standard CausalLM architecture.
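A minimal sketch of the generic loading idea, assuming nothing about the actual HFAutoModel in hf_model.py beyond its use of AutoModel.from_pretrained: the class name HFAutoModelSketch, its constructor arguments, and the load method are hypothetical.

```python
from typing import Any, Dict


class HFAutoModelSketch:
    """Hypothetical wrapper; the PR's HFAutoModel may be structured differently."""

    def __init__(self, pretrained_path: str, **from_pretrained_kwargs: Any) -> None:
        # Only store the configuration here; nothing is loaded until load() is called.
        self.pretrained_path = pretrained_path
        self.from_pretrained_kwargs: Dict[str, Any] = dict(from_pretrained_kwargs)

    def load(self) -> Any:
        # Lazy import so constructing the wrapper does not require transformers.
        from transformers import AutoModel

        # AutoModel picks the architecture from the checkpoint's config
        # instead of assuming a causal language model head.
        return AutoModel.from_pretrained(
            self.pretrained_path, **self.from_pretrained_kwargs
        )
```

Deferring the load keeps the wrapper cheap to construct from a job config. Whether the real class loads eagerly or lazily is not stated in this PR description.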
Motivation
To expand FATE-LLM's capabilities beyond text-only tasks and support the growing demand for federated learning on multimodal data.

How to Use
Users can now configure their jobs to use multimodal models as follows:

Model Config:

Dataset Config:

Checklist
Code follows the project's coding style.
New classes (MultimodalDataset, HFAutoModel) are implemented.

Tested with a sample multimodal task (e.g., image captioning).

feat(dataset): add MultimodalDataset for image-text processing
- Introduces MultimodalDataset class that uses AutoProcessor to handle image and text inputs for multimodal tasks.

feat(model): add HFAutoModel for generic model loading
- Adds HFAutoModel class to support loading models using AutoModel.from_pretrained, enabling support for multimodal architectures.

imxuebi added 2 commits December 29, 2025 20:28

Update hf_dataset.py

247a779

Update hf_model.py

eaeb41d
