@imxuebi imxuebi commented Dec 29, 2025

feat: Support multimodal models and datasets in FATE-LLM

Description
This PR adds support for multimodal tasks (e.g., image-text) by extending dataset processing and model loading. Users can train or fine-tune multimodal models (such as CLIP or LLaVA) within the FATE-LLM framework using the existing runners.

Key Changes

  1. Dataset Support (hf_dataset.py)
    Added a new MultimodalDataset class.
    Integrated AutoProcessor and PIL to handle image loading and processing.
    The dataset now automatically processes image columns (converting paths to RGB images) and tokenizes text columns using the model's processor, returning the required pixel_values and input_ids.
  2. Model Support (hf_model.py)
    Added a new HFAutoModel class.
    Unlike the existing HFAutoModelForCausalLM, which is restricted to causal language models, this new class uses AutoModel.from_pretrained. This is essential for loading multimodal models that do not fit the standard CausalLM architecture.

Motivation
To expand FATE-LLM's capabilities beyond text-only tasks and support the growing demand for federated learning on multimodal data.
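
The dataset and model behavior described under Key Changes can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the class name `MultimodalDatasetSketch`, the helpers `load_rgb_image` and `load_generic_model`, and the column names `image` and `text` are all assumptions made for the example. The processor is passed in as a callable so the per-item logic can be exercised without downloading a model.

```python
from typing import Any, Callable, Dict, Mapping, Sequence


def load_rgb_image(path: str) -> Any:
    """Open an image file and convert it to RGB. Requires Pillow."""
    from PIL import Image  # lazy import so this module imports without Pillow

    with Image.open(path) as img:
        return img.convert("RGB")


class MultimodalDatasetSketch:
    """Hypothetical stand-in for MultimodalDataset; not the PR's actual code."""

    def __init__(
        self,
        records: Sequence[Mapping[str, str]],
        processor: Callable[..., Mapping[str, Any]],
        image_column: str = "image",  # assumed column name
        text_column: str = "text",  # assumed column name
        image_loader: Callable[[str], Any] = load_rgb_image,
    ) -> None:
        self.records = records
        self.processor = processor
        self.image_column = image_column
        self.text_column = text_column
        self.image_loader = image_loader

    def __len__(self) -> int:
        return len(self.records)

    def __getitem__(self, index: int) -> Dict[str, Any]:
        record = self.records[index]
        # Convert the stored path into an RGB image.
        image = self.image_loader(record[self.image_column])
        # Let the processor handle both the image and the text.
        encoded = self.processor(images=image, text=record[self.text_column])
        # Keep only the two fields named in the PR description.
        return {
            "pixel_values": encoded["pixel_values"],
            "input_ids": encoded["input_ids"],
        }


def load_generic_model(model_name_or_path: str) -> Any:
    """Load a model through the generic AutoModel entry point (needs transformers)."""
    from transformers import AutoModel  # lazy import, see load_rgb_image

    return AutoModel.from_pretrained(model_name_or_path)
```

With Hugging Face Transformers installed, the `processor` argument would presumably be an object returned by `AutoProcessor.from_pretrained(...)`, which accepts `images=` and `text=` keyword arguments; how the PR wires this into the FATE-LLM runners is not shown here.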

How to Use
Users can now configure their jobs to use multimodal models as follows:

Model Config:

Dataset Config:

Checklist
Code follows the project's coding style.
New classes (MultimodalDataset, HFAutoModel) are implemented.

Tested with a sample multimodal task (e.g., image captioning).

feat(dataset): add MultimodalDataset for image-text processing
- Introduces MultimodalDataset class that uses AutoProcessor to handle image and text inputs for multimodal tasks.

feat(model): add HFAutoModel for generic model loading
- Adds HFAutoModel class to support loading models using AutoModel.from_pretrained, enabling support for multimodal architectures.