
Commit 6e3fa6d

support paligemma2 (#2735)
1 parent a6f820a commit 6e3fa6d

File tree

9 files changed (+65 / -14 lines)

README.md

Lines changed: 3 additions & 3 deletions

```diff
@@ -55,13 +55,13 @@ You can contact us and communicate with us by adding our group:

 ## 📝 Introduction
-🍲 ms-swift is an official framework provided by the ModelScope community for fine-tuning and deploying large language models and multi-modal large models. It currently supports the training (pre-training, fine-tuning, human alignment), inference, evaluation, quantization, and deployment of over 400 large models and 100+ multi-modal large models. These large language models (LLMs) include models such as Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, DeepSeek, Baichuan2, Gemma2, and TeleChat2. The multi-modal LLMs include models such as Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2.5, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL2, Phi3.5-Vision, and GOT-OCR2.
+🍲 ms-swift is an official framework provided by the ModelScope community for fine-tuning and deploying large language models and multi-modal large models. It currently supports the training (pre-training, fine-tuning, human alignment), inference, evaluation, quantization, and deployment of 400+ large models and 150+ multi-modal large models. These large language models (LLMs) include models such as Qwen2.5, Llama3.3, GLM4, Internlm2.5, Yi1.5, Mistral, DeepSeek2.5, Baichuan2, Gemma2, and TeleChat2. The multi-modal LLMs include models such as Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2.5, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL2, Phi3.5-Vision, and GOT-OCR2.

-🍔 In addition, ms-swift gathers the latest training technologies, including LoRA, QLoRA, Llama-Pro, LongLoRA, GaLore, Q-GaLore, LoRA+, LISA, DoRA, FourierFt, ReFT, UnSloth, and Liger. ms-swift supports accelerating the inference, evaluation, and deployment modules using vLLM and LMDeploy. To help researchers and developers fine-tune and apply large models more easily, ms-swift also provides a Gradio-based Web-UI interface and a wealth of best practices.
+🍔 In addition, ms-swift gathers the latest training technologies, including LoRA, QLoRA, Llama-Pro, LongLoRA, GaLore, Q-GaLore, LoRA+, LISA, DoRA, FourierFt, ReFT, UnSloth, and Liger. ms-swift supports acceleration of inference, evaluation, and deployment modules using vLLM and LMDeploy, and supports the quantization of large models and multi-modal large models using technologies such as GPTQ, AWQ, and BNB. To help researchers and developers fine-tune and apply large models more easily, ms-swift also provides a Gradio-based Web-UI interface and a wealth of best practices.

 **Why choose ms-swift?**

-- 🍎 **Model Types**: Supports 400+ large language models and **100+ multi-modal large models** and all-to-all models, **providing a comprehensive solution from training to deployment**.
+- 🍎 **Model Types**: Supports 400+ large language models and **150+ multi-modal large models** and all-to-all models, **providing a comprehensive solution from training to deployment**.
 - **Dataset Types**: Comes with 150+ pre-training, fine-tuning, human alignment, multi-modal datasets, and supports custom datasets.
 - **Hardware Support**: Compatible with CPU, RTX series, T4/V100, A10/A100/H100, Ascend NPU, etc.
 - 🍊 **Lightweight Training**: Supports lightweight fine-tuning methods like LoRA, QLoRA, DoRA, LoRA+, ReFT, RS-LoRA, LLaMAPro, Adapter, GaLore, Q-Galore, LISA, UnSloth, Liger-Kernel.
```
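The quantization claim added above maps onto ms-swift's export entry point. A minimal sketch, assuming the `swift.llm` Python API exposes `export_main`/`ExportArguments` with `quant_method`/`quant_bits` parameters as in the project's documented examples; the model id is illustrative, and all names should be checked against `swift export --help`:

```python
# Hedged sketch: post-training GPTQ quantization with ms-swift.
# Argument names are assumptions based on ms-swift's docs, not verified
# against this exact revision.
from swift.llm import ExportArguments, export_main

export_main(
    ExportArguments(
        model='Qwen/Qwen2.5-7B-Instruct',  # illustrative model id
        quant_method='gptq',               # the README also lists 'awq' and 'bnb'
        quant_bits=4,
        output_dir='qwen2.5-7b-gptq-int4',
    ))
```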

README_CN.md

Lines changed: 3 additions & 3 deletions

```diff
@@ -53,12 +53,12 @@
 <img src="asset/discord_qr.jpg" width="200" height="200"> | <img src="asset/wechat.png" width="200" height="200">

 ## 📝 Introduction
-🍲 ms-swift is the fine-tuning and deployment framework for large models and multi-modal large models provided by the ModelScope community. It now supports the training (pre-training, fine-tuning, human alignment), inference, evaluation, quantization, and deployment of 400+ large models and 100+ multi-modal large models. The LLMs include models such as Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, DeepSeek, Baichuan2, Gemma2, and TeleChat2; the multi-modal LLMs include models such as Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2.5, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL2, Phi3.5-Vision, and GOT-OCR2.
+🍲 ms-swift is the fine-tuning and deployment framework for large models and multi-modal large models provided by the ModelScope community. It now supports the training (pre-training, fine-tuning, human alignment), inference, evaluation, quantization, and deployment of 450+ large models and 150+ multi-modal large models. The large models include Qwen2.5, Llama3.3, GLM4, Internlm2.5, Yi1.5, Mistral, DeepSeek2.5, Baichuan2, Gemma2, and TeleChat2; the multi-modal large models include Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2.5, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL2, Phi3.5-Vision, and GOT-OCR2.

-🍔 Beyond that, ms-swift gathers the latest training technologies, including LoRA, QLoRA, Llama-Pro, LongLoRA, GaLore, Q-GaLore, LoRA+, LISA, DoRA, FourierFt, ReFT, UnSloth, and Liger. ms-swift supports accelerating the inference, evaluation, and deployment modules with vLLM and LMDeploy. To help researchers and developers fine-tune and apply large models more easily, ms-swift also provides a Gradio-based Web-UI and a rich set of best practices.
+🍔 Beyond that, ms-swift gathers the latest training technologies, including LoRA, QLoRA, Llama-Pro, LongLoRA, GaLore, Q-GaLore, LoRA+, LISA, DoRA, FourierFt, ReFT, UnSloth, and Liger. ms-swift supports accelerating the inference, evaluation, and deployment modules with vLLM and LMDeploy, and supports quantizing large models and multi-modal large models with technologies such as GPTQ, AWQ, and BNB. To help researchers and developers fine-tune and apply large models more easily, ms-swift also provides a Gradio-based Web-UI and a rich set of best practices.

 **Why choose ms-swift?**
-- 🍎 **Model Types**: supports the full pipeline from training to deployment for 400+ text-only large models, **100+ multi-modal large models**, and All-to-All omni-modal models
+- 🍎 **Model Types**: supports the full pipeline from training to deployment for 400+ text-only large models, **150+ multi-modal large models**, and All-to-All omni-modal models
 - **Dataset Types**: 150+ built-in datasets for pre-training, fine-tuning, human alignment, multi-modal tasks, and more, with support for custom datasets.
 - **Hardware Support**: CPU, RTX series, T4/V100, A10/A100/H100, Ascend NPU, etc.
 - 🍊 **Lightweight Training**: supports lightweight fine-tuning methods such as LoRA, QLoRA, DoRA, LoRA+, ReFT, RS-LoRA, LLaMAPro, Adapter, GaLore, Q-Galore, LISA, UnSloth, and Liger-Kernel.
```

docs/source/GetStarted/快速开始.md

Lines changed: 2 additions & 2 deletions

```diff
@@ -1,8 +1,8 @@
 # Quick Start

-ms-swift is the training and deployment framework for large models and multi-modal large models provided by the ModelScope community. It now supports the training (pre-training, fine-tuning, human alignment), inference, evaluation, quantization, and deployment of 400+ large models and 100+ multi-modal large models. Model developers can complete every large-model-related task end to end within the ms-swift framework. The main capabilities of ms-swift currently include:
+ms-swift is the training and deployment framework for large models and multi-modal large models provided by the ModelScope community. It now supports the training (pre-training, fine-tuning, human alignment), inference, evaluation, quantization, and deployment of 400+ large models and 150+ multi-modal large models. Model developers can complete every large-model-related task end to end within the ms-swift framework. The main capabilities of ms-swift currently include:

-- 🍎 Model Types: full pipeline from training to deployment for 400+ text-only large models and 100+ multi-modal large models, plus All-to-All omni-modal models.
+- 🍎 Model Types: full pipeline from training to deployment for 400+ text-only large models and 150+ multi-modal large models, plus All-to-All omni-modal models.
 - Dataset Types: 150+ built-in datasets for pre-training, fine-tuning, human alignment, multi-modal tasks, and more, with support for custom datasets.
 - Hardware Support: CPU, RTX series, T4/V100, A10/A100/H100, Ascend NPU, etc.
 - 🍊 Lightweight Training: supports lightweight fine-tuning methods such as LoRA, QLoRA, DoRA, LoRA+, ReFT, RS-LoRA, LLaMAPro, Adapter, GaLore, Q-Galore, LISA, UnSloth, and Liger-Kernel.
```

docs/source/Instruction/支持的模型和数据集.md

Lines changed: 11 additions & 0 deletions

```diff
@@ -603,6 +603,17 @@
 |[AI-ModelScope/paligemma-3b-pt-896](https://modelscope.cn/models/AI-ModelScope/paligemma-3b-pt-896)|paligemma|paligemma|transformers>=4.41|vision|[google/paligemma-3b-pt-896](https://huggingface.co/google/paligemma-3b-pt-896)|
 |[AI-ModelScope/paligemma-3b-mix-224](https://modelscope.cn/models/AI-ModelScope/paligemma-3b-mix-224)|paligemma|paligemma|transformers>=4.41|vision|[google/paligemma-3b-mix-224](https://huggingface.co/google/paligemma-3b-mix-224)|
 |[AI-ModelScope/paligemma-3b-mix-448](https://modelscope.cn/models/AI-ModelScope/paligemma-3b-mix-448)|paligemma|paligemma|transformers>=4.41|vision|[google/paligemma-3b-mix-448](https://huggingface.co/google/paligemma-3b-mix-448)|
+|[AI-ModelScope/paligemma2-3b-pt-224](https://modelscope.cn/models/AI-ModelScope/paligemma2-3b-pt-224)|paligemma|paligemma|transformers>=4.41|vision|[google/paligemma2-3b-pt-224](https://huggingface.co/google/paligemma2-3b-pt-224)|
+|[AI-ModelScope/paligemma2-3b-pt-448](https://modelscope.cn/models/AI-ModelScope/paligemma2-3b-pt-448)|paligemma|paligemma|transformers>=4.41|vision|[google/paligemma2-3b-pt-448](https://huggingface.co/google/paligemma2-3b-pt-448)|
+|[AI-ModelScope/paligemma2-3b-pt-896](https://modelscope.cn/models/AI-ModelScope/paligemma2-3b-pt-896)|paligemma|paligemma|transformers>=4.41|vision|[google/paligemma2-3b-pt-896](https://huggingface.co/google/paligemma2-3b-pt-896)|
+|[AI-ModelScope/paligemma2-10b-pt-224](https://modelscope.cn/models/AI-ModelScope/paligemma2-10b-pt-224)|paligemma|paligemma|transformers>=4.41|vision|[google/paligemma2-10b-pt-224](https://huggingface.co/google/paligemma2-10b-pt-224)|
+|[AI-ModelScope/paligemma2-10b-pt-448](https://modelscope.cn/models/AI-ModelScope/paligemma2-10b-pt-448)|paligemma|paligemma|transformers>=4.41|vision|[google/paligemma2-10b-pt-448](https://huggingface.co/google/paligemma2-10b-pt-448)|
+|[AI-ModelScope/paligemma2-10b-pt-896](https://modelscope.cn/models/AI-ModelScope/paligemma2-10b-pt-896)|paligemma|paligemma|transformers>=4.41|vision|[google/paligemma2-10b-pt-896](https://huggingface.co/google/paligemma2-10b-pt-896)|
+|[AI-ModelScope/paligemma2-28b-pt-224](https://modelscope.cn/models/AI-ModelScope/paligemma2-28b-pt-224)|paligemma|paligemma|transformers>=4.41|vision|[google/paligemma2-28b-pt-224](https://huggingface.co/google/paligemma2-28b-pt-224)|
+|[AI-ModelScope/paligemma2-28b-pt-448](https://modelscope.cn/models/AI-ModelScope/paligemma2-28b-pt-448)|paligemma|paligemma|transformers>=4.41|vision|[google/paligemma2-28b-pt-448](https://huggingface.co/google/paligemma2-28b-pt-448)|
+|[AI-ModelScope/paligemma2-28b-pt-896](https://modelscope.cn/models/AI-ModelScope/paligemma2-28b-pt-896)|paligemma|paligemma|transformers>=4.41|vision|[google/paligemma2-28b-pt-896](https://huggingface.co/google/paligemma2-28b-pt-896)|
+|[AI-ModelScope/paligemma2-3b-ft-docci-448](https://modelscope.cn/models/AI-ModelScope/paligemma2-3b-ft-docci-448)|paligemma|paligemma|transformers>=4.41|vision|[google/paligemma2-3b-ft-docci-448](https://huggingface.co/google/paligemma2-3b-ft-docci-448)|
+|[AI-ModelScope/paligemma2-10b-ft-docci-448](https://modelscope.cn/models/AI-ModelScope/paligemma2-10b-ft-docci-448)|paligemma|paligemma|transformers>=4.41|vision|[google/paligemma2-10b-ft-docci-448](https://huggingface.co/google/paligemma2-10b-ft-docci-448)|
 |[LLM-Research/Molmo-7B-O-0924](https://modelscope.cn/models/LLM-Research/Molmo-7B-O-0924)|molmo|molmo|transformers>=4.45|vision|[allenai/Molmo-7B-O-0924](https://huggingface.co/allenai/Molmo-7B-O-0924)|
 |[LLM-Research/Molmo-7B-D-0924](https://modelscope.cn/models/LLM-Research/Molmo-7B-D-0924)|molmo|molmo|transformers>=4.45|vision|[allenai/Molmo-7B-D-0924](https://huggingface.co/allenai/Molmo-7B-D-0924)|
 |[LLM-Research/Molmo-72B-0924](https://modelscope.cn/models/LLM-Research/Molmo-72B-0924)|molmo|molmo|transformers>=4.45|vision|[allenai/Molmo-72B-0924](https://huggingface.co/allenai/Molmo-72B-0924)|
```
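The table only lists the new checkpoints, so a quick smoke test helps confirm a row resolves to a working model. A minimal sketch using plain transformers (PaliGemma 2 reuses the `PaliGemmaForConditionalGeneration` class; the repos are gated on Hugging Face, and `google/paligemma2-3b-pt-224` is just one of the rows above):

```python
# Hedged sketch: load one newly registered PaliGemma 2 checkpoint and
# generate a caption.
import torch
from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

model_id = 'google/paligemma2-3b-pt-224'
model = PaliGemmaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map='auto')
processor = AutoProcessor.from_pretrained(model_id)

image = Image.new('RGB', (224, 224), 'white')  # placeholder image
inputs = processor(text='caption en', images=image, return_tensors='pt').to(model.device)
# Processors emit float32 pixel values; match the model dtype explicitly
# (this mirrors the template fix later in this commit).
inputs['pixel_values'] = inputs['pixel_values'].to(model.dtype)

out = model.generate(**inputs, max_new_tokens=20)
print(processor.decode(out[0], skip_special_tokens=True))
```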

docs/source_en/GetStarted/Quick-start.md

Lines changed: 2 additions & 2 deletions

```diff
@@ -1,8 +1,8 @@
 # Quick Start

-ms-swift is a comprehensive training and deployment framework for large language models and multimodal large models, provided by the ModelScope Community. It currently supports the training (CPT, SFT, RLHF), inference, evaluation, quantization, and deployment of over 400 LLM and over 100 MLLM. Model developers can fulfill all kinds of needs related to large models in a single platform within the ms-swift framework. The main capabilities of ms-swift include:
+ms-swift is a comprehensive training and deployment framework for large language models and multimodal large models, provided by the ModelScope Community. It currently supports the training (CPT, SFT, RLHF), inference, evaluation, quantization, and deployment of 400+ LLM and 150+ MLLM. Model developers can fulfill all kinds of needs related to large models in a single platform within the ms-swift framework. The main capabilities of ms-swift include:

-- 🍎 Model Types: Supports the full process from training to deployment of over 400 text-based large models and over 100 multimodal large models, including All-to-All all-modality models.
+- 🍎 Model Types: Supports the full process from training to deployment of 400+ text-based large models and 150+ multimodal large models, including All-to-All all-modality models.
 - Dataset Types: Comes with more than 150 pre-built datasets for pre-training, fine-tuning, human alignment, multimodal, and supports custom datasets.
 - Hardware Support: Compatible with CPU, RTX series, T4/V100, A10/A100/H100, Ascend NPU, and others.
 - 🍊 Lightweight Training: Supports lightweight fine-tuning methods like LoRA, QLoRA, DoRA, LoRA+, ReFT, RS-LoRA, LLaMAPro, Adapter, GaLore, Q-Galore, LISA, UnSloth, Liger-Kernel, and more.
```
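Those lightweight training methods can be driven from Python as well as the CLI. A minimal sketch of a LoRA run, assuming ms-swift 3.x exposes `sft_main`/`TrainArguments` with these parameter names as in the project's examples; the model id and dataset spec are illustrative, not part of this commit:

```python
# Hedged sketch: LoRA fine-tuning via ms-swift's Python entry point.
# Check `swift sft --help` for the authoritative argument names.
from swift.llm import TrainArguments, sft_main

sft_main(
    TrainArguments(
        model='Qwen/Qwen2.5-7B-Instruct',                   # illustrative model id
        train_type='lora',                                  # one of the methods listed above
        dataset=['AI-ModelScope/alpaca-gpt4-data-en#500'],  # illustrative dataset spec
        lora_rank=8,
        lora_alpha=32,
        output_dir='output',
    ))
```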

docs/source_en/Instruction/Supported-models-and-datasets.md

Lines changed: 11 additions & 0 deletions

The diff is identical to docs/source/Instruction/支持的模型和数据集.md above: the same eleven PaliGemma 2 rows are inserted into the English model table after the existing PaliGemma rows, under the hunk header `@@ -603,6 +603,17 @@ The table below introduces the models integrated with ms-swift:`.

swift/llm/model/model/gemma.py

Lines changed: 17 additions & 0 deletions

```diff
@@ -28,9 +28,26 @@ def get_model_tokenizer_paligemma_vision(model_dir: str,
             Model('AI-ModelScope/paligemma-3b-pt-224', 'google/paligemma-3b-pt-224'),
             Model('AI-ModelScope/paligemma-3b-pt-448', 'google/paligemma-3b-pt-448'),
             Model('AI-ModelScope/paligemma-3b-pt-896', 'google/paligemma-3b-pt-896'),
+        ]),
+        ModelGroup([
             Model('AI-ModelScope/paligemma-3b-mix-224', 'google/paligemma-3b-mix-224'),
             Model('AI-ModelScope/paligemma-3b-mix-448', 'google/paligemma-3b-mix-448'),
         ]),
+        ModelGroup([
+            Model('AI-ModelScope/paligemma2-3b-pt-224', 'google/paligemma2-3b-pt-224'),
+            Model('AI-ModelScope/paligemma2-3b-pt-448', 'google/paligemma2-3b-pt-448'),
+            Model('AI-ModelScope/paligemma2-3b-pt-896', 'google/paligemma2-3b-pt-896'),
+            Model('AI-ModelScope/paligemma2-10b-pt-224', 'google/paligemma2-10b-pt-224'),
+            Model('AI-ModelScope/paligemma2-10b-pt-448', 'google/paligemma2-10b-pt-448'),
+            Model('AI-ModelScope/paligemma2-10b-pt-896', 'google/paligemma2-10b-pt-896'),
+            Model('AI-ModelScope/paligemma2-28b-pt-224', 'google/paligemma2-28b-pt-224'),
+            Model('AI-ModelScope/paligemma2-28b-pt-448', 'google/paligemma2-28b-pt-448'),
+            Model('AI-ModelScope/paligemma2-28b-pt-896', 'google/paligemma2-28b-pt-896'),
+        ]),
+        ModelGroup([
+            Model('AI-ModelScope/paligemma2-3b-ft-docci-448', 'google/paligemma2-3b-ft-docci-448'),
+            Model('AI-ModelScope/paligemma2-10b-ft-docci-448', 'google/paligemma2-10b-ft-docci-448'),
+        ]),
     ],
     TemplateType.paligemma,
     get_model_tokenizer_paligemma_vision,
```
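The diff only splits and extends the model groups, but it also shows the registration pattern: each `ModelGroup` pairs a ModelScope repo id with its Hugging Face mirror, and the groups are bound to a chat template and a loader function. A minimal sketch of registering a further group, with the field order inferred from the diff above; the import paths, `requires` keyword, model ids, and `model_type` are all assumptions for illustration:

```python
# Hedged sketch of the registration pattern used in this file; field order
# and keyword names are inferred from the diff, not verified against the
# real ModelMeta definition.
from swift.llm import Model, ModelGroup, ModelMeta, TemplateType, register_model
from swift.llm.model.model.gemma import get_model_tokenizer_paligemma_vision

register_model(
    ModelMeta(
        'my_paligemma2_variant',               # hypothetical model_type
        [
            ModelGroup([
                # (ModelScope repo id, Hugging Face repo id) — both hypothetical
                Model('my-org/paligemma2-custom-448', 'my-org/paligemma2-custom-448'),
            ]),
        ],
        TemplateType.paligemma,                # reuse the PaliGemma chat template
        get_model_tokenizer_paligemma_vision,  # loader defined in this file
        requires=['transformers>=4.41'],
    ))
```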

swift/llm/template/template/gemma.py

Lines changed: 1 addition & 1 deletion

```diff
@@ -42,7 +42,7 @@ def _encode(self, inputs: StdTemplateInputs) -> Dict[str, Any]:
         encoded['token_type_ids'] = [0] * len(encoded['input_ids'])
         if raw_image:
             model_inputs = processor(text=inputs.to_history()['query'], images=raw_image[0], return_tensors='pt')
-            encoded['pixel_values'] = model_inputs['pixel_values']
+            encoded['pixel_values'] = model_inputs['pixel_values'].to(self.config.torch_dtype)
         return encoded
```
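This one-line fix matters because image processors return `pixel_values` as float32, while these checkpoints are typically loaded in bfloat16; forwarding float32 pixels into a lower-precision vision tower can fail with a dtype mismatch. A minimal standalone illustration of the same cast (the model id is illustrative and the repo is gated):

```python
# Hedged sketch of the mismatch the commit fixes: the processor yields
# float32 pixel values regardless of the dtype the model was loaded in.
import torch
from PIL import Image
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained('google/paligemma2-3b-pt-224')
inputs = processor(text='caption en', images=Image.new('RGB', (224, 224)),
                   return_tensors='pt')
print(inputs['pixel_values'].dtype)  # torch.float32

# The template fix applies the configured model dtype before the forward pass:
inputs['pixel_values'] = inputs['pixel_values'].to(torch.bfloat16)
print(inputs['pixel_values'].dtype)  # torch.bfloat16
```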