
Commit e2bba6e

Merge branch 'main' into release/2.3

2 parents 42e476a + e73c5e2

File tree: 14 files changed (+87 −25 lines)


README.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -56,7 +56,7 @@ You can contact us and communicate with us by adding our group:

 ## 🎉 News
 - 🔥2024.08.22: Support the `reft` tuner from [ReFT](https://github.com/stanfordnlp/pyreft), which is 15×–65× more parameter-efficient than LoRA; use `--sft_type reft` to begin!
-- 2024.08.21: Support for phi3_5-mini-instruct, phi3_5-moe-instruct, and phi3_5-vision-instruct.
+- 🔥2024.08.21: Support for phi3_5-mini-instruct, phi3_5-moe-instruct, and phi3_5-vision-instruct. The best practice for fine-tuning LaTeX OCR using phi3_5-vision-instruct can be found [here](https://github.com/modelscope/ms-swift/issues/1809).
 - 2024.08.21: Support for idefics3-8b-llama3, llava-onevision-qwen2-0_5b-ov, llava-onevision-qwen2-7b-ov, and llava-onevision-qwen2-72b-ov.
 - 🔥2024.08.20: Support fine-tuning of multimodal large models using DeepSpeed-Zero3.
 - 2024.08.20: Supported models: longwriter-glm4-9b, longwriter-llama3_1-8b. Supported dataset: longwriter-6k.
```
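For orientation, a minimal sketch of launching the new ReFT tuner through swift's Python entry points. The diff above only confirms the `--sft_type reft` flag; the `SftArguments`/`sft_main` usage and the model/dataset names below are assumptions:

```python
# Hypothetical sketch: ReFT fine-tuning via swift's Python API.
# Only `--sft_type reft` is confirmed by this commit; the rest is illustrative.
from swift.llm import SftArguments, sft_main

args = SftArguments(
    model_type='qwen2-7b-instruct',  # assumed model; any supported LLM should work
    dataset=['alpaca-zh'],           # assumed dataset name
    sft_type='reft',                 # the newly supported tuner
)
result = sft_main(args)
```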

README_CN.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -57,7 +57,7 @@ SWIFT has rich, comprehensive documentation; please see our documentation site:

 ## 🎉 News
 - 🔥2024.08.22: Support [ReFT](https://github.com/stanfordnlp/pyreft); this tuner can match or outperform LoRA with 1/15–1/65 of its parameter count. Use `--sft_type reft` to start training!
-- 2024.08.21: Support for phi3_5-mini-instruct, phi3_5-moe-instruct, phi3_5-vision-instruct.
+- 🔥2024.08.21: Support for phi3_5-mini-instruct, phi3_5-moe-instruct, phi3_5-vision-instruct. The best practice for fine-tuning LaTeX OCR with phi3_5-vision-instruct can be found [here](https://github.com/modelscope/ms-swift/issues/1809).
 - 2024.08.21: Support for idefics3-8b-llama3, llava-onevision-qwen2-0_5b-ov, llava-onevision-qwen2-7b-ov, llava-onevision-qwen2-72b-ov.
 - 🔥2024.08.20: Support fine-tuning multimodal large models with deepspeed-zero3.
 - 2024.08.20: Supported models: longwriter-glm4-9b, longwriter-llama3_1-8b. Supported dataset: longwriter-6k.
```

docs/source/LLM/index.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,6 +1,6 @@
 ## LLM Documentation

-[English Documentation](https://swift.readthedocs.io/en/latest/)
+[English Documentation](https://swift.readthedocs.io/en/latest/LLM/index.html)

 ### 📚 Tutorials

```

docs/source/LLM/支持的模型和数据集.md

Lines changed: 3 additions & 1 deletion
```diff
@@ -510,9 +510,11 @@
 |coco-en-2|[modelscope/coco_2014_caption](https://modelscope.cn/datasets/modelscope/coco_2014_caption/summary)|coco_2014_caption|454617|36.8±2.8, min=32, max=89|chat, multi-modal, vision|-|
 |🔥coco-en-2-mini|[modelscope/coco_2014_caption](https://modelscope.cn/datasets/modelscope/coco_2014_caption/summary)|coco_2014_caption|40504|36.8±2.6, min=32, max=75|chat, multi-modal, vision|-|
 |capcha-images|[AI-ModelScope/captcha-images](https://modelscope.cn/datasets/AI-ModelScope/captcha-images/summary)||8000|31.0±0.0, min=31, max=31|chat, multi-modal, vision|-|
+|latex-ocr-print|[AI-ModelScope/LaTeX_OCR](https://modelscope.cn/datasets/AI-ModelScope/LaTeX_OCR/summary)|full|17918|362.7±34.8, min=294, max=528|chat, ocr, multi-modal, vision|[linxy/LaTeX_OCR](https://huggingface.co/datasets/linxy/LaTeX_OCR)|
+|latex-ocr-handwrite|[AI-ModelScope/LaTeX_OCR](https://modelscope.cn/datasets/AI-ModelScope/LaTeX_OCR/summary)|synthetic_handwrite|95424|375.1±59.4, min=292, max=2115|chat, ocr, multi-modal, vision|[linxy/LaTeX_OCR](https://huggingface.co/datasets/linxy/LaTeX_OCR)|
 |aishell1-zh|[speech_asr/speech_asr_aishell1_trainsets](https://modelscope.cn/datasets/speech_asr/speech_asr_aishell1_trainsets/summary)||141600|152.2±36.8, min=63, max=419|chat, multi-modal, audio|-|
 |🔥aishell1-zh-mini|[speech_asr/speech_asr_aishell1_trainsets](https://modelscope.cn/datasets/speech_asr/speech_asr_aishell1_trainsets/summary)||14526|152.2±35.6, min=74, max=359|chat, multi-modal, audio|-|
-|🔥video-chatgpt|[swift/VideoChatGPT](https://modelscope.cn/datasets/swift/VideoChatGPT/summary)|Generic<br>Temporal<br>Consistency|3206|88.4±48.3, min=32, max=399|chat, multi-modal, video|-|
+|🔥video-chatgpt|[swift/VideoChatGPT](https://modelscope.cn/datasets/swift/VideoChatGPT/summary)|Generic<br>Temporal<br>Consistency|3206|88.4±48.3, min=32, max=399|chat, multi-modal, video|[lmms-lab/VideoChatGPT](https://huggingface.co/datasets/lmms-lab/VideoChatGPT)|
 |hh-rlhf|[AI-ModelScope/hh-rlhf](https://modelscope.cn/datasets/AI-ModelScope/hh-rlhf/summary)|harmless-base<br>helpful-base<br>helpful-online<br>helpful-rejection-sampled|127459|245.4±190.7, min=22, max=1999|rlhf, dpo, pairwise|-|
 |🔥hh-rlhf-cn|[AI-ModelScope/hh_rlhf_cn](https://modelscope.cn/datasets/AI-ModelScope/hh_rlhf_cn/summary)|hh_rlhf<br>harmless_base_cn<br>harmless_base_en<br>helpful_base_cn<br>helpful_base_en|355920|171.2±122.7, min=22, max=3078|rlhf, dpo, pairwise|-|
 |orpo-dpo-mix-40k|[AI-ModelScope/orpo-dpo-mix-40k](https://modelscope.cn/datasets/AI-ModelScope/orpo-dpo-mix-40k/summary)|default|43666|548.3±397.4, min=28, max=8483|dpo, orpo, en, quality|[mlabonne/orpo-dpo-mix-40k](https://huggingface.co/datasets/mlabonne/orpo-dpo-mix-40k)|
```

docs/source/Multi-Modal/index.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -16,7 +16,7 @@
 4. [InternVL Series Best Practice](internvl最佳实践.md)
 5. [Deepseek-VL Best Practice](deepseek-vl最佳实践.md)
 6. [Internlm2-Xcomposers Best Practice](internlm-xcomposer2最佳实践.md)
-7. [Phi3-Vision Best Practice](phi3-vision最佳实践.md)
+7. [Phi3-Vision Best Practice](phi3-vision最佳实践.md), [Phi3.5-Vision Best Practice](https://github.com/modelscope/ms-swift/issues/1809).


 A single round of dialogue can only contain one image (it may also contain none):
```

docs/source_en/LLM/Supported-models-datasets.md

Lines changed: 3 additions & 1 deletion
```diff
@@ -510,9 +510,11 @@ The table below introduces the datasets supported by SWIFT:
 |coco-en-2|[modelscope/coco_2014_caption](https://modelscope.cn/datasets/modelscope/coco_2014_caption/summary)|coco_2014_caption|454617|36.8±2.8, min=32, max=89|chat, multi-modal, vision|-|
 |🔥coco-en-2-mini|[modelscope/coco_2014_caption](https://modelscope.cn/datasets/modelscope/coco_2014_caption/summary)|coco_2014_caption|40504|36.8±2.6, min=32, max=75|chat, multi-modal, vision|-|
 |capcha-images|[AI-ModelScope/captcha-images](https://modelscope.cn/datasets/AI-ModelScope/captcha-images/summary)||8000|31.0±0.0, min=31, max=31|chat, multi-modal, vision|-|
+|latex-ocr-print|[AI-ModelScope/LaTeX_OCR](https://modelscope.cn/datasets/AI-ModelScope/LaTeX_OCR/summary)|full|17918|362.7±34.8, min=294, max=528|chat, ocr, multi-modal, vision|[linxy/LaTeX_OCR](https://huggingface.co/datasets/linxy/LaTeX_OCR)|
+|latex-ocr-handwrite|[AI-ModelScope/LaTeX_OCR](https://modelscope.cn/datasets/AI-ModelScope/LaTeX_OCR/summary)|synthetic_handwrite|95424|375.1±59.4, min=292, max=2115|chat, ocr, multi-modal, vision|[linxy/LaTeX_OCR](https://huggingface.co/datasets/linxy/LaTeX_OCR)|
 |aishell1-zh|[speech_asr/speech_asr_aishell1_trainsets](https://modelscope.cn/datasets/speech_asr/speech_asr_aishell1_trainsets/summary)||141600|152.2±36.8, min=63, max=419|chat, multi-modal, audio|-|
 |🔥aishell1-zh-mini|[speech_asr/speech_asr_aishell1_trainsets](https://modelscope.cn/datasets/speech_asr/speech_asr_aishell1_trainsets/summary)||14526|152.2±35.6, min=74, max=359|chat, multi-modal, audio|-|
-|🔥video-chatgpt|[swift/VideoChatGPT](https://modelscope.cn/datasets/swift/VideoChatGPT/summary)|Generic<br>Temporal<br>Consistency|3206|88.4±48.3, min=32, max=399|chat, multi-modal, video|-|
+|🔥video-chatgpt|[swift/VideoChatGPT](https://modelscope.cn/datasets/swift/VideoChatGPT/summary)|Generic<br>Temporal<br>Consistency|3206|88.4±48.3, min=32, max=399|chat, multi-modal, video|[lmms-lab/VideoChatGPT](https://huggingface.co/datasets/lmms-lab/VideoChatGPT)|
 |hh-rlhf|[AI-ModelScope/hh-rlhf](https://modelscope.cn/datasets/AI-ModelScope/hh-rlhf/summary)|harmless-base<br>helpful-base<br>helpful-online<br>helpful-rejection-sampled|127459|245.4±190.7, min=22, max=1999|rlhf, dpo, pairwise|-|
 |🔥hh-rlhf-cn|[AI-ModelScope/hh_rlhf_cn](https://modelscope.cn/datasets/AI-ModelScope/hh_rlhf_cn/summary)|hh_rlhf<br>harmless_base_cn<br>harmless_base_en<br>helpful_base_cn<br>helpful_base_en|355920|171.2±122.7, min=22, max=3078|rlhf, dpo, pairwise|-|
 |orpo-dpo-mix-40k|[AI-ModelScope/orpo-dpo-mix-40k](https://modelscope.cn/datasets/AI-ModelScope/orpo-dpo-mix-40k/summary)|default|43666|548.3±397.4, min=28, max=8483|dpo, orpo, en, quality|[mlabonne/orpo-dpo-mix-40k](https://huggingface.co/datasets/mlabonne/orpo-dpo-mix-40k)|
```
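Since the two new rows reference the phi3_5-vision best practice linked above, here is a hedged sketch of wiring them together. The dataset and model names come from this commit; the exact `SftArguments` surface is an assumption, not confirmed by the diff:

```python
# Hypothetical sketch: fine-tune phi3_5-vision-instruct on the newly
# registered latex-ocr-print dataset (subset 'full', 17918 samples per the table).
from swift.llm import SftArguments, sft_main

sft_main(SftArguments(
    model_type='phi3_5-vision-instruct',  # model added in the 2024.08.21 entry
    dataset=['latex-ocr-print'],          # new dataset row above
    sft_type='lora',                      # assumed tuner; not part of this diff
))
```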

docs/source_en/Multi-Modal/index.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -16,7 +16,7 @@ A single round of dialogue can contain multiple images (or no images):
 4. [InternVL Series Best Practice](internvl-best-practice.md)
 5. [Deepseek-VL Best Practice](deepseek-vl-best-practice.md)
 6. [Internlm2-Xcomposers Best Practice](internlm-xcomposer2-best-practice.md)
-7. [Phi3-Vision Best Practice](phi3-vision-best-practice.md)
+7. [Phi3-Vision Best Practice](phi3-vision-best-practice.md), [Phi3.5-Vision Best Practice](https://github.com/modelscope/ms-swift/issues/1809).


 A single round of dialogue can only contain one image:
```

swift/llm/export.py

Lines changed: 4 additions & 2 deletions
```diff
@@ -287,7 +287,8 @@ def llm_export(args: ExportArguments) -> None:
                         'Skipping the conversion process.')
         else:
             from swift.llm.megatron import MegatronArguments, convert_hf_to_megatron, patch_megatron
-            model, tokenizer = get_model_tokenizer(args.model_type, torch.float32, {'device_map': 'auto'})
+            model, tokenizer = get_model_tokenizer(
+                args.model_type, torch.float32, {'device_map': 'auto'}, model_id_or_path=args.model_id_or_path)
             res = MegatronArguments.load_megatron_config(tokenizer.model_dir)
             res['model_type'] = args.model_type
             res['target_tensor_model_parallel_size'] = args.tp
@@ -311,7 +312,8 @@ def llm_export(args: ExportArguments) -> None:
                         'Skipping the conversion process.')
         else:
             from swift.llm.megatron import MegatronArguments, convert_megatron_to_hf, patch_megatron
-            hf_model, tokenizer = get_model_tokenizer(args.model_type, torch.float32, {'device_map': 'auto'})
+            hf_model, tokenizer = get_model_tokenizer(
+                args.model_type, torch.float32, {'device_map': 'auto'}, model_id_or_path=args.model_id_or_path)
             res = MegatronArguments.load_megatron_config(tokenizer.model_dir)
             res['model_type'] = args.model_type
             res['target_tensor_model_parallel_size'] = args.tp
```
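Both hunks make the same fix: forward `args.model_id_or_path` so the Megatron export path loads weights from a user-supplied checkpoint instead of resolving them from `model_type` alone. A hedged sketch of the call pattern (the signature is taken from the diff; the paths and model name are made up):

```python
# Hypothetical sketch of the call this diff fixes: without model_id_or_path,
# get_model_tokenizer would fall back to model_type's default hub location,
# silently ignoring a custom checkpoint directory during export.
import torch

from swift.llm import get_model_tokenizer

model, tokenizer = get_model_tokenizer(
    'qwen2-7b-instruct',                 # assumed model_type
    torch.float32,                       # the conversion runs in full precision
    {'device_map': 'auto'},
    model_id_or_path='/ckpts/my-qwen2',  # made-up local path, now honored
)
```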

swift/llm/utils/argument.py

Lines changed: 8 additions & 4 deletions
```diff
@@ -80,16 +80,14 @@ def _check_path(cls,
             value = res
         return value

-    @staticmethod
-    def _is_multimodal(model_type: Optional[str] = None) -> bool:
+    def _is_multimodal(self, model_type: Optional[str] = None) -> bool:
         if model_type is None:
             return False
         model_info = MODEL_MAPPING[model_type]
         tags = model_info.get('tags') or []
         return 'multi-modal' in tags

-    @staticmethod
-    def _is_vision(model_type: Optional[str] = None) -> bool:
+    def _is_vision(self, model_type: Optional[str] = None) -> bool:
         if model_type is None:
             return False
         model_info = MODEL_MAPPING[model_type]
@@ -1590,6 +1588,12 @@ def handle_infer_backend(self) -> None:
         if self.eval_url is None:
             super().handle_infer_backend()

+    def _is_multimodal(self, model_type: Optional[str] = None) -> bool:
+        return False
+
+    def _is_vision(self, model_type: Optional[str] = None) -> bool:
+        return False
+

 @dataclass
 class ExportArguments(InferArguments):
```
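Dropping `@staticmethod` turns these checks into overridable instance methods; the second hunk then overrides both in the eval-arguments class to short-circuit to `False`, since eval-by-URL never loads a local model. A generic sketch of the pattern (the class names are stand-ins, not swift's actual hierarchy):

```python
# Generic sketch of the staticmethod -> instance-method override pattern.
# BaseArguments/EvalArguments are stand-ins for swift's argument classes.
from typing import Optional

MODEL_MAPPING = {'some-vl-model': {'tags': ['multi-modal', 'vision']}}

class BaseArguments:
    # Instance method (was a @staticmethod), so subclasses can specialize it.
    def _is_multimodal(self, model_type: Optional[str] = None) -> bool:
        if model_type is None:
            return False
        tags = MODEL_MAPPING[model_type].get('tags') or []
        return 'multi-modal' in tags

class EvalArguments(BaseArguments):
    # Eval-by-URL never loads a local model, so the check is forced off.
    def _is_multimodal(self, model_type: Optional[str] = None) -> bool:
        return False

assert BaseArguments()._is_multimodal('some-vl-model') is True
assert EvalArguments()._is_multimodal('some-vl-model') is False
```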

swift/llm/utils/client_utils.py

Lines changed: 2 additions & 1 deletion
```diff
@@ -100,7 +100,8 @@ def _from_base64(img_base64: Union[str, 'PIL.Image.Image'], tmp_dir: str = 'tmp'
     sha256_hash = hashlib.sha256(img_base64.encode('utf-8')).hexdigest()
     img_path = os.path.join(tmp_dir, f'{sha256_hash}.png')
     image = Image.open(BytesIO(base64.b64decode(img_base64)))
-    image.save(img_path)
+    if not os.path.exists(img_path):
+        image.save(img_path)
     return img_path

```
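Because the file name is the SHA-256 of the base64 payload, an existing file at `img_path` already holds the same image, so re-saving is pure overhead; the guard makes repeated requests with the same image idempotent. A self-contained sketch of the idea (the helper name is hypothetical, and for brevity it writes the decoded bytes directly instead of round-tripping through PIL):

```python
# Self-contained sketch of the content-addressed cache behind this fix:
# identical payloads hash to the same path, so the write happens only once.
import base64
import hashlib
import os

def cached_image_path(img_base64: str, tmp_dir: str = 'tmp') -> str:
    os.makedirs(tmp_dir, exist_ok=True)
    digest = hashlib.sha256(img_base64.encode('utf-8')).hexdigest()
    img_path = os.path.join(tmp_dir, f'{digest}.png')
    if not os.path.exists(img_path):
        # First sighting of this payload: decode and persist it.
        with open(img_path, 'wb') as f:
            f.write(base64.b64decode(img_base64))
    return img_path
```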
