docs/source_en/Multi-Modal/internvl-best-practice.md (4 additions & 3 deletions)
@@ -17,17 +17,20 @@ The following practice takes `internvl-chat-v1_5` as an example, and you can als
**FAQ**
1. **Model shows `The request model does not exist!`**
This issue often arises when attempting to use the mini-internvl or InternVL2 models, as the corresponding models on modelscope are subject to an application process. To resolve this, you need to log in to modelscope and go to the respective model page to apply for download. After approval, you can obtain the model through either of the following methods:
- Use `snapshot_download` to download the model locally (the relevant code is available in the model download section of the model file), and then specify the local model path using `--model_id_or_path`.
- Obtain the SDK token for your account from the [modelscope account homepage](https://www.modelscope.cn/my/myaccesstoken), and specify it using the `--hub_token` parameter or the `MODELSCOPE_API_TOKEN` environment variable. A sketch of both methods follows this list.
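The following is only a rough sketch of the two options, assuming the `modelscope` Python SDK and the swift flags named above; the model ID below is a placeholder, so substitute the gated model you actually applied for.

```shell
# Option 1 (sketch): download the gated model locally with the ModelScope SDK,
# then point swift at the resulting local path. The model ID is a placeholder.
python -c "from modelscope import snapshot_download; print(snapshot_download('OpenGVLab/Mini-InternVL-Chat-2B-V1-5'))"
# ...then pass the printed directory via `--model_id_or_path /path/to/downloaded/model`

# Option 2 (sketch): authenticate with your ModelScope SDK token instead,
# either through the environment variable or the `--hub_token` parameter.
export MODELSCOPE_API_TOKEN=<your-sdk-token>
# or append `--hub_token <your-sdk-token>` to the swift command line
```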
2. **Why is memory distributed unevenly across multiple GPU cards when running models, leading to OOM?**
The auto device map algorithm in transformers is not friendly to multi-modal models, which may result in uneven memory allocation across different GPU cards.
- You can set the maximum memory usage for each card using the `--device_max_memory` parameter; for example, in a four-card environment, you can set `--device_max_memory 15GB 15GB 15GB 15GB`.
- Alternatively, you can explicitly specify the device map using `--device_map_config_path` (see the sketch below).
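For example, assuming the `swift infer` entry point used elsewhere in this guide (the model type here is illustrative), a four-card run capping each card at 15GB might look like this sketch:

```shell
# Sketch: cap per-card memory on a 4-GPU machine so the auto device map
# does not overload a single card; adjust the limits to your hardware.
CUDA_VISIBLE_DEVICES=0,1,2,3 swift infer \
    --model_type internvl-chat-v1_5 \
    --device_max_memory 15GB 15GB 15GB 15GB
# Alternatively, point --device_map_config_path at a file containing an explicit device map.
```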
3. **Differences between the InternVL2 model and its predecessors (InternVL-V1.5 and Mini-InternVL)**
- The InternVL2 model supports multi-turn, multi-image inference and training, meaning multi-turn conversations with images, with text and images interleaved within a single turn. For details, refer to [Custom Dataset](#custom-dataset) and the InternVL2 part of the Inference section. The predecessor models supported multi-turn conversations but could only include images in a single turn.
- The InternVL2 model supports video input. For specific formats, refer to [Custom Dataset](#custom-dataset).
@@ -53,7 +56,6 @@ pip install Pillow
- If your GPU does not support flash attention, use the argument `--use_flash_attn false`. For int8 models, it is necessary to specify `--dtype bf16` during inference, otherwise the output may be garbled.
- The model's configuration specifies a relatively small max_length of 2048, which can be modified by setting `--max_length`.
- Memory consumption can be reduced by using the parameter `--gradient_checkpointing true` (a sketch combining these flags follows this list).
- The InternVL series of models only supports training on datasets that include images.
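Putting the flags above together, a fine-tuning invocation might look roughly like the sketch below; the dataset name is a placeholder, and the exact recipe is given in the training commands of this guide.

```shell
# Sketch: a fine-tuning command combining the flags discussed above.
# --max_length raises the small default of 2048; gradient checkpointing trades
# compute for memory; disable flash attention only if your GPU lacks support.
CUDA_VISIBLE_DEVICES=0 swift sft \
    --model_type internvl-chat-v1_5 \
    --dataset coco-en-2-mini \
    --max_length 4096 \
    --gradient_checkpointing true \
    --use_flash_attn false
```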
```shell
# Experimental environment: A100
@@ -310,13 +312,12 @@ Supports multi-turn conversations, Images support for local path or URL input, m
In addition to the above data formats, the **InternVL2** model also supports multi-image multi-turn training. It uses the tag `<image>` to indicate the position of images in the conversation. If the tag `<image>` is not present in the dataset, the images are placed at the beginning of the last round's query by default.
```jsonl
{"query": "Image-1: <image>\nImage-2: <image>\nDescribe the two images in detail.", "response": "xxxxxxxxx", "history": [["<image> Describe the image", "xxxxxxx"], ["CCCCC", "DDDDD"]], "images": ["image_path1", "image_path2", "image_path3"]}
```
Alternatively, use `<img>image_path</img>` to specify the image path and its position in the conversation inline.
```jsonl
{"query": "Image-1: <img>img_path</img>\n Image-2: <img>img_path2</img>\n Describe the two images in detail.", "response": "xxxxxxxxx", "history": [["<img>img_path3</img> Describe the image", "xxxxxxx"], ["CCCCC", "DDDDD"]], }
0 commit comments