You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
- Implement LMDB database integration for efficient loading of large multimodal datasets
- Add caching mechanism for LMDB environments and transactions to improve performance
- Update documentation with LMDB usage examples for both Chinese and English
- Update type hints in vision_utils.py to reflect new functionality
- Add graceful handling for environments without LMDB installed
Copy file name to clipboardExpand all lines: docs/source_en/Customization/Custom-dataset.md
+6-1Lines changed: 6 additions & 1 deletion
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -143,22 +143,27 @@ Please refer to [Reranker training document](../BestPractices/Reranker.md#datase
143
143
144
144
### Multimodal
145
145
146
-
For multimodal datasets, the format is the same as the aforementioned tasks. The difference lies in the addition of several keys: `images`, `videos`, and `audios`, which represent the URLs or paths (preferably absolute paths) of multimodal resources. The tags `<image>`, `<video>`, and `<audio>` indicate where to insert images, videos, or audio. MS-Swift supports multiple images, videos, and audio files. These special tokens will be replaced during preprocessing, as referenced [here](https://github.com/modelscope/ms-swift/blob/main/swift/llm/template/template/qwen.py#L198). The four examples below respectively demonstrate the data format for plain text, as well as formats containing image, video, and audio data.
146
+
For multimodal datasets, the format is the same as the aforementioned tasks. The difference lies in the addition of several keys: `images`, `videos`, and `audios`, which represent the URLs or paths (preferably absolute paths) of multimodal resources. The tags `<image>`, `<video>`, and `<audio>` indicate where to insert images, videos, or audio. MS-Swift supports multiple images, videos, and audio files. These special tokens will be replaced during preprocessing, as referenced [here](https://github.com/modelscope/ms-swift/blob/main/swift/llm/template/template/qwen.py#L198). The examples below demonstrate the data format for plain text, as well as formats containing image, video, and audio data.
147
147
148
+
SWIFT supports loading multimodal resources from LMDB databases using the format `lmdb://key@path_to_lmdb`. This is highly effective for storing and accessing large collections of images, videos, audio files, and other resources, especially when training and inferencing with large-scale multimodal datasets. Make sure to install LMDB first: `pip install lmdb`.
148
149
149
150
Pre-training:
150
151
```jsonl
151
152
{"messages": [{"role": "assistant", "content": "Pre-trained text goes here"}]}
152
153
{"messages": [{"role": "assistant", "content": "<image>is a puppy, <image>is a kitten"}], "images": ["/xxx/x.jpg", "/xxx/x.png"]}
154
+
{"messages": [{"role": "assistant", "content": "<image>is a rabbit loaded from LMDB"}], "images": ["lmdb://rabbit_img@/path/to/animals_lmdb"]}
153
155
{"messages": [{"role": "assistant", "content": "<audio>describes how nice the weather is today"}], "audios": ["/xxx/x.wav"]}
154
156
{"messages": [{"role": "assistant", "content": "<image>is an elephant, <video>is a lion running"}], "images": ["/xxx/x.jpg"], "videos": ["/xxx/x.mp4"]}
157
+
{"messages": [{"role": "assistant", "content": "<video>shows galaxies in space"}], "videos": ["lmdb://space_video@/path/to/videos_lmdb"]}
155
158
```
156
159
157
160
Supervised Fine-tuning:
158
161
159
162
```jsonl
160
163
{"messages": [{"role": "user", "content": "Where is the capital of Zhejiang?"}, {"role": "assistant", "content": "The capital of Zhejiang is Hangzhou."}]}
161
164
{"messages": [{"role": "user", "content": "<image><image>What is the difference between the two images?"}, {"role": "assistant", "content": "The first one is a kitten, and the second one is a puppy."}], "images": ["/xxx/x.jpg", "/xxx/x.png"]}
165
+
{"messages": [{"role": "user", "content": "<image>What is this animal?"}, {"role": "assistant", "content": "This is a brown panda, a very rare species."}], "images": ["lmdb://panda_img@/path/to/wildlife_lmdb"]}
166
+
{"messages": [{"role": "user", "content": "<image>and<image>What's the difference between these two animals?"}, {"role": "assistant", "content": "The first image is a tiger, and the second image is a lion."}], "images": ["lmdb://tiger_img@/path/to/animals_lmdb", "lmdb://lion_img@/path/to/animals_lmdb"]}
162
167
{"messages": [{"role": "user", "content": "<audio>What did the audio say?"}, {"role": "assistant", "content": "The weather is really nice today."}], "audios": ["/xxx/x.mp3"]}
163
168
{"messages": [{"role": "system", "content": "You are a helpful and harmless assistant."}, {"role": "user", "content": "<image>What is in the image, <video>What is in the video?"}, {"role": "assistant", "content": "The image shows an elephant, and the video shows a puppy running on the grass."}], "images": ["/xxx/x.jpg"], "videos": ["/xxx/x.mp4"]}
0 commit comments