README_En.md (1 addition, 0 deletions)
@@ -16,6 +16,7 @@ This is the Chinese version of CLIP. We use a large-scale Chinese image-text pai
 <br><br>
 
 # News
+* 2023.5.9 Chinese-CLIP has been adapted to PyTorch 2.0.
 * 2023.3.20 Support [gradient accumulation](#gradient-accumulation) in contrastive learning to simulate the training effect of a larger batch size.
 * 2023.2.16 Support [FlashAttention](https://github.com/HazyResearch/flash-attention) to improve training speed and reduce memory usage. See [flash_attention_En.md](flash_attention_En.md) for more information.
 * 2023.1.15 Support the conversion of Pytorch models into [ONNX](https://onnx.ai/) or [TensorRT](https://developer.nvidia.com/tensorrt) formats (and provide pretrained TensorRT models) to improve inference speed and meet deployment requirements. See [deployment_En.md](deployment_En.md) for more information.
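For readers unfamiliar with the gradient-accumulation trick referenced in the 2023.3.20 entry, here is a minimal toy sketch of the idea (a hypothetical 1-D linear model with a hand-written gradient, not Chinese-CLIP's actual training loop): gradients from several micro-batches are averaged before a single parameter update, so one update matches a step on the combined, larger batch.

```python
# Toy sketch of gradient accumulation (illustrative names only; the real
# training loop in Chinese-CLIP applies the same idea with PyTorch autograd).

def grad_mse(w, xs, ys):
    """Gradient of mean squared error of y = w * x over a batch."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w, lr, accum_steps = 0.0, 0.01, 2

# Accumulate over micro-batches of size 2, stepping once per accum_steps.
accum = 0.0
micro_batches = [(xs[i:i + 2], ys[i:i + 2]) for i in range(0, len(xs), 2)]
for i, (mx, my) in enumerate(micro_batches, start=1):
    accum += grad_mse(w, mx, my) / accum_steps  # scale like loss / accum_steps
    if i % accum_steps == 0:
        w -= lr * accum  # one optimizer step per accum_steps micro-batches
        accum = 0.0

# The accumulated step equals one step on the full batch of 4 samples:
w_full = 0.0 - lr * grad_mse(0.0, xs, ys)
print(abs(w - w_full) < 1e-12)  # True
```

Scaling each micro-batch gradient by `1 / accum_steps` mirrors dividing the loss by the accumulation count in a real framework, which keeps the learning rate meaningful regardless of how many micro-batches are accumulated.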
flash_attention_En.md (9 additions, 6 deletions)
@@ -6,9 +6,12 @@ Chinese-CLIP now supports the acceleration of training process through [FlashAtt
 
 ## Environmental Preparation
 
-+ Nvidia GPUs **with Volta or Ampere architecture** (such as A100, RTX 3090, T4, and RTX 2080). Please refer to [this document](https://en.wikipedia.org/wiki/CUDA#GPUs_supported) for the corresponding GPUs of each Nvidia architecture.
-+ CUDA 11, NVCC
-+ **FlashAttention**: Install FlashAttention by executing `pip install flash-attn`. Please refer to the [FlashAttention project repository](https://github.com/HazyResearch/flash-attention).
++ Nvidia GPUs **with Turing, Ampere, Ada or Hopper architecture** (such as H100, A100, RTX 3090, T4, and RTX 2080). Please refer to [this document](https://en.wikipedia.org/wiki/CUDA#GPUs_supported) for the corresponding GPUs of each Nvidia architecture.
++ CUDA 11.4 and above.
++ PyTorch 1.12 and above.
++ **FlashAttention**: Install FlashAttention by executing `pip install flash-attn`.
+
+Please refer to the [FlashAttention project repository](https://github.com/HazyResearch/flash-attention) for more information.
 
 ## Use it in Chinese-CLIP!
 
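The hunk above tightens the requirements to PyTorch >= 1.12 and CUDA >= 11.4. A small hypothetical helper (not part of Chinese-CLIP or FlashAttention) can sanity-check reported version strings against those minimums before running `pip install flash-attn`:

```python
# Hypothetical pre-install check: compare dotted version strings against
# the minimums stated in the updated requirements list.

def meets_minimum(version: str, minimum: str) -> bool:
    """Numerically compare major.minor, e.g. '1.13.1+cu117' >= '1.12'."""
    parse = lambda v: tuple(int(p) for p in v.split("+")[0].split(".")[:2])
    return parse(version) >= parse(minimum)

def flash_attn_ready(torch_version: str, cuda_version: str) -> bool:
    """True when both PyTorch and CUDA meet the documented minimums."""
    return meets_minimum(torch_version, "1.12") and meets_minimum(cuda_version, "11.4")

print(flash_attn_ready("1.13.1", "11.7"))  # True: both minimums satisfied
print(flash_attn_ready("1.10.2", "11.3"))  # False: PyTorch and CUDA too old
```

In practice the inputs would come from `torch.__version__` and `torch.version.cuda`; string comparison alone is deliberately avoided here because `"11.4" > "11.10"` lexicographically.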
@@ -17,7 +20,7 @@ Applying FlashAttention to the finetune process of Chinese-CLIP is very simple,
 
 ## Training Speed and Memory Usage Comparison
 
-Enabling FlashAttention can significantly speed up the finetune process and reduce the memory usage of Chinese-CLIP without affecting the precision. Our experiments are conducted on an 8-card A100 GPU (80GB memory) machine.
+Enabling FlashAttention can significantly speed up the finetune process and reduce the memory usage of Chinese-CLIP without affecting the precision. Our experiments are conducted on an 8-card A100 GPU (80GB memory) machine, with FlashAttention 0.2.8 and PyTorch 1.10.1.
 
 We present the comparison of the batch time and memory usage of FP16 precision finetune for each scale model. The improvement in training speed and reduction in memory usage are more significant for larger models.
 
@@ -31,7 +34,7 @@ We present the comparison of the batch time and memory usage of FP16 precision f