Commit e2987ff

Merge pull request #474 from THUDM/CogVideoX_dev
Fix #472 #473
2 parents 74baf5c + a8205b5 commit e2987ff

11 files changed: +128 -191 lines changed

README.md

Lines changed: 23 additions & 24 deletions
@@ -171,88 +171,87 @@ models we currently offer, along with their foundational information.
 <table style="border-collapse: collapse; width: 100%;">
 <tr>
 <th style="text-align: center;">Model Name</th>
+<th style="text-align: center;">CogVideoX1.5-5B (Latest)</th>
+<th style="text-align: center;">CogVideoX1.5-5B-I2V (Latest)</th>
 <th style="text-align: center;">CogVideoX-2B</th>
 <th style="text-align: center;">CogVideoX-5B</th>
 <th style="text-align: center;">CogVideoX-5B-I2V</th>
-<th style="text-align: center;">CogVideoX1.5-5B</th>
-<th style="text-align: center;">CogVideoX1.5-5B-I2V</th>
 </tr>
 <tr>
 <td style="text-align: center;">Release Date</td>
+<th style="text-align: center;">November 8, 2024</th>
+<th style="text-align: center;">November 8, 2024</th>
 <th style="text-align: center;">August 6, 2024</th>
 <th style="text-align: center;">August 27, 2024</th>
 <th style="text-align: center;">September 19, 2024</th>
-<th style="text-align: center;">November 8, 2024</th>
-<th style="text-align: center;">November 8, 2024</th>
 </tr>
 <tr>
 <td style="text-align: center;">Video Resolution</td>
-<td colspan="3" style="text-align: center;">720 * 480</td>
 <td colspan="1" style="text-align: center;">1360 * 768</td>
-<td colspan="1" style="text-align: center;">256 <= W <=1360<br>256 <= H <=768<br> W,H % 16 == 0</td>
+<td colspan="1" style="text-align: center;">256 <= W <=1360<br> 256 <= H <=768<br> W,H % 16 == 0</td>
+<td colspan="3" style="text-align: center;">720 * 480</td>
 </tr>
 <tr>
 <td style="text-align: center;">Inference Precision</td>
-<td style="text-align: center;"><b>FP16*(recommended)</b>, BF16, FP32, FP8*, INT8, not supported: INT4</td>
-<td colspan="2" style="text-align: center;"><b>BF16(recommended)</b>, FP16, FP32, FP8*, INT8, not supported: INT4</td>
 <td colspan="2" style="text-align: center;"><b>BF16</b></td>
+<td style="text-align: center;"><b>FP16*(Recommended)</b>, BF16, FP32, FP8*, INT8, Not supported: INT4</td>
+<td colspan="2" style="text-align: center;"><b>BF16 (Recommended)</b>, FP16, FP32, FP8*, INT8, Not supported: INT4</td>
 </tr>
 <tr>
-<td style="text-align: center;">Single GPU Memory Usage</td>
-<td style="text-align: center;"><a href="https://github.com/THUDM/SwissArmyTransformer">SAT</a> FP16: 18GB<br><b>diffusers FP16: from 4GB*</b><br><b>diffusers INT8(torchao): from 3.6GB*</b></td>
-<td colspan="2" style="text-align: center;"><a href="https://github.com/THUDM/SwissArmyTransformer">SAT</a> BF16: 26GB<br><b>diffusers BF16 : from 5GB*</b><br><b>diffusers INT8(torchao): from 4.4GB*</b></td>
-<td colspan="2" style="text-align: center;"><a href="https://github.com/THUDM/SwissArmyTransformer">SAT</a> BF16: 66GB<br></td>
+<td style="text-align: center;">Single GPU Memory Usage<br></td>
+<td colspan="2" style="text-align: center;"><a href="https://github.com/THUDM/SwissArmyTransformer">SAT</a> BF16: 66GB <br></td>
+<td style="text-align: center;"><a href="https://github.com/THUDM/SwissArmyTransformer">SAT</a> FP16: 18GB <br><b>diffusers FP16: 4GB minimum* </b><br><b>diffusers INT8 (torchao): 3.6GB minimum*</b></td>
+<td colspan="2" style="text-align: center;"><a href="https://github.com/THUDM/SwissArmyTransformer">SAT</a> BF16: 26GB <br><b>diffusers BF16 : 5GB minimum* </b><br><b>diffusers INT8 (torchao): 4.4GB minimum* </b></td>
 </tr>
 <tr>
 <td style="text-align: center;">Multi-GPU Memory Usage</td>
+<td colspan="2" style="text-align: center;"><b>Not Supported</b><br></td>
 <td style="text-align: center;"><b>FP16: 10GB* using diffusers</b><br></td>
 <td colspan="2" style="text-align: center;"><b>BF16: 15GB* using diffusers</b><br></td>
-<td colspan="2" style="text-align: center;"><b>Not supported</b><br></td>
 </tr>
 <tr>
 <td style="text-align: center;">Inference Speed<br>(Step = 50, FP/BF16)</td>
+<td colspan="2" style="text-align: center;">Single A100: ~1000 seconds (5-second video)<br>Single H100: ~550 seconds (5-second video)</td>
 <td style="text-align: center;">Single A100: ~90 seconds<br>Single H100: ~45 seconds</td>
 <td colspan="2" style="text-align: center;">Single A100: ~180 seconds<br>Single H100: ~90 seconds</td>
-<td colspan="2" style="text-align: center;">Single A100: ~1000 seconds (5-second video)<br>Single H100: ~550 seconds (5-second video)</td>
 </tr>
 <tr>
 <td style="text-align: center;">Prompt Language</td>
 <td colspan="5" style="text-align: center;">English*</td>
 </tr>
 <tr>
 <td style="text-align: center;">Prompt Token Limit</td>
-<td colspan="3" style="text-align: center;">226 Tokens</td>
 <td colspan="2" style="text-align: center;">224 Tokens</td>
+<td colspan="3" style="text-align: center;">226 Tokens</td>
 </tr>
 <tr>
 <td style="text-align: center;">Video Length</td>
+<td colspan="2" style="text-align: center;">5 seconds or 10 seconds</td>
 <td colspan="3" style="text-align: center;">6 seconds</td>
-<td colspan="2" style="text-align: center;">5 or 10 seconds</td>
 </tr>
 <tr>
 <td style="text-align: center;">Frame Rate</td>
-<td colspan="3" style="text-align: center;">8 frames / second</td>
-<td colspan="2" style="text-align: center;">16 frames / second</td>
+<td colspan="2" style="text-align: center;">16 frames / second </td>
+<td colspan="3" style="text-align: center;">8 frames / second </td>
 </tr>
 <tr>
-<td style="text-align: center;">Positional Encoding</td>
-<td style="text-align: center;">3d_sincos_pos_embed</td>
+<td style="text-align: center;">Position Encoding</td>
+<td colspan="2" style="text-align: center;">3d_rope_pos_embed</td>
+<td style="text-align: center;">3d_sincos_pos_embed</td>
 <td style="text-align: center;">3d_rope_pos_embed</td>
 <td style="text-align: center;">3d_rope_pos_embed + learnable_pos_embed</td>
-<td style="text-align: center;">3d_rope_pos_embed</td>
-<td style="text-align: center;">3d_rope_pos_embed</td>
 </tr>
 <tr>
 <td style="text-align: center;">Download Link (Diffusers)</td>
+<td colspan="2" style="text-align: center;"> Coming Soon </td>
 <td style="text-align: center;"><a href="https://huggingface.co/THUDM/CogVideoX-2b">🤗 HuggingFace</a><br><a href="https://modelscope.cn/models/ZhipuAI/CogVideoX-2b">🤖 ModelScope</a><br><a href="https://wisemodel.cn/models/ZhipuAI/CogVideoX-2b">🟣 WiseModel</a></td>
 <td style="text-align: center;"><a href="https://huggingface.co/THUDM/CogVideoX-5b">🤗 HuggingFace</a><br><a href="https://modelscope.cn/models/ZhipuAI/CogVideoX-5b">🤖 ModelScope</a><br><a href="https://wisemodel.cn/models/ZhipuAI/CogVideoX-5b">🟣 WiseModel</a></td>
 <td style="text-align: center;"><a href="https://huggingface.co/THUDM/CogVideoX-5b-I2V">🤗 HuggingFace</a><br><a href="https://modelscope.cn/models/ZhipuAI/CogVideoX-5b-I2V">🤖 ModelScope</a><br><a href="https://wisemodel.cn/models/ZhipuAI/CogVideoX-5b-I2V">🟣 WiseModel</a></td>
-<td colspan="2" style="text-align: center;"> Coming Soon </td>
 </tr>
 <tr>
 <td style="text-align: center;">Download Link (SAT)</td>
-<td colspan="3" style="text-align: center;"><a href="./sat/README_zh.md">SAT</a></td>
 <td colspan="2" style="text-align: center;"><a href="https://huggingface.co/THUDM/CogVideoX1.5-5b-SAT">🤗 HuggingFace</a><br><a href="https://modelscope.cn/models/ZhipuAI/CogVideoX1.5-5b-SAT">🤖 ModelScope</a><br><a href="https://wisemodel.cn/models/ZhipuAI/CogVideoX1.5-5b-SAT">🟣 WiseModel</a></td>
+<td colspan="3" style="text-align: center;"><a href="./sat/README_zh.md">SAT</a></td>
 </tr>
 </table>

README_ja.md

Lines changed: 31 additions & 32 deletions
@@ -163,88 +163,87 @@ CogVideoXは、[清影](https://chatglm.cn/video?fr=osm_cogvideox) と同源の
 <table style="border-collapse: collapse; width: 100%;">
 <tr>
 <th style="text-align: center;">モデル名</th>
+<th style="text-align: center;">CogVideoX1.5-5B (最新)</th>
+<th style="text-align: center;">CogVideoX1.5-5B-I2V (最新)</th>
 <th style="text-align: center;">CogVideoX-2B</th>
 <th style="text-align: center;">CogVideoX-5B</th>
 <th style="text-align: center;">CogVideoX-5B-I2V</th>
-<th style="text-align: center;">CogVideoX1.5-5B</th>
-<th style="text-align: center;">CogVideoX1.5-5B-I2V</th>
 </tr>
 <tr>
-<td style="text-align: center;">リリース日</td>
+<td style="text-align: center;">公開日</td>
+<th style="text-align: center;">2024年11月8日</th>
+<th style="text-align: center;">2024年11月8日</th>
 <th style="text-align: center;">2024年8月6日</th>
 <th style="text-align: center;">2024年8月27日</th>
 <th style="text-align: center;">2024年9月19日</th>
-<th style="text-align: center;">2024年11月8日</th>
-<th style="text-align: center;">2024年11月8日</th>
 </tr>
 <tr>
 <td style="text-align: center;">ビデオ解像度</td>
-<td colspan="3" style="text-align: center;">720 * 480</td>
 <td colspan="1" style="text-align: center;">1360 * 768</td>
-<td colspan="1" style="text-align: center;">256 <= W <=1360<br>256 <= H <=768<br> W,H % 16 == 0</td>
+<td colspan="1" style="text-align: center;">256 <= W <=1360<br> 256 <= H <=768<br> W,H % 16 == 0</td>
+<td colspan="3" style="text-align: center;">720 * 480</td>
 </tr>
 <tr>
 <td style="text-align: center;">推論精度</td>
-<td style="text-align: center;"><b>FP16*(推奨)</b>, BF16, FP32, FP8*, INT8, INT4は非対応</td>
-<td colspan="2" style="text-align: center;"><b>BF16(推奨)</b>, FP16, FP32, FP8*, INT8, INT4は非対応</td>
 <td colspan="2" style="text-align: center;"><b>BF16</b></td>
+<td style="text-align: center;"><b>FP16*(推奨)</b>, BF16, FP32,FP8*,INT8,INT4非対応</td>
+<td colspan="2" style="text-align: center;"><b>BF16(推奨)</b>, FP16, FP32,FP8*,INT8,INT4非対応</td>
 </tr>
 <tr>
-<td style="text-align: center;">シングルGPUメモリ消費</td>
-<td style="text-align: center;"><a href="https://github.com/THUDM/SwissArmyTransformer">SAT</a> FP16: 18GB<br><b>diffusers FP16: 4GBから*</b><br><b>diffusers INT8(torchao): 3.6GBから*</b></td>
-<td colspan="2" style="text-align: center;"><a href="https://github.com/THUDM/SwissArmyTransformer">SAT</a> BF16: 26GB<br><b>diffusers BF16: 5GBから*</b><br><b>diffusers INT8(torchao): 4.4GBから*</b></td>
-<td colspan="2" style="text-align: center;"><a href="https://github.com/THUDM/SwissArmyTransformer">SAT</a> BF16: 66GB<br></td>
+<td style="text-align: center;">単一GPUメモリ消費量<br></td>
+<td colspan="2" style="text-align: center;"><a href="https://github.com/THUDM/SwissArmyTransformer">SAT</a> BF16: 66GB <br></td>
+<td style="text-align: center;"><a href="https://github.com/THUDM/SwissArmyTransformer">SAT</a> FP16: 18GB <br><b>diffusers FP16: 4GB以上* </b><br><b>diffusers INT8(torchao): 3.6GB以上*</b></td>
+<td colspan="2" style="text-align: center;"><a href="https://github.com/THUDM/SwissArmyTransformer">SAT</a> BF16: 26GB <br><b>diffusers BF16 : 5GB以上* </b><br><b>diffusers INT8(torchao): 4.4GB以上* </b></td>
 </tr>
 <tr>
-<td style="text-align: center;">マルチGPUメモリ消費</td>
-<td style="text-align: center;"><b>FP16: 10GB* using diffusers</b><br></td>
-<td colspan="2" style="text-align: center;"><b>BF16: 15GB* using diffusers</b><br></td>
-<td colspan="2" style="text-align: center;"><b>サポートなし</b><br></td>
+<td style="text-align: center;">複数GPU推論メモリ消費量</td>
+<td colspan="2" style="text-align: center;"><b>非対応</b><br></td>
+<td style="text-align: center;"><b>FP16: 10GB* diffusers使用</b><br></td>
+<td colspan="2" style="text-align: center;"><b>BF16: 15GB* diffusers使用</b><br></td>
 </tr>
 <tr>
-<td style="text-align: center;">推論速度<br>(ステップ数 = 50, FP/BF16)</td>
-<td style="text-align: center;">単一A100: 約90秒<br>単一H100: 約45秒</td>
-<td colspan="2" style="text-align: center;">単一A100: 約180秒<br>単一H100: 約90秒</td>
-<td colspan="2" style="text-align: center;">単一A100: 約1000秒(5秒動画)<br>単一H100: 約550秒(5秒動画)</td>
+<td style="text-align: center;">推論速度<br>(Step = 50, FP/BF16)</td>
+<td colspan="2" style="text-align: center;">シングルA100: ~1000秒(5秒ビデオ)<br>シングルH100: ~550秒(5秒ビデオ)</td>
+<td style="text-align: center;">シングルA100: ~90秒<br>シングルH100: ~45秒</td>
+<td colspan="2" style="text-align: center;">シングルA100: ~180秒<br>シングルH100: ~90秒</td>
 </tr>
 <tr>
 <td style="text-align: center;">プロンプト言語</td>
 <td colspan="5" style="text-align: center;">英語*</td>
 </tr>
 <tr>
-<td style="text-align: center;">プロンプトトークン制限</td>
-<td colspan="3" style="text-align: center;">226トークン</td>
+<td style="text-align: center;">プロンプト長さの上限</td>
 <td colspan="2" style="text-align: center;">224トークン</td>
+<td colspan="3" style="text-align: center;">226トークン</td>
 </tr>
 <tr>
-<td style="text-align: center;">ビデオの長さ</td>
-<td colspan="3" style="text-align: center;">6秒</td>
+<td style="text-align: center;">ビデオ長さ</td>
 <td colspan="2" style="text-align: center;">5秒または10秒</td>
+<td colspan="3" style="text-align: center;">6秒</td>
 </tr>
 <tr>
 <td style="text-align: center;">フレームレート</td>
-<td colspan="3" style="text-align: center;">8 フレーム / 秒</td>
-<td colspan="2" style="text-align: center;">16 フレーム / 秒</td>
+<td colspan="2" style="text-align: center;">16フレーム/秒</td>
+<td colspan="3" style="text-align: center;">8フレーム/秒</td>
 </tr>
 <tr>
 <td style="text-align: center;">位置エンコーディング</td>
-<td style="text-align: center;">3d_sincos_pos_embed</td>
+<td colspan="2" style="text-align: center;">3d_rope_pos_embed</td>
+<td style="text-align: center;">3d_sincos_pos_embed</td>
 <td style="text-align: center;">3d_rope_pos_embed</td>
 <td style="text-align: center;">3d_rope_pos_embed + learnable_pos_embed</td>
-<td style="text-align: center;">3d_rope_pos_embed</td>
-<td style="text-align: center;">3d_rope_pos_embed</td>
 </tr>
 <tr>
 <td style="text-align: center;">ダウンロードリンク (Diffusers)</td>
+<td colspan="2" style="text-align: center;"> 近日公開 </td>
 <td style="text-align: center;"><a href="https://huggingface.co/THUDM/CogVideoX-2b">🤗 HuggingFace</a><br><a href="https://modelscope.cn/models/ZhipuAI/CogVideoX-2b">🤖 ModelScope</a><br><a href="https://wisemodel.cn/models/ZhipuAI/CogVideoX-2b">🟣 WiseModel</a></td>
 <td style="text-align: center;"><a href="https://huggingface.co/THUDM/CogVideoX-5b">🤗 HuggingFace</a><br><a href="https://modelscope.cn/models/ZhipuAI/CogVideoX-5b">🤖 ModelScope</a><br><a href="https://wisemodel.cn/models/ZhipuAI/CogVideoX-5b">🟣 WiseModel</a></td>
 <td style="text-align: center;"><a href="https://huggingface.co/THUDM/CogVideoX-5b-I2V">🤗 HuggingFace</a><br><a href="https://modelscope.cn/models/ZhipuAI/CogVideoX-5b-I2V">🤖 ModelScope</a><br><a href="https://wisemodel.cn/models/ZhipuAI/CogVideoX-5b-I2V">🟣 WiseModel</a></td>
-<td colspan="2" style="text-align: center;">近日公開</td>
 </tr>
 <tr>
 <td style="text-align: center;">ダウンロードリンク (SAT)</td>
-<td colspan="3" style="text-align: center;"><a href="./sat/README_zh.md">SAT</a></td>
 <td colspan="2" style="text-align: center;"><a href="https://huggingface.co/THUDM/CogVideoX1.5-5b-SAT">🤗 HuggingFace</a><br><a href="https://modelscope.cn/models/ZhipuAI/CogVideoX1.5-5b-SAT">🤖 ModelScope</a><br><a href="https://wisemodel.cn/models/ZhipuAI/CogVideoX1.5-5b-SAT">🟣 WiseModel</a></td>
+<td colspan="3" style="text-align: center;"><a href="./sat/README_zh.md">SAT</a></td>
 </tr>
 </table>

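The Video Resolution row in the README.md table gives the CogVideoX1.5-5B-I2V constraints as 256 <= W <= 1360, 256 <= H <= 768, with W and H both divisible by 16. As a rough illustration only, those bounds could be checked with a small helper like the sketch below. The function name `is_valid_i2v_resolution` is hypothetical and is not part of this commit or the CogVideoX codebase.

```python
def is_valid_i2v_resolution(width: int, height: int) -> bool:
    """Check a (width, height) pair against the README table's stated limits.

    Assumption: the limits are exactly those in the Video Resolution row
    (256 <= W <= 1360, 256 <= H <= 768, W and H multiples of 16).
    This helper is illustrative and does not exist in the repository.
    """
    return (
        256 <= width <= 1360
        and 256 <= height <= 768
        and width % 16 == 0
        and height % 16 == 0
    )
```

For example, `is_valid_i2v_resolution(1360, 768)` satisfies all three conditions, while `is_valid_i2v_resolution(1000, 768)` does not, because 1000 is not a multiple of 16.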