Docs: CogVideoX #9578
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Nice to have this! Redirecting to @stevhliu for a deeper review.
Instead of uploading the gif/png here, could you open a PR to https://huggingface.co/datasets/huggingface/documentation-images/tree/main/diffusers, which I will merge so we can link it here? We don't keep images/videos in this repository; otherwise it can get quite bulky to clone.
Images moved in https://huggingface.co/datasets/huggingface/documentation-images/discussions/371
Super cool!! I did an initial pass over the docs and will follow up with a more in-depth look soon 🙂
specific language governing permissions and limitations under the License.
-->
# CogVideoX
CogVideoX is an open-source version of the video generation model originating from QingYing. The table below displays the list of video generation models we currently offer, along with their foundational information.
It would be nice to briefly describe the technical aspects of CogVideoX so users have a better idea of how it works and what makes it different from other models (check out the Stable Diffusion XL doc as an example).
Maybe something like (feel free to copy/reuse in the training doc as well):
CogVideoX is a text-to-video generation model focused on creating more coherent videos aligned with a prompt. It achieves this using several methods.
- a 3D variational autoencoder that compresses videos spatially and temporally, improving compression rate and video accuracy.
- an expert transformer block to help align text and video, and a 3D full attention module for capturing and creating spatially and temporally accurate videos.
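To make the "compresses videos spatially and temporally" point concrete, here is a rough sketch of the latent-shape arithmetic. It is not taken from this PR: the 4x temporal and 8x spatial factors, the 16 latent channels, and the `(num_frames - 1) // 4 + 1` frame rule are assumptions about the released CogVideoX VAE, and `VaeCompression`, `latent_shape`, and `compression_ratio` are hypothetical helper names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VaeCompression:
    # All three values are assumptions about the CogVideoX 3D VAE,
    # not numbers stated in this PR.
    temporal: int = 4
    spatial: int = 8
    latent_channels: int = 16


def latent_shape(num_frames, height, width, cfg=VaeCompression()):
    """Return (channels, frames, height, width) of the latent video."""
    if (num_frames - 1) % cfg.temporal != 0:
        raise ValueError("num_frames - 1 must be divisible by the temporal factor")
    if height % cfg.spatial or width % cfg.spatial:
        raise ValueError("height and width must be divisible by the spatial factor")
    # Assumed rule: the first frame is kept, the rest are grouped by `temporal`.
    latent_frames = (num_frames - 1) // cfg.temporal + 1
    return (cfg.latent_channels, latent_frames, height // cfg.spatial, width // cfg.spatial)


def compression_ratio(num_frames, height, width, cfg=VaeCompression()):
    """Ratio of RGB pixel values to latent values for one video."""
    pixel_values = 3 * num_frames * height * width
    c, t, h, w = latent_shape(num_frames, height, width, cfg)
    return pixel_values / (c * t * h * w)


if __name__ == "__main__":
    print(latent_shape(49, 480, 720))
    print(round(compression_ratio(49, 480, 720), 2))
```

Under these assumptions, a 49-frame 480x720 clip maps to a much smaller latent tensor, which is what keeps 3D full attention over the whole video tractable.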
> [!TIP]
> You can pass `--use_8bit_adam` to reduce the memory requirements of training.
> [!IMPORTANT]
This should also just be plain text rather than a callout.
Co-authored-by: Steven Liu <[email protected]>
Cool, thanks so much for iterating! Just a few more comments and then we can merge 🙂
docs/source/en/training/cogvideox.md
Outdated
-->
# CogVideoX
The 🤗 Diffusers framework is Hugging Face's open-source solution for diffusion models. Its modular tools can be integrated quickly and conveniently with custom frameworks. For model training, Diffusers supports Accelerate and is compatible with common inference frameworks.
Replace this paragraph with the suggestion (or something like that) from using-diffusers/cogvideox.md since users coming to Diffusers are probably already familiar with it. They want to know more about CogVideoX :)
@stevhliu is this good to merge now?
Yeah looks good now. Thanks for iterating and improving on the docs @glide-the! 🤗
* CogVideoX docs
---------
Co-authored-by: Steven Liu <[email protected]>
Co-authored-by: YiYi Xu <[email protected]>
What does this PR do?
Added CogVideoX advanced inference docs and a model introduction.
@sayakpaul