
Improve U-Net and Transformer implementations #8

@cabralpinto

Description


As the documentation states, the current implementations are neither the most effective nor the most efficient. The U-Net implementation was adapted from The Annotated Diffusion Model, and the Transformer implementation was adapted from the DiT architecture of Peebles & Xie (2022), specifically its adaptive layer norm (adaLN) block. Although these produce good enough results, ideally the library would provide the best implementations available for general use.
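For context, the adaLN conditioning from Peebles & Xie (2022) replaces the learned affine parameters of layer norm with scale and shift values regressed from the conditioning vector (e.g. a timestep embedding). The sketch below is a minimal, hypothetical PyTorch rendition of that idea; the class name and dimensions are illustrative, not the library's actual API:

```python
import torch
import torch.nn as nn

class AdaLNBlock(nn.Module):
    """Minimal sketch of adaptive layer norm (adaLN) conditioning.

    The conditioning vector `c` (e.g. a timestep embedding) regresses a
    per-block scale and shift that modulate the normalized activations.
    Names and shapes are illustrative assumptions, not library API.
    """

    def __init__(self, dim: int, cond_dim: int) -> None:
        super().__init__()
        # LayerNorm without its own affine parameters: the affine
        # transform comes from the conditioning vector instead.
        self.norm = nn.LayerNorm(dim, elementwise_affine=False)
        self.mlp = nn.Sequential(nn.SiLU(), nn.Linear(cond_dim, 2 * dim))

    def forward(self, x: torch.Tensor, c: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim); c: (batch, cond_dim)
        shift, scale = self.mlp(c).chunk(2, dim=-1)
        return self.norm(x) * (1 + scale.unsqueeze(1)) + shift.unsqueeze(1)
```

In the full DiT block this modulation is applied before both the attention and MLP sublayers, with an additional conditioning-derived gate on each residual branch (the "adaLN-Zero" variant initializes that gate to zero).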

From what I've read, a good choice for the U-Net implementation would be the Efficient U-Net used in Imagen's text-to-image model, but there may well be more recent architectures that would be a better fit. For the Transformer I'm really not sure right now. Any input on this would be greatly appreciated.

Metadata


Labels: enhancement (New feature or request), help wanted (Extra attention is needed), question (Further information is requested)
