-
I think that's worthy of implementation. Instead of text tokens, we can feed GPT-2 embeddings from the last layer. It might be even smarter with text than the original DALL-E, because GPT-2 was trained on a large amount of text, whereas DALL-E has to rediscover everything about language from just the captions, which are short and fragmented.
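A minimal sketch of how that wiring could look, assuming the Hugging Face `transformers` GPT-2 encoder on the text side; the `to_dalle` projection and `dalle_dim` are hypothetical placeholders, since the DALL-E transformer would need to be modified to accept continuous embeddings in place of text token IDs:

```python
# Sketch only: frozen GPT-2 last-layer hidden states as the text conditioning.
import torch
from transformers import GPT2Tokenizer, GPT2Model

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
gpt2 = GPT2Model.from_pretrained("gpt2").eval()  # frozen text encoder

@torch.no_grad()
def encode_caption(caption: str) -> torch.Tensor:
    """Return GPT-2 last-layer embeddings, shape (1, seq_len, 768)."""
    inputs = tokenizer(caption, return_tensors="pt")
    return gpt2(**inputs).last_hidden_state

# Hypothetical bridge into the image transformer: project GPT-2's 768-dim
# states to the DALL-E model width and feed them in place of text tokens.
dalle_dim = 512  # assumed width of the DALL-E transformer, not a real config value
to_dalle = torch.nn.Linear(gpt2.config.n_embd, dalle_dim)

text_embeds = to_dalle(encode_caption("an armchair in the shape of an avocado"))
print(text_embeds.shape)  # torch.Size([1, seq_len, 512])
```

Keeping GPT-2 frozen and training only the projection (plus the image transformer) would let the captions stay small while still benefiting from GPT-2's language knowledge.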
-
GPT-2 Extra Large model (~1.5B parameters) + the DALL-E PyTorch implementation.
(Or a Fine-Tuned 355M Model)
Do you think that would be a feasible idea?
It would be a dumbed-down version of DALL-E for sure, but it would open up many more possibilities.
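For reference, the larger checkpoints named above load through the same `transformers` API as the base model; only the model id and hidden width change, so a sketch like the one in the earlier comment could be pointed at either:

```python
# Hedged example: same loading call, different checkpoint name and hidden width
# (1600-dim states for "gpt2-xl", 1024-dim for "gpt2-medium").
from transformers import GPT2Model

gpt2_xl = GPT2Model.from_pretrained("gpt2-xl").eval()          # ~1.5B parameters
gpt2_medium = GPT2Model.from_pretrained("gpt2-medium").eval()  # 355M parameters
```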