Thanks for the great work and the complete code implementation. It seems the authors have not included the training code for the text2video model for the three pipelines (subject customization, etc.). Do you plan to release this part of the code as well? Alternatively, could you share how much compute was needed to fine-tune the text2video model in the proposed work (which GPUs, how many, and for how long)?
Looking forward to your reply.