The models provided on ModelScope and Hugging Face are incomplete

The models you provide on ModelScope and Hugging Face include only the LLM weights. The cross-attention part and the visual part are missing. As a result, we cannot reproduce your experiments from the checkpoint. We hope you can upload the complete model. Also, if the model performs as shown in the PNG image, this would be great work.
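
One way to see which components a checkpoint contains is to group its parameter names by their top-level prefix. The snippet below is a minimal sketch of that check. The helper `summarize_prefixes` and the key names in `example_keys` are hypothetical and only illustrate the idea; they are not taken from the actual checkpoint.

```python
from collections import Counter


def summarize_prefixes(keys, depth=1):
    """Count parameter names by their first `depth` dot-separated components."""
    return Counter(".".join(key.split(".")[:depth]) for key in keys)


# Hypothetical parameter names for illustration only (an assumption,
# not the real contents of the released checkpoint).
example_keys = [
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.0.mlp.up_proj.weight",
    "lm_head.weight",
]

summary = summarize_prefixes(example_keys)
for prefix, count in sorted(summary.items()):
    print(f"{prefix}: {count} tensors")
```

With a real checkpoint, the key names could come from the loaded state dict, for example `torch.load(path, map_location="cpu").keys()` if the file stores a plain state dict. If no prefix corresponding to the visual encoder or the cross-attention layers shows up in the summary, those weights are not in the file.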