Description
Hi, thanks for the great work! There is a discrepancy between the repo and the paper. The paper says, "We adopt BERT [26] as the text encoder and its parameters are trained in the first and second training stages while being frozen in the last training stage.", but the repo's instructions say to download a pretrained ViT model. So I am a little confused: should I use the original ViT model or the finetuned one? And where is the finetuned text encoder (BERT) checkpoint?
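For reference, here is a minimal sketch of what I understand "frozen in the last training stage" to mean, assuming a Hugging Face BERT encoder (`bert-base-uncased` is a placeholder, not necessarily the checkpoint this repo uses):

```python
# Minimal sketch, not code from this repo: freezing a BERT text encoder
# for a final training stage, as the quoted sentence from the paper describes.
from transformers import BertModel

# Placeholder checkpoint; the repo may use a different one.
text_encoder = BertModel.from_pretrained("bert-base-uncased")

# Stages 1 and 2: parameters stay trainable (the default after loading).
# Last stage: freeze every parameter so the optimizer no longer updates them.
for param in text_encoder.parameters():
    param.requires_grad = False
```

If this is roughly what happens, then the finetuned BERT weights from stages 1 and 2 should exist somewhere, which is why I am asking where that checkpoint is.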