Description
Hi, thanks for the great work! There is a discrepancy between the repo and the paper. The paper says, "We adopt BERT [26] as the text encoder and its parameters are trained in the first and second training stages while being frozen in the last training stage.", but the repo's instructions say to download a pretrained ViT model. So I am a little confused: should I use the original ViT model or the finetuned one? And where is the finetuned text encoder (BERT) checkpoint?
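For reference, here is a minimal sketch of what I understand "frozen in the last training stage" to mean, assuming a Hugging Face BERT encoder (`bert-base-uncased` is a placeholder, not necessarily the checkpoint this repo uses):

```python
# Minimal sketch, not code from this repo: freezing a BERT text encoder
# for a final training stage, as the quoted sentence from the paper describes.
from transformers import BertModel

# Placeholder checkpoint; the repo may use a different one.
text_encoder = BertModel.from_pretrained("bert-base-uncased")

# Stages 1 and 2: parameters stay trainable (the default after loading).
# Last stage: freeze every parameter so the optimizer no longer updates them.
for param in text_encoder.parameters():
    param.requires_grad = False
```

If this is roughly what happens, then the finetuned BERT weights from stages 1 and 2 should exist somewhere, which is why I am asking where that checkpoint is.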