I want to improve performance on Chinese. Which of these approaches would work better: (1) adding Chinese data to the re-caption contrastive fine-tuning stage, (2) adding Chinese data and fine-tuning with an LLM without freezing the text encoder, or (3) adding Chinese data while keeping the text encoder frozen? Is the first option actually necessary? Have you run any experiments on this?