Partial source code for a vision-language model plus a large language model.

Contributors: m4deline, ziyu shi

Notice: our code is forked from Discrete-Continuous-VLN.

Changes:
11.6: create repository