Hi!
I ran inference on lmsys/vicuna-7b-v1.5 through FastChat's huggingface_api with the following command:
python3 -m fastchat.serve.huggingface_api --model lmsys/vicuna-7b-v1.5
The ASSISTANT output comes back with the special character "▁" and extra spaces:
USER: Hello! Who are you?
ASSISTANT: ▁I ' m ▁a ▁language ▁model ▁called ▁Vic una , ▁and ▁I ▁was ▁trained ▁by ▁Lar ge ▁Model ▁Systems ▁Organ ization ▁( L MS YS ) ▁research ers .
However, I was expecting the output to be clean:
USER: Hello! Who are you?
ASSISTANT: I'm a language model called Vicuna, and I was trained by Large Model Systems Organization (LMSYS) researchers.
I need clean output because I am doing multi-turn generation, i.e. passing the assistant's first response back in as context for generating the next response.
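To illustrate what I am after, here is a minimal sketch of a stopgap clean-up I could apply to the text before feeding it back in. It assumes the output is a list of SentencePiece pieces joined by single spaces, with "▁" (U+2581) marking the start of each word; the function name clean_sentencepiece_output is my own and not part of FastChat or transformers. I suspect the cleaner fix is to decode the generated token IDs with the tokenizer rather than patching the string afterwards, but I have not confirmed that this is the cause here.

```python
# Hypothetical post-processing helper, not part of FastChat or transformers.
# Assumption: the garbled text is SentencePiece pieces joined by spaces,
# where U+2581 ("▁") marks a word boundary.
SP_SPACE = "\u2581"


def clean_sentencepiece_output(text: str) -> str:
    """Glue pieces back together and turn word-boundary markers into spaces."""
    pieces = text.split()            # drop the spaces inserted between pieces
    joined = "".join(pieces)         # concatenate pieces into one string
    return joined.replace(SP_SPACE, " ").strip()


if __name__ == "__main__":
    raw = "\u2581I ' m \u2581a \u2581language \u2581model \u2581called \u2581Vic una ,"
    print(clean_sentencepiece_output(raw))
```

One caveat with this approach: it throws away every literal space in the input, so it only works if all real word boundaries are carried by "▁".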
Sorry if I am missing something fundamental here, but any help would be much appreciated!
