-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Description
When i input an image , the inference output is repetitive and not well-organized.
`conversation = [
{
"role": "<|User|>",
"content": "\n Describe the image.",
"images": ["../data/cat.png"],
},
{"role": "<|Assistant|>", "content": ""},
]
pil_images = load_pil_images(conversation)
prepare_inputs = vl_chat_processor(conversations=conversation, images=pil_images, force_batchify=True, system_prompt="Describe the image.").to(vl_gpt.device)
inputs_embeds = vl_gpt.prepare_inputs_embeds(**prepare_inputs)
outputs = vl_gpt.language.generate(
inputs_embeds=inputs_embeds,
attention_mask=prepare_inputs.attention_mask,
pad_token_id=tokenizer.eos_token_id,
bos_token_id=tokenizer.bos_token_id,
eos_token_id=tokenizer.eos_token_id,
max_new_tokens=128,
do_sample=False,
use_cache=True
)
answer = tokenizer.decode(outputs[0].cpu().tolist(), skip_special_tokens=True)
print(f"{prepare_inputs['sft_format'][0]}", answer)`
The output is follow:
Describe the image.
<|Assistant|>: The image shows a cat wearing a shirt with a shirt with a design that resembles a shirt with a design that resembles a shirt with a design that resembles a shirt with with with with with with with with with with with with with with with with with with with with with with with with with with with with with with with with with with with with with with with with with with with with with with with with with with with with with with with with with with with with with with with with with with with with with with with with with with with with with with with with with with with with with with with with with with with with with with with with