Hallucinations and non-answers for minigpt4_video_inference.py

I disabled the subtitle preprocessing so that the model runs on the video only, without audio or subtitles.
To do this, I commented out
line 255, `whisper_model = ...`, and
line 132, `subtitles = extract_subtitles(subtitle_path)`, and passed an empty list as the subtitles argument in the function call on
line 133: `frame_features, input_placeholder = match_frames_and_subtitles(video_path, [], sampling_interval, max_sub_len, fps, max_frames)`
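In effect, the change is equivalent to the minimal sketch below. It is only an illustration: `load_subtitles`, `use_subtitles`, and `fake_extract` are hypothetical names that do not exist in the repository, and `fake_extract` merely stands in for the repository's `extract_subtitles`.

```python
from typing import Callable, List, Optional


def load_subtitles(
    subtitle_path: Optional[str],
    use_subtitles: bool,
    extract: Callable[[str], List[str]],
) -> List[str]:
    """Return parsed subtitles, or an empty list when subtitles are disabled."""
    # Hypothetical helper: skip subtitle parsing entirely for video-only runs.
    if not use_subtitles or subtitle_path is None:
        return []
    return extract(subtitle_path)


def fake_extract(path: str) -> List[str]:
    # Hypothetical stand-in for the repository's extract_subtitles.
    return [f"subtitle from {path}"]


# Video-only run: the empty list is what I pass to match_frames_and_subtitles.
video_only = load_subtitles("clip.vtt", use_subtitles=False, extract=fake_extract)
with_subs = load_subtitles("clip.vtt", use_subtitles=True, extract=fake_extract)
```

With `use_subtitles=False`, the result is the same empty list that I hard-coded in the call on line 133.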

Using the default test configuration settings with both the "last" and the "best" checkpoints, the model fails to give a coherent answer to the question asked; it hallucinates instead.
For example, I asked "What is the color of the trees in the video?"
The response was:
> Generated_answer : <Img>
> The color of trees? I think it is important to keep them green and growing, but <br />I wish you had a dream last night where ><IMG>'s and what are the three most common types used in ourMSM 204/7:18PM - The Vatican and Dilbert were both born on Dec.9th , so they're celebrating their birthdays together.<IMg>,<IMgtm/>, <P>
> What does alligator like better; chocolate or vanilla ice cream cake?, What kind doggy would u get if your name started with Sara ??? : Pug,, what was dodo doing during his spare time when he wasn’t busy cleaning the turtles tank., Do giraffas really eat leaves off acacia tree saplings?. This article will examine whether this behavior holds true for wild populations as well..
> Alligators prefer eating red hot dogs rather than frozen ones because there isn ’emotionally stimulated by cold food (due mainly due heat). When asked about favorite type(of sausage) responded similarly-“meat” without specifying further details – just implying generality through usage here!.

However, the demo hosted on Hugging Face seems to work quite well.
Are there any suggestions for getting the model to respond better?
Are you adding a system prompt?
Could you let me know which configuration the online demo uses, so that I can run the model coherently and benchmark your impressive work?

Best Regards.
