Hallucinations and non-answers for minigpt4_video_inference.py #42

@amansahu278

I disabled the subtitle preprocessing so that the model runs on the video alone, without audio or subtitles.
I did this by commenting out two lines:
line 255: whisper_model = ...
line 132: subtitles = extract_subtitles(subtitle_path)
and by passing an empty list in place of the subtitles in the function call at
line 133: frame_features, input_placeholder = match_frames_and_subtitles(video_path, [], sampling_interval, max_sub_len, fps, max_frames)

With the default test configuration, and with both the "last" and the "best" checkpoints, the model fails to give a coherent answer to the question asked. It hallucinates.
For example, the question asked is "What is the color of the trees in the video?"
The response is:

Generated_answer :
The color of trees? I think it is important to keep them green and growing, but
I wish you had a dream last night where >'s and what are the three most common types used in ourMSM 204/7:18PM - The Vatican and Dilbert were both born on Dec.9th , so they're celebrating their birthdays together.,,


What does alligator like better; chocolate or vanilla ice cream cake?, What kind doggy would u get if your name started with Sara ??? : Pug,, what was dodo doing during his spare time when he wasn’t busy cleaning the turtles tank., Do giraffas really eat leaves off acacia tree saplings?. This article will examine whether this behavior holds true for wild populations as well..
Alligators prefer eating red hot dogs rather than frozen ones because there isn ’emotionally stimulated by cold food (due mainly due heat). When asked about favorite type(of sausage) responded similarly-“meat” without specifying further details – just implying generality through usage here!.

However, the demo hosted on Hugging Face seems to work quite well.
Are there any suggestions for getting better responses from the model?
Are you adding a system prompt?
Could you let me know what configuration the online demo uses, so that I can run the model coherently and benchmark your impressive work?

Best Regards.
