Hi,
I am currently working on applying the VCD mitigation method to the LLaVA model on the POPE benchmark. The questions in the POPE benchmark are all binary (Yes/No), for example:
{"question_id": 1, "image": "COCO_val2014_000000016631.jpg", "text": "Is there a person in the image?", "label": "yes"}
{"question_id": 2, "image": "COCO_val2014_000000016631.jpg", "text": "Is there a refrigerator in the image?", "label": "no"}

Is it possible to use VCD to further improve LLaVA's performance on these types of binary questions?
If so, could you provide guidance on how to implement it?
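For context on what I am trying to reproduce: as I understand VCD, it contrasts the next-token logits computed from the original image with logits computed from a distorted (e.g. noised) copy of it, roughly (1 + α)·logits_original − α·logits_distorted, subject to an adaptive plausibility cutoff on the original distribution. Below is a minimal, model-free sketch of just that logit combination; the names `vcd_adjust`, `alpha`, and `beta` are my own, not taken from the VCD repository:

```python
import numpy as np

def vcd_adjust(logits, logits_distorted, alpha=1.0, beta=0.1):
    """Sketch of a VCD-style contrastive logit adjustment.

    Combines next-token logits from the original image with logits
    from a distorted copy, then masks tokens that fall below an
    adaptive plausibility threshold on the original distribution.
    """
    # Contrastive combination: amplify what the clean image supports
    # and suppress what the distorted image (i.e. language priors)
    # also supports.
    contrast = (1.0 + alpha) * logits - alpha * logits_distorted

    # Adaptive plausibility constraint: only keep tokens whose
    # probability under the *original* logits is at least beta times
    # the probability of the most likely token.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    keep = probs >= beta * probs.max()
    contrast[~keep] = -np.inf
    return contrast
```

In the real model this would run inside the decoding loop (e.g. as a custom logits processor), with `logits_distorted` coming from a second forward pass on the noised image.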
Below is the current implementation I am using for LLaVA to generate answers from an image:
import torch
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"
model = LlavaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
).to(0)
processor = AutoProcessor.from_pretrained(model_id)

# question_text and raw_image come from the POPE entry being evaluated
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": question_text},
            {"type": "image"},
        ],
    },
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)
inputs = processor(images=raw_image, text=prompt, return_tensors='pt').to(0, torch.float16)
output = model.generate(**inputs, max_new_tokens=20, do_sample=False)
# Decode only the newly generated tokens, not the echoed prompt
answer_text = processor.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
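For scoring against POPE's yes/no labels, I currently normalize the decoded text with a small helper of my own (not part of the benchmark code; treating anything that does not start with "yes" as "no" is my own simplification):

```python
def to_pope_label(answer_text: str) -> str:
    """Map a free-form model answer to a POPE-style "yes"/"no" label.

    Looks only at the leading word of the answer; anything that does
    not clearly start with "yes" is treated as "no".
    """
    words = answer_text.strip().lower().split()
    if words and words[0].strip(".,!:;") == "yes":
        return "yes"
    return "no"
```

This keeps the accuracy/F1 computation simple, since every prediction becomes a binary label that can be compared directly to the "label" field in the benchmark JSON.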