
Commit 0070d0a

Update LLaVA_OneVision_Chat.md
1 parent 333d6fc commit 0070d0a


docs/LLaVA_OneVision_Chat.md

Lines changed: 15 additions & 5 deletions
@@ -4,8 +4,8 @@
 
 ### Key Observations:
 
-- **Impact of Alignment Learning**: By incorporating alignment learning—whether through human feedback or AI-generated feedback—we've observed a notable improvement in LLaVA-OneVision's chat experience. This progress is reflected in the significant performance gains recorded on both the LLaVA-W and WildVision benchmarks.
-- **Success of Self-Generated Feedback**: In LLaVA-OneVision's case, leveraging self-generated feedback data has proven to be a highly effective strategy for enhancing its visual chat capabilities. This approach allows the model to refine its responses autonomously, leading to more natural and coherent conversations.
+- **Impact of Preference Learning**: By incorporating alignment learning—whether through human feedback or AI-generated feedback—we've observed a notable improvement in LLaVA-OneVision's chat experience. This progress is reflected in the significant performance gains recorded on both the LLaVA-W and WildVision benchmarks.
+- **Success of Self-Generated Feedback**: In LLaVA-OneVision's case, leveraging self-generated feedback data has proven to be a highly effective strategy for enhancing its visual chat capabilities. Specifically, [LLaVA-Critic](https://llava-vl.github.io/blog/2024-10-03-llava-critic/) is utilized as a generalist evaluator to generate scoring feedback for preference learning. This approach allows the model to refine its responses autonomously, leading to more natural and coherent conversations.
 
 ----
 

@@ -57,7 +57,7 @@ To optimize LLaVA-OneVision’s in-the-wild conversational abilities, we've empl
 
 1. **Human Feedback from [LLaVA-RLHF](https://llava-rlhf.github.io/)**: Real-world human input plays a crucial role in guiding the model toward more intuitive and user-friendly responses.
 
-2. **AI Feedback from LLaVA-OV’s Self-Generated Responses**: Additionally, the AI's own self-generated feedback allows it to continuously improve and adapt, making this a valuable source for iterative learning.
+2. **AI Feedback from LLaVA-OV’s Self-Generated Responses**: Additionally, the AI's own self-generated feedback allows it to continuously improve and adapt, making this a valuable source for iterative learning. [LLaVA-Critic](https://llava-vl.github.io/blog/2024-10-03-llava-critic/) is utilized as a generalist evaluator to generate scoring feedback for preference learning.
 
By experimenting with either of these two forms of feedback, we've been able to significantly enhance LLaVA-OneVision's conversation capabilities, bringing it closer to achieving seamless visual chat interactions in dynamic, real-world environments.
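
As a rough illustration of the second feedback source, the sketch below asks a critic model for a pointwise score on a single candidate response. It is a minimal sketch: the `critic_generate` callable, the 1-10 scale, and the judging prompt are assumptions for illustration, not the actual LLaVA-Critic interface or prompt format, and a real critic would also receive the image.

```python
import re
from typing import Callable

# Hypothetical pointwise-judging prompt; the real LLaVA-Critic prompt may differ,
# and a real critic would also be given the image (this sketch is text-only).
SCORING_PROMPT = (
    "You are an impartial judge. Rate the answer to the question below for "
    "relevance, coherence, and helpfulness on a scale of 1 to 10. "
    "Reply with the number only.\n\n"
    "Question: {question}\nAnswer: {response}\nScore:"
)

def critic_score(question: str, response: str,
                 critic_generate: Callable[[str], str]) -> float:
    """Obtain a scalar score for one response from any text-in/text-out callable."""
    reply = critic_generate(SCORING_PROMPT.format(question=question, response=response))
    match = re.search(r"\d+(\.\d+)?", reply)
    return float(match.group()) if match else 0.0

if __name__ == "__main__":
    # Toy stand-in for a real critic model, so the sketch runs end to end.
    dummy_critic = lambda prompt: "7"
    print(critic_score("What is in the image?", "A dog playing in a park.", dummy_critic))
```

Any scoring source with this shape, whether human ratings, a reward model, or a critic LMM, can drive the same downstream selection step.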
@@ -76,7 +76,7 @@ For each language-image prompt in the dataset, we randomly generate `k = 5` cand
 
 ##### Step 2: Scoring and Acquiring Feedback Data
 
-Once the candidate responses are generated, we utilize a feedback source (e.g., the Reward Model from LLaVA-RLHF) to score each of them. The reward model is responsible for evaluating the quality of the responses based on relevance, coherence, and appropriateness in relation to the given image-question pair. From the scored responses, we then select:
+Once the candidate responses are generated, we utilize a feedback source (e.g., reward signals from LLaVA-RLHF or from LLaVA-Critic) to score each of them. The feedback source is responsible for evaluating the quality of the responses based on relevance, coherence, and appropriateness in relation to the given image-question pair. From the scored responses, we then select:
 
 - The **best** response (highest score)
 - The **worst** response (lowest score)
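
To make this selection step concrete, here is a minimal sketch of turning the `k = 5` scored candidates into a (best, worst) pair; the `score_fn` interface and the output field names are illustrative assumptions rather than the exact LLaVA-OneVision-Chat pipeline.

```python
from typing import Callable, Dict, List

def build_preference_pair(question: str,
                          candidates: List[str],
                          score_fn: Callable[[str, str], float]) -> Dict[str, str]:
    """Score each candidate response and keep the highest- and lowest-scoring ones.

    `score_fn(question, response)` stands in for the feedback source, e.g. the
    LLaVA-RLHF reward model or LLaVA-Critic used as a judge (both assumed here).
    """
    scored = [(score_fn(question, resp), resp) for resp in candidates]
    chosen = max(scored, key=lambda item: item[0])[1]    # best response
    rejected = min(scored, key=lambda item: item[0])[1]  # worst response
    # A (chosen, rejected) pair is the unit consumed by preference learning.
    return {"prompt": question, "chosen": chosen, "rejected": rejected}

if __name__ == "__main__":
    # Toy scorer (prefers longer answers) just to keep the example self-contained.
    toy_score = lambda q, r: float(len(r))
    candidates = [
        "A cat.",
        "A cat sleeping on a sunny windowsill.",
        "Cat.",
        "A sleeping cat.",
        "An animal.",
    ]
    print(build_preference_pair("Describe the image.", candidates, toy_score))
```

In the actual pipeline the scores come from the feedback sources above, and the resulting pairs are what the subsequent preference-learning rounds train on.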
@@ -111,7 +111,7 @@ This iterative process is repeated for `N=3` rounds in total, with each round re
 
 ------
 
-Stay tuned on how we develop AI feedback for self-improvement LMMs!
+Check out how we develop AI feedback for self-improving LMMs, using [LLaVA-Critic](https://llava-vl.github.io/blog/2024-10-03-llava-critic/) as a generalist evaluator to generate scoring feedback for preference learning!
 
 *Contributors to LLaVA-OneVision-Chat: [Tianyi Xiong](https://tyxiong23.github.io/), [Bo Li](https://brianboli.com/), [Dong Guo](https://www.linkedin.com/in/dongguoset/), [Huizhuo Yuan](https://scholar.google.com/citations?user=8foZzX4AAAAJ), [Quanquan Gu](https://web.cs.ucla.edu/~qgu/), [Chunyuan Li](https://scholar.google.com/citations?user=Zd7WmXUAAAAJ)*
 

@@ -129,6 +129,16 @@ If you find it useful for your research and applications, please cite related pa
   year={2024}
 }
 
+@article{xiong2024llavacritic,
+  title={LLaVA-Critic: Learning to Evaluate Multimodal Models},
+  author={Xiong, Tianyi and Wang, Xiyao and Guo, Dong and Ye, Qinghao and Fan, Haoqi and Gu, Quanquan and Huang, Heng and Li, Chunyuan},
+  year={2024},
+  eprint={2410.02712},
+  archivePrefix={arXiv},
+  primaryClass={cs.CV},
+  url={https://arxiv.org/abs/2410.02712},
+}
+
 @article{li2024llavaov,
   title={Llava-onevision: Easy visual task transfer},
   author={Li, Bo and Zhang, Yuanhan and Guo, Dong and Zhang, Renrui and Li, Feng and Zhang, Hao and Zhang, Kaichen and Li, Yanwei and Liu, Ziwei and Li, Chunyuan},