Commit 4c2150b

Merge pull request #234 from stanford-crfm/jonathanxu81205/llava-critic-1
@jonathanxu81205 - add llava-critic-1

2 parents 93ef5cc + 32b4fdb

File tree: 1 file changed (+22 −0 lines)

assets/bytedance.yaml (22 additions, 0 deletions)
@@ -49,3 +49,25 @@
   prohibited_uses: unknown
   monitoring: unknown
   feedback: https://huggingface.co/ByteDance/SDXL-Lightning/discussions
+- type: model
+  name: LLaVA-Critic
+  organization: ByteDance and University of Maryland, College Park
+  description: LLaVA-Critic is an open-source large multimodal model (LMM) designed as a generalist evaluator. Trained on a high-quality critic instruction dataset incorporating diverse evaluation criteria, it assesses performance across a variety of multimodal tasks. It is effective as an LMM-as-a-Judge, providing reliable evaluation scores comparable to GPT models, and in preference learning, supplying reward signals that enhance model alignment.
+  created_date: 2024-10-06
+  url: https://arxiv.org/pdf/2410.02712
+  model_card: unknown
+  modality: image, text; text
+  analysis: LLaVA-Critic was tested in scenarios such as LMM-as-a-Judge and preference learning, showing a high correlation with commercial GPT models in evaluation scores. It served as an alternative to expensive human feedback in resource-constrained settings and provided AI-generated feedback for model alignment more effectively than human-reliant reward models.
+  size: unknown
+  dependencies: []
+  training_emissions: unknown
+  training_time: unknown
+  training_hardware: unknown
+  quality_control: Quality is ensured by a high-quality critic instruction dataset; the model provides both quantitative judgments and supporting reasoning, making its assessments transparent.
+  access: open
+  license: Apache 2.0
+  intended_uses: Evaluating multimodal tasks, generating reward signals for preference learning, and serving as a reliable alternative judge for model assessments.
+  prohibited_uses: The model should not be used in scenarios requiring authorization from proprietary models, nor relied upon for critical applications without human oversight, due to potential biases in the dataset.
+  monitoring: unknown
+  feedback: unknown
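The new entry uses the same field set as the other models in assets/bytedance.yaml. A minimal sketch of how such an entry could be checked for completeness before committing — the `REQUIRED_FIELDS` list mirrors the keys in the diff above, but the validator itself is hypothetical and not part of the repository:

```python
# Hypothetical completeness check for a model entry in this asset schema.
# The field names come from the diff above; unknown values are still valid,
# the check only flags keys that are absent entirely.
REQUIRED_FIELDS = [
    "type", "name", "organization", "description", "created_date", "url",
    "model_card", "modality", "analysis", "size", "dependencies",
    "training_emissions", "training_time", "training_hardware",
    "quality_control", "access", "license", "intended_uses",
    "prohibited_uses", "monitoring", "feedback",
]

def missing_fields(entry: dict) -> list:
    """Return the required keys that the entry lacks."""
    return [f for f in REQUIRED_FIELDS if f not in entry]

# A partially filled entry, using values from the diff above.
entry = {
    "type": "model",
    "name": "LLaVA-Critic",
    "organization": "ByteDance and University of Maryland, College Park",
    "created_date": "2024-10-06",
    "url": "https://arxiv.org/pdf/2410.02712",
    "access": "open",
    "license": "Apache 2.0",
}

print(missing_fields(entry))  # lists the required keys not yet filled in
```

A real asset file would be parsed with a YAML loader first; this sketch skips parsing and checks the resulting dict directly.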
