Commit ca00edd

LM Eval tests -- ignore vision tower for VL fp8 test (#1562)
SUMMARY: The current lm-eval test for vision language models with the fp8_dynamic scheme includes the vision tower component of the model in the compression. As discussed with @anmarques and @eldarkurtic, this is generally not a good idea, and we want to err on the side of accuracy over improved runtime when the tradeoff exists. Excluding the vision tower from compression slightly decreases accuracy in this case (from 0.8667 to 0.8333), but generally speaking compressing the vision tower degrades accuracy, and we don't want to encourage users to do so. This PR updates the test to explicitly ignore the vision tower components.

TEST PLAN: Ran the test a few times locally and reproduced the 0.8333 value each time. Confirmed the model size is now slightly larger with the vision tower excluded from compression.

---------

Signed-off-by: Brian Dellabetta <[email protected]>
1 parent a18c76a commit ca00edd

2 files changed

+21
-1
lines changed

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+quant_stage:
+  quant_modifiers:
+    QuantizationModifier:
+      ignore: ["lm_head", "re:vision_tower.*", "re:multi_modal_projector.*", "re:visual.*", "re:vision_model.*", "re:model.visual.*"]
+      config_groups:
+        group_0:
+          weights:
+            num_bits: 8
+            type: "float"
+            symmetric: true
+            strategy: "channel"
+            dynamic: false
+          input_activations:
+            num_bits: 8
+            type: "float"
+            symmetric: true
+            strategy: "token"
+            dynamic: true
+          targets: ["Linear"]
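
The ignore list in the recipe mixes one literal name ("lm_head") with regex entries prefixed by "re:". A minimal Python sketch of how such a list could be applied to module names follows. The helper is_ignored and the example module names are hypothetical, and the matching rule (exact match for literals, regex anchored at the start of the name for "re:" entries) is an assumption for illustration, not a copy of the library's implementation.

```python
import re

# Ignore list copied from the recipe in this commit.
IGNORE = [
    "lm_head",
    "re:vision_tower.*",
    "re:multi_modal_projector.*",
    "re:visual.*",
    "re:vision_model.*",
    "re:model.visual.*",
]


def is_ignored(module_name, patterns=IGNORE):
    """Hypothetical helper: True if module_name matches any ignore entry."""
    for pattern in patterns:
        if pattern.startswith("re:"):
            # Assumption: regex entries are matched from the start of the name.
            if re.match(pattern[3:], module_name):
                return True
        elif pattern == module_name:
            # Literal entries must equal the module name exactly.
            return True
    return False


# Illustrative module names, not read from the real model.
for name in [
    "lm_head",
    "visual.blocks.0.attn.qkv",
    "model.visual.merger.mlp.0",
    "model.layers.0.mlp.up_proj",
]:
    print(name, is_ignored(name))
```

Listing several prefixes (vision_tower, visual, vision_model, model.visual) lets one recipe cover vision-language models that name their vision components differently; under the anchoring assumption above, "re:visual.*" alone would not match a name beginning with "model.visual".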

tests/lmeval/configs/vl_fp8_dynamic_per_token.yaml

Lines changed: 2 additions & 1 deletion
@@ -2,6 +2,7 @@ cadence: weekly
 model: Qwen/Qwen2.5-VL-7B-Instruct
 model_class: Qwen2_5_VLForConditionalGeneration
 scheme: FP8_DYNAMIC
+recipe: tests/e2e/vLLM/recipes/FP8/recipe_fp8_dynamic.yaml
 lmeval:
   model: "hf-multimodal"
   model_args:
@@ -13,5 +14,5 @@ lmeval:
   batch_size: 8
   # dense model achieves accuracy of 0.9 +/ 0.0557
   metrics:
-    acc,none: 0.8667
+    acc,none: 0.8333
     acc_stderr,none: 0.0557
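
To put the change in the expected metric in context, the sketch below compares the old and new accuracy values against the standard error recorded in the config. The within_stderr helper and the k-standard-error rule are illustrative assumptions, not the tolerance check the test harness actually applies.

```python
# Values taken from the test config in this commit.
dense_acc = 0.9
old_acc = 0.8667
new_acc = 0.8333
stderr = 0.0557


def within_stderr(a, b, stderr, k=1.0):
    """Hypothetical helper: True if a and b differ by at most k standard errors."""
    return abs(a - b) <= k * stderr


# Gap between the old and new expected accuracy.
gap_old_new = round(old_acc - new_acc, 4)
# Gap between the dense baseline and the new expected accuracy.
gap_dense_new = round(dense_acc - new_acc, 4)

print("old vs new gap:", gap_old_new, "within 1 stderr:", within_stderr(old_acc, new_acc, stderr))
print("dense vs new gap:", gap_dense_new, "within 2 stderr:", within_stderr(dense_acc, new_acc, stderr, k=2.0))
```

Under these assumptions the drop from 0.8667 to 0.8333 is smaller than one recorded standard error, which is consistent with the commit message describing it as a slight decrease.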
