Existing evaluations rely almost entirely on text- or label-based perturbations, which check only whether the predicted mask matches the queried label. Such evaluations overlook the spatial footprint and severity of hallucination and therefore fail to reveal vision-driven hallucinations, which are more challenging and more prevalent.
To address this gap, we formalize the task of <span style="font-style: italic;">Counterfactual Segmentation Reasoning (CSR)</span>, where a model must segment the referenced object in the factual image and abstain in its counterfactual counterpart.
To support this task, we curate <span class="model-name-gradient">HalluSegBench</span>, the first large-scale benchmark to diagnose referring and reasoning expression segmentation hallucinations using controlled visual counterfactuals, alongside new evaluation metrics that measure hallucination severity and disentangle vision- and language-driven failure modes.
-
We further introduce <span class="model-name-gradient">RobustSeg</span>, a segmentation VLM trained with counterfactual fine-tuning (CFT) to learn when to segment and when to abstain. Experimental results confirm <span class="model-name-gradient">RobustSeg</span> reduces hallucinations by 30%, while improving segmentation performance on FP-RefCOCO(+/g).\\
+
We further introduce <span class="model-name-gradient">RobustSeg</span>, a segmentation VLM trained with counterfactual fine-tuning (CFT) to learn when to segment and when to abstain. Experimental results confirm <span class="model-name-gradient">RobustSeg</span> reduces hallucinations by 30%, while improving segmentation performance on FP-RefCOCO(+/g).
Comparison of Reasoning Segmentation Models on <span class="model-name">HalluSegBench</span> Metrics, including textual and visual IoU drop for referral and reasoning tasks (<code>ΔIoU Referral</code>, <code>ΔIoU Reasoning</code>),
and factual and counterfactual Confusion Mask Score (<code>CMS</code>).
Here, <i>c</i> = “front cow” and <i>c′</i> = “front pig”.
+
Here, <i>c</i> = “Where in the picture would be suitable for storing wine?” and <i>c′</i> = “Where in the picture would be suitable for resting one's feet?”.