📑 arXiv | 🔎 AlphaXiv | 🤗 Hugging Face
CARE is a failure-centric post-training framework for verifiable multimodal reasoning. Instead of discarding wrong rollouts, CARE turns close-but-wrong attempts into structured supervision with an anchored-contrastive objective and Reflection-Guided Resampling (RGR), while keeping single-pass inference with no test-time reflection.
2026.2 Our paper has been accepted to CVPR 2026! See you in Denver!
2026.1 Code is released.
- Failure-centric RLVR for multimodal reasoning: CARE explicitly learns from failed rollouts instead of treating them as discarded samples or uniform negatives.
- Anchored-contrastive training: pick the shortest verified-correct rollout as the anchor, select semantically proximate hard negatives, normalize advantages within the subgroup, and down-weight only negative advantages.
- Reflection-Guided Resampling (RGR): repair exactly one representative hard negative during training, re-verify it, and reuse the repaired rollout if it becomes correct.
- All-negative rescue: inject a small pseudo-contrast when a rollout group has no successes, preventing zero-signal updates.
- No extra test-time cost: CARE improves training-time credit assignment without requiring reflection or multi-pass decoding at inference.
For a multimodal prompt x = <image(s), question>, CARE samples a group of rollouts and
uses a programmatic verifier over the final answer and output format to build a
failure-aware learning signal:
- Sample a group of rollouts (size G).
- Verify each rollout with a programmatic verifier.
- If at least one rollout is correct:
- Choose the anchor as the shortest verified-correct rollout.
- Select hard negatives that are closest to the anchor in rationale space via cosine proximity.
- Normalize rewards only inside the selected subgroup and down-weight only the negative advantages.
- If fewer than the target number of hard negatives are available, rescale the update size to keep training stable.
- During training, optionally run RGR on one representative hard negative:
- Insert a short repair cue.
- Resample once.
- Replace the original failure if the repaired rollout becomes verifier-positive; otherwise keep it with a reduced penalty.
- If all rollouts are incorrect:
- Apply an all-negative rescue with a small pseudo-contrast so gradients do not stall.
- Use a region-weighted policy objective:
- Answer tokens receive full weight.
- Positive rationale tokens receive a small weight.
- Failing rationale tokens are masked out.
CARE changes how training signals are formed from rollouts, but keeps the verifier and single-decode inference pipeline unchanged.
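The anchored-contrastive signal described above can be sketched in a few lines of NumPy. Everything here is illustrative, not the repository's API: the function name, the assumption that rollouts arrive pre-sorted by length (so the first correct one is the shortest), the precomputed rationale embeddings, and the pseudo-positive choice in the rescue branch are our own simplifications of the steps listed above.

```python
# Illustrative sketch of CARE's failure-aware advantage construction.
# Names and conventions are assumptions; the actual implementation lives
# under verl/algorithms/ in this repository.
import numpy as np

def care_advantages(rewards, embeddings, K=4, neg_scale_s=0.5, rescue_delta=0.1):
    """rewards: (G,) 0/1 verifier outcomes; embeddings: (G, d) rationale
    embeddings. Assumes rollouts are pre-sorted by length, so the first
    correct rollout is the shortest. Returns per-rollout advantages (G,)."""
    rewards = np.asarray(rewards, dtype=float)
    G = len(rewards)
    adv = np.zeros(G)
    correct = np.flatnonzero(rewards > 0)
    if len(correct) == 0:
        # All-negative rescue: inject a small pseudo-contrast so the
        # gradient does not stall at zero.
        adv[:] = -rescue_delta
        adv[0] = rescue_delta  # pseudo-positive (illustrative choice)
        return adv
    anchor = correct[0]  # shortest verified-correct rollout
    neg_idx = np.flatnonzero(rewards == 0)
    if len(neg_idx) > 0:
        # Hard negatives = failed rollouts closest to the anchor in
        # rationale space, by cosine proximity.
        a = embeddings[anchor]
        sims = embeddings[neg_idx] @ a / (
            np.linalg.norm(embeddings[neg_idx], axis=1) * np.linalg.norm(a) + 1e-8)
        hard = neg_idx[np.argsort(-sims)[:K]]
    else:
        hard = np.array([], dtype=int)
    sub = np.concatenate(([anchor], hard))
    # Normalize rewards only inside the anchor + hard-negative subgroup.
    r = rewards[sub]
    norm = (r - r.mean()) / (r.std() + 1e-8)
    # Down-weight only the negative advantages.
    norm = np.where(norm < 0, neg_scale_s * norm, norm)
    # If fewer than K hard negatives exist, rescale the update size.
    norm *= (len(hard) / K) if len(hard) < K else 1.0
    adv[sub] = norm
    return adv
```

Rollouts outside the selected subgroup receive zero advantage, so the update is driven entirely by the anchor-versus-hard-negative contrast.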
According to the paper, CARE delivers consistent gains over existing RLVR baselines:
- On Qwen2.5-VL-7B, CARE improves macro-averaged accuracy by +4.6 points over GRPO across six verifiable visual-reasoning benchmarks.
- On Qwen3-VL-8B, CARE reaches competitive or state-of-the-art results on MathVista mini and MMMU-Pro under the same evaluation protocol.
- The paper attributes most of the gains to the anchored-contrastive objective, with RGR providing an additional improvement by converting near-miss failures into usable positives.
- CARE subgrouping + advantages: `verl/algorithms/adv_estimators/care.py`
- Cosine hard negatives: `verl/algorithms/neg_selectors/cosine_hardneg.py`
- Region-weighted token advantages: `verl/algorithms/losses/region_weighted_tokens.py`
- RGR hook (training-only): `verl/hooks/rgr.py`
- Response tag parsing: `verl/utils/response_tags.py`
- Config surface: `examples/config.yaml` (`care.*`, `rgr.*`)
- Qwen2.5-VL CARE script: `examples/qwen2_5_vl_7b_geo3k_care_grpo.sh`
```shell
git clone https://github.com/yongxinwang-ai/CARE.git
cd CARE

# (recommended) create env
conda create -n care python=3.10 -y
conda activate care

pip install -e .
```

Run the provided example script:

```shell
bash examples/qwen2_5_vl_7b_geo3k_care_grpo.sh
```

or launch training directly:

```shell
python3 -m verl.trainer.main \
    config=examples/config.yaml \
    data.train_files=hiyouga/geometry3k@train \
    data.val_files=hiyouga/geometry3k@test \
    worker.actor.model.model_path=Qwen/Qwen2.5-VL-7B-Instruct \
    algorithm.grpo_variant=care \
    care.K=4 care.M=6 \
    care.neg_scale_s=0.5 care.equalize=true \
    care.rescue.enable=true care.rescue.delta=0.1 \
    care.token_weighting=region_weighted care.gamma_pos=0.005 \
    rgr.enable=true rgr.template=structured
```

CARE is exposed via `algorithm.grpo_variant=care` and the `care.*` / `rgr.*` sections in `examples/config.yaml`. The provided Geometry3K example script uses:
- Rollouts per prompt: G = worker.rollout.n (default 8 in examples)
- Hard-negative subgroup size: care.K = 4
- Negative preselect size: care.M = 6
- Negative scaling: care.neg_scale_s = 0.5
- Reflected-failure scaling: rgr.s_refl = care.neg_scale_s / 2 (if not set)
- All-negative rescue magnitude: care.rescue.delta = 0.1
- Positive rationale token weight (region-weighting): care.gamma_pos = 0.005
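The region-weighted token objective (`care.token_weighting=region_weighted` with `care.gamma_pos`) can be illustrated with a small sketch. The function name and the boolean mask convention are our assumptions for illustration, not the repo's API:

```python
# Illustrative sketch of region-weighted token weights: answer tokens get
# full weight; rationale tokens get a small weight (gamma_pos) on
# verified-correct rollouts and are masked out (0) on failing rollouts.
import numpy as np

def region_token_weights(is_answer, is_positive_rollout, gamma_pos=0.005):
    """is_answer: (T,) bool mask of answer tokens within one rollout.
    Returns per-token loss weights of shape (T,)."""
    is_answer = np.asarray(is_answer, dtype=bool)
    rationale_w = gamma_pos if is_positive_rollout else 0.0
    return np.where(is_answer, 1.0, rationale_w)
```

This keeps the learning signal concentrated on the verified answer span while preventing failing rationales from contributing gradient.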
If you use this code, please cite the paper. The arXiv entry is:

```bibtex
@article{wang2025care,
  title   = {CARE What Fails: Contrastive Anchored-REflection for Verifiable Multimodal Reasoning},
  author  = {Wang, Yongxin and Yang, Zhicheng and Cao, Meng and Han, Mingfei and Lin, Haokun and Zhu, Yingying and Chang, Xiaojun and Liang, Xiaodan},
  journal = {arXiv preprint arXiv:2512.19554},
  year    = {2025}
}
```

Apache-2.0. See LICENSE.
