“Rewrite-then-Edit” two-stage pipelines & system-prompt leaks: do they systematically boost RISEBench scores?

Congratulations on the NeurIPS Oral. I still have a few questions, though. Several models on your leaderboard (e.g., Gemini-2.5-Flash-Image, Seedream, BAGEL w/ CoT) appear to run a two-stage pipeline: an MLLM first rewrites or structures the edit instruction (often with CoT), then a second stage performs the actual image edit. The community has also surfaced system-prompt leaks from some closed models, such as Seedream and nano banana, that suggest this “rewrite-then-generate” template.

Questions / requests for clarification:
1. If we apply a strong, neutral MLLM rewriter (e.g., GPT-5 / Gemini / Qwen-VL) to all RISEBench prompts before feeding them to open-source editors (e.g., FLUX.1-Kontext-dev, Qwen-Image-Edit), do their scores jump substantially?
2. Does the leaderboard distinguish direct-instruction from rewrite-then-edit settings? Could you provide a reproducible toggle and a sub-leaderboard to avoid apples-to-oranges comparisons between the two?
3. Would you consider an official Prompt-Rewrite Protocol (with a fixed rewriter + template) so that gains can be attributed separately to “instruction enhancement” and to the “editor’s intrinsic capability”?
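
To make questions 2 and 3 concrete, here is a rough sketch of what such a toggle could look like. Everything in it is hypothetical and not part of RISEBench: the names `EvalConfig`, `REWRITE_TEMPLATE`, `build_editor_prompt`, and `echo_rewriter` are made up for illustration, the template wording is only a placeholder, and `echo_rewriter` stands in for a real MLLM call.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical fixed template; the wording is a placeholder, not an official protocol.
REWRITE_TEMPLATE = (
    "Rewrite the following image-editing instruction so that it is explicit "
    "and unambiguous. Do not add new requirements.\n"
    "Instruction: {instruction}"
)


@dataclass(frozen=True)
class EvalConfig:
    # False = direct instruction, True = rewrite-then-edit.
    use_rewriter: bool


def build_editor_prompt(
    instruction: str,
    config: EvalConfig,
    rewriter: Callable[[str], str],
) -> str:
    """Return the text that would be passed to the image editor."""
    if not config.use_rewriter:
        # Direct setting: the editor sees the benchmark instruction unchanged.
        return instruction
    # Rewrite setting: the same fixed template and rewriter for every editor.
    return rewriter(REWRITE_TEMPLATE.format(instruction=instruction))


def echo_rewriter(prompt: str) -> str:
    """Stand-in for an MLLM call: tags the last line of the templated prompt."""
    return "REWRITTEN: " + prompt.splitlines()[-1]


direct = build_editor_prompt("make it night", EvalConfig(False), echo_rewriter)
rewritten = build_editor_prompt("make it night", EvalConfig(True), echo_rewriter)
```

Running each editor under both settings with the same rewriter and template, and reporting the two scores side by side, would show how much of a score comes from the rewrite step and how much from the editor itself.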

Thanks. Clear guidance here would be helpful.
