Commit 920cde1

feat(paper-review): add model selection guide with vision mode requirement

- Require multimodal/vision-capable models for PDF review
- Add litellm provider prefix table (OpenAI, Anthropic, DashScope)
- Always use --vision flag for PDF review to avoid empty text extraction
- Add auto-selection logic based on available API keys

Made-with: Cursor
1 parent 5f11a80 commit 920cde1

1 file changed, 23 insertions(+), 8 deletions(-)

skills/paper-review/SKILL.md

Lines changed: 23 additions & 8 deletions
@@ -36,7 +36,7 @@ pip install pypdfium2 # only if using vision mode (use_vision_for_pdf=True)
 |------|-----------|-------|
 | Paper file path | Yes | PDF or .tar.gz/.zip TeX package |
 | API key | Yes | Env var preferred: `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, etc. |
-| Model name | No | Default: `gpt-4o`. Claude: `claude-opus-4-5`, Qwen: `qwen-plus` |
+| Model name | No | `dashscope/qwen3.5-plus`, `openai/<latest>`, `anthropic/<latest>`. See **Model selection** below |
 | Discipline | No | If not given, uses general CS/ML-oriented prompts |
 | Venue | No | e.g. `"NeurIPS 2025"`, `"The Lancet"` |
 | Instructions | No | Free-form reviewer guidance, e.g. `"Focus on experimental design"` |
@@ -126,14 +126,29 @@ python -m cookbooks.paper_review --bib_only references.bib --email your@email.co
 - `verified`: found in CrossRef/arXiv/DBLP
 - `suspect`: title/author mismatch or not found — manual check recommended

-## API key by model
+## Model selection

-| Model prefix | Environment variable |
-|-------------|---------------------|
-| `gpt-*`, `o1-*`, `o3-*` | `OPENAI_API_KEY` |
-| `claude-*` | `ANTHROPIC_API_KEY` |
-| `qwen-*`, `dashscope/*` | `DASHSCOPE_API_KEY` |
-| Custom endpoint | `--api-key` + `--base-url` |
+This pipeline uses [litellm](https://docs.litellm.ai/docs/providers) for model calls.
+The `--model` value must include the **provider prefix** required by litellm.
+
+**IMPORTANT: The model MUST support multimodal (vision) input.** PDF review uses vision mode
+(`--vision`) to render pages as images, which requires a vision-capable model. Text-only models
+will fail or produce empty reviews.
+
+| Provider | Model flag | Env var | Notes |
+|----------|-----------|---------|-------|
+| OpenAI | `openai/gpt-5.2`, `openai/gpt-5.3`, … | `OPENAI_API_KEY` | Must be a vision-capable model; use the latest available; check [OpenAI models](https://platform.openai.com/docs/models) for current options |
+| Anthropic | `anthropic/claude-opus-4-6-thinking`, … | `ANTHROPIC_API_KEY` | Must be a vision-capable model; use the latest available; check [Anthropic models](https://docs.anthropic.com/en/docs/about-claude/models) for current options |
+| DashScope (Qwen) | `dashscope/qwen3.5-plus` | `DASHSCOPE_API_KEY` | Supports vision; recommended Qwen model |
+| Custom endpoint | any litellm-supported name | `--api_key` + `--base_url` | Must support vision/multimodal input |
+
+**If the user does not specify a model**, choose one based on available API keys:
+1. `DASHSCOPE_API_KEY` set → use `dashscope/qwen3.5-plus`
+2. `OPENAI_API_KEY` set → use the latest vision-capable OpenAI model (search web if unsure)
+3. `ANTHROPIC_API_KEY` set → use the latest vision-capable Anthropic model (search web if unsure)
+
+**Always add `--vision` flag for PDF review** to enable vision mode. Without it, the pipeline
+uses text extraction which may lose formatting, figures, and tables.

 ## Additional resources

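The auto-selection order added in this commit is written as prose for an agent to follow. A minimal Python sketch of the same priority order may help reviewers check the logic. It is an illustration only, not code from the repository: the names `KEY_PRIORITY`, `select_model`, `has_provider_prefix`, and the `latest` mapping are hypothetical. The commit names a concrete model only for DashScope, so the sketch expects the caller to supply the OpenAI and Anthropic model names.

```python
from typing import Mapping, Optional

# Priority order taken from the numbered list in the diff. Only DashScope has
# a concrete default there; the OpenAI and Anthropic entries are left open
# ("latest vision-capable model"), so their default is None in this sketch.
KEY_PRIORITY = (
    ("DASHSCOPE_API_KEY", "dashscope", "dashscope/qwen3.5-plus"),
    ("OPENAI_API_KEY", "openai", None),
    ("ANTHROPIC_API_KEY", "anthropic", None),
)


def select_model(
    env: Mapping[str, str],
    latest: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return a prefixed model name for the first API key that is set.

    `latest` is a hypothetical caller-supplied mapping from provider to a
    vision-capable model name; nothing here looks model names up.
    Returns None when none of the three keys is set.
    """
    latest = latest or {}
    for env_var, provider, default in KEY_PRIORITY:
        if not env.get(env_var):
            continue  # key missing or empty: try the next provider
        model = default or latest.get(provider)
        if model is None:
            raise LookupError(
                f"{env_var} is set but no {provider} model name was supplied"
            )
        return model
    return None


def has_provider_prefix(model: str) -> bool:
    """Rough check that a name has the `<provider>/<model>` shape."""
    provider, sep, name = model.partition("/")
    return bool(provider and sep and name)


# Usage with a fixed dictionary instead of os.environ, so the result does not
# depend on the machine running it.
example_env = {"OPENAI_API_KEY": "sk-placeholder"}
print(select_model(example_env, latest={"openai": "openai/gpt-5.2"}))
```

Passing `os.environ` as `env` would apply the same order to the real environment. The prefix check is only a shape test. It does not confirm that litellm recognizes the provider or that the model accepts image input.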