feat(paper-review): add model selection guide with vision mode requirement
- Require multimodal/vision-capable models for PDF review
- Add litellm provider prefix table (OpenAI, Anthropic, DashScope)
- Always use --vision flag for PDF review to avoid empty text extraction
- Add auto-selection logic based on available API keys
Made-with: Cursor
-`suspect`: title/author mismatch or not found — manual check recommended

-## API key by model
+## Model selection

-| Model prefix | Environment variable |
-|-------------|---------------------|
-| `gpt-*`, `o1-*`, `o3-*` | `OPENAI_API_KEY` |
-| `claude-*` | `ANTHROPIC_API_KEY` |
-| `qwen-*`, `dashscope/*` | `DASHSCOPE_API_KEY` |
-| Custom endpoint | `--api-key` + `--base-url` |
+This pipeline uses [litellm](https://docs.litellm.ai/docs/providers) for model calls.
+The `--model` value must include the **provider prefix** required by litellm.
+
+**IMPORTANT: The model MUST support multimodal (vision) input.** PDF review uses vision mode
+(`--vision`) to render pages as images, which requires a vision-capable model. Text-only models
+will fail or produce empty reviews.
+
+| Provider | Model flag | Env var | Notes |
+|----------|-----------|---------|-------|
+| OpenAI | `openai/gpt-5.2`, `openai/gpt-5.3`, … | `OPENAI_API_KEY` | Must be a vision-capable model; use the latest available; check [OpenAI models](https://platform.openai.com/docs/models) for current options |
+| Anthropic | `anthropic/claude-opus-4-6-thinking`, … | `ANTHROPIC_API_KEY` | Must be a vision-capable model; use the latest available; check [Anthropic models](https://docs.anthropic.com/en/docs/about-claude/models) for current options |
+| DashScope (Qwen) | `dashscope/qwen3.5-plus` | `DASHSCOPE_API_KEY` | Supports vision; recommended Qwen model |
+| Custom endpoint | any litellm-supported name | `--api_key` + `--base_url` | Must support vision/multimodal input |
+
+**If the user does not specify a model**, choose one based on available API keys:
+1. `DASHSCOPE_API_KEY` set → use `dashscope/qwen3.5-plus`
+2. `OPENAI_API_KEY` set → use the latest vision-capable OpenAI model (search web if unsure)
+3. `ANTHROPIC_API_KEY` set → use the latest vision-capable Anthropic model (search web if unsure)
+
+**Always add `--vision` flag for PDF review** to enable vision mode. Without it, the pipeline
+uses text extraction which may lose formatting, figures, and tables.
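
The fallback order in the added lines can be sketched in Python as follows. This is an illustration only, not code from the commit: the function names `select_model` and `build_review_args` are hypothetical, and the OpenAI and Anthropic entries are placeholders because the guide asks for the latest vision-capable model to be looked up rather than hard-coded. Only `dashscope/qwen3.5-plus`, the three environment variable names, and the `--model` and `--vision` flags come from the diff.

```python
import os
from typing import List, Mapping, Optional

# Placeholders (assumption): the guide says to look up the latest
# vision-capable model for these providers, so none is hard-coded here.
OPENAI_PLACEHOLDER = "openai/<latest-vision-model>"
ANTHROPIC_PLACEHOLDER = "anthropic/<latest-vision-model>"

# Priority order taken from the numbered list in the guide.
KEY_PRIORITY = [
    ("DASHSCOPE_API_KEY", "dashscope/qwen3.5-plus"),
    ("OPENAI_API_KEY", OPENAI_PLACEHOLDER),
    ("ANTHROPIC_API_KEY", ANTHROPIC_PLACEHOLDER),
]


def select_model(
    user_model: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the user's model if given, else the first match by API key."""
    if user_model:
        return user_model
    env = os.environ if env is None else env
    for key, model in KEY_PRIORITY:
        # Treat an empty value the same as an unset variable.
        if env.get(key):
            return model
    return None


def build_review_args(model: str) -> List[str]:
    """Build CLI arguments for PDF review, always including --vision."""
    if "/" not in model:
        # litellm expects a provider prefix such as "dashscope/".
        raise ValueError(f"model {model!r} is missing a provider prefix")
    return ["--model", model, "--vision"]


if __name__ == "__main__":
    chosen = select_model()
    if chosen is None:
        print("No supported API key found; pass --model explicitly.")
    else:
        print(build_review_args(chosen))
```

Passing `env` explicitly keeps the selection testable without touching the real environment. A caller would still need to replace the placeholders with real model names before invoking the pipeline.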