中文版 README | English
🌙 Let Claude Code do research while you sleep. Wake up to find your paper scored, weaknesses identified, experiments run, and narrative rewritten — autonomously.
Custom Claude Code skills for autonomous ML research workflows. These skills orchestrate cross-model collaboration — Claude Code drives the research while an external LLM (via Codex MCP) acts as a critical reviewer. 🔀 Also supports alternative model combinations (e.g., GLM + GPT, GLM + MiniMax) — no Claude API required.
💭 Why not self-play with a single model? Using Claude Code subagents or agent teams for both execution and review is technically possible, but tends to fall into local minima — the same model reviewing its own patterns creates blind spots.
Think of it like adversarial vs. stochastic bandits: a single model self-reviewing is the stochastic case (predictable reward noise), while cross-model review is adversarial (the reviewer actively probes weaknesses the executor didn't anticipate) — and adversarial bandits are fundamentally harder to game.
💭 Why two models, not more? Two is the minimum needed to break self-play blind spots, and 2-player games converge to Nash equilibrium far more efficiently than n-player ones. Adding more reviewers increases API cost and coordination overhead with diminishing returns — the biggest gain is going from 1→2, not 2→4.
Claude Code's strength is fast, fluid execution; Codex (GPT-5.4 xhigh) is slower but more deliberate and rigorous in critique. These complementary styles — speed × rigor — produce better outcomes than either model talking to itself.
- 2026-03-14 — 📱 Feishu/Lark integration: three modes (off/push/interactive), mobile notifications for experiments, reviews, and checkpoints
- 2026-03-13 — 🛑 Human-in-the-loop: configurable `AUTO_PROCEED` checkpoints across all workflows. Full autopilot or step-by-step approval
- 2026-03-12 — 🔗 Zotero + Obsidian + local PDFs + arXiv/Scholar: multi-source literature search with cross-model novelty verification
- 2026-03-11 — 🚀 Three end-to-end workflows complete: one prompt → top-venue-style paper. `/research-pipeline` chains idea discovery → auto review → paper writing autonomously
- 2026-03-09 — 📝 `/paper-writing` workflow: narrative report → structured outline → figures → LaTeX → compiled PDF → 2-round auto-improvement (4/10 → 8.5/10)
# 1. Install skills
git clone https://github.com/wanshuiyin/Auto-claude-code-research-in-sleep.git
cp -r Auto-claude-code-research-in-sleep/skills/* ~/.claude/skills/
# 2. Set up Codex MCP (for review skills)
npm install -g @openai/codex
claude mcp add codex -s user -- codex mcp-server
# 3. Use in Claude Code
claude
> /idea-discovery "your research direction" # Workflow 1: literature → brainstorm → validate
> /auto-review-loop # Workflow 2: review → fix → re-review overnight
> /paper-writing "NARRATIVE_REPORT.md" # Workflow 3: narrative → polished PDF
> /research-pipeline "your research direction" # Full pipeline: Workflow 1 → 2 → 3 end-to-end

Tip: Workflows pause at checkpoints for your approval by default. Add `AUTO_PROCEED=true` to run fully autonomously (great for overnight runs).
See full setup guide for details and alternative model combinations if you don't have Claude/OpenAI API.
- 📊 18 composable skills — mix and match, or chain into full pipelines (`/idea-discovery`, `/auto-review-loop`, `/paper-writing`, `/research-pipeline`)
- 🔍 Literature & novelty — multi-source paper search (Zotero + Obsidian + local PDFs + arXiv/Scholar) + cross-model novelty verification
- 💡 Idea discovery — literature survey → brainstorm 8-12 ideas → novelty check → GPU pilot experiments → ranked report
- 🔄 Auto review loop — 4-round autonomous review, 5/10 → 7.5/10 overnight with 20+ GPU experiments
- 📝 Paper writing — narrative → outline → figures → LaTeX → PDF → auto-review (4/10 → 8.5/10), one command
- 🤖 Cross-model collaboration — Claude Code executes, GPT-5.4 xhigh reviews. Adversarial, not self-play
- 📝 Peer review — review others' papers as a conference reviewer, with structured scoring and meta-review
- 🖥️ GPU deployment — auto rsync, screen sessions, multi-GPU parallel experiments, live monitoring
- 🔀 Flexible models — default Claude × GPT-5.4, also supports GLM + GPT, GLM + MiniMax — no Claude API required
- 🛑 Human-in-the-loop — configurable checkpoints at key decisions. `AUTO_PROCEED=true` for full autopilot, `false` to approve each step
- 📱 Feishu/Lark notifications — three modes: off (default), push-only (webhook, mobile alerts), interactive (approve/reject from Feishu). Zero impact when unconfigured
- 🧩 Extensible — domain-specific skills welcome! Add a `SKILL.md` and open a PR. See community skills like `dse-loop` (architecture/EDA)
A real overnight 4-round run on an ML research project, from borderline reject to submission-ready:
| Round | Score | What Happened |
|---|---|---|
| Initial | 5.0/10 | Borderline reject |
| Round 1 | 6.5/10 | Added standard metrics, discovered metric decoupling |
| Round 2 | 6.8/10 | Key claim failed to reproduce, pivoted narrative |
| Round 3 | 7.0/10 | Large seed study killed main improvement claim |
| Round 4 | 7.5/10 ✅ | Diagnostic evidence solidified, submission ready |
The loop autonomously ran 20+ GPU experiments, rewrote the paper's narrative framing, and killed claims that didn't hold up — all without human intervention.
These skills compose into a full research lifecycle. The three workflows can be used independently or chained together:
- Exploring a new area (e.g., writing a survey)? Start with Workflow 1 → `/idea-discovery`
- Already have an idea + initial plan? Jump straight to Workflow 2 → `/auto-review-loop`
- Ready to write the paper? Workflow 3 → `/paper-writing` (or step by step: `/paper-plan` → `/paper-figure` → `/paper-write` → `/paper-compile` → `/auto-paper-improvement-loop`)
- Full pipeline? Workflow 1 → Workflow 2 → Workflow 3 → `/research-pipeline` — from literature survey all the way to submission
⚠️ Important: These tools accelerate research, but they don't replace your own critical thinking. Always review generated ideas with your domain expertise, question the assumptions, and make the final call yourself. The best research comes from human insight + AI execution, not full autopilot.
/research-lit → /idea-creator → /novelty-check → implement → /run-experiment → /auto-review-loop → /paper-plan → /paper-figure → /paper-write → /auto-paper-improvement-loop → submit
(survey) (brainstorm) (verify novel) (code) (deploy & run) (review & fix) (outline) (plots) (LaTeX+PDF) (review ×2 + format) (done!)
├──── Workflow 1: Idea Discovery ────┤ ├──── Workflow 2: Auto Loop ────┤ ├──────────────── Workflow 3: Paper Writing ──────────────────┤
📝 Blog post: 梦中科研全流程开源
"What's the state of the art? Where are the gaps?"
Don't have a concrete idea yet? Just give a research direction — /idea-creator handles the rest:
- 📚 Survey the landscape (recent papers, open problems, recurring limitations)
- 🧠 Brainstorm 8-12 concrete ideas via GPT-5.4 xhigh
- 🔍 Filter by feasibility, compute cost, and quick novelty search
- 🛡️ Validate top ideas with deep novelty check + devil's advocate review
- 🧪 Pilot top 2-3 ideas in parallel on different GPUs (30 min - 2 hr each)
- 🏆 Rank by empirical signal — ideas with positive pilot results rise to the top
The output is a ranked IDEA_REPORT.md with hypotheses, pilot results, reviewer objections, and a suggested execution order. Ideas that fail are documented too, saving future dead-end exploration.
┌─────────────────────────────────────────────────────────────┐
│ Idea Discovery │
│ │
│ /research-lit /idea-creator /novelty-check │
│ (find papers) (brainstorm) (verify novelty) │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Scan │────▶│ Generate │──────▶│ Check if │ │
│ │ local │ │ 8-12 │ │ idea is │ │
│ │ papers + │ │ ideas │ │ novel │ │
│ │ search │ │ + rank │ │ │ │
│ └──────────┘ └──────────┘ └──────────┘ │
│ │ │ │
│ ▼ ▼ │
│ ┌──────────┐ ┌──────────┐ │
│ │ Filter │──────▶│ External │ │
│ │ by cost, │ │ LLM │ │
│ │ novelty │ │ evaluates│ │
│ └──────────┘ └──────────┘ │
│ │
│ Typical flow: │
│ 1. /research-lit "discrete diffusion models" (local → online) │
│ 2. /idea-creator "DLLMs post training" │
│ 3. Review ranked ideas, pick top 2-3 │
│ 4. /novelty-check "top idea" (deep verification) │
│ 5. /research-review "top idea" (critical feedback) │
│ 6. Implement → /run-experiment → /auto-review-loop │
└─────────────────────────────────────────────────────────────┘
Skills involved: research-lit + idea-creator + novelty-check + research-review
💡 One-command shortcut: `/idea-discovery "your research direction"` runs this entire workflow automatically.
🔄 Human-in-the-loop: Each phase presents results and waits for your feedback. Not happy? Tell it what's missing — it refines the prompt and regenerates. Trust the defaults? It auto-proceeds with the top-ranked option. You decide how hands-on to be.
⚙️ Pilot experiment budgets (max hours, timeout, GPU budget) are configurable — see Customization.
📝 Blog post: Claude Code 两月 NeurIPS 指北
"Review my paper, fix what's wrong, repeat until it's good."
┌─────────────────────────────────────────────────────────────┐
│ Auto Review Loop │
│ │
│ /research-review /auto-review-loop │
│ (single deep review) (autonomous loop) │
│ │ │ │
│ ▼ ▼ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ External │──▶│ Implement│──▶│ Monitor │──▶ repeat │
│ │ LLM │ │ fixes │ │ results │ until │
│ │ reviews │ │ & run │ │ │ score ≥ 6 │
│ └──────────┘ │ experiments│ └──────────┘ │
│ └──────────┘ │
│ │
│ When reviewer suggests a new method direction: │
│ /novelty-check — verify idea isn't already published │
│ │
│ Supporting skills: │
│ /run-experiment — deploy to local/remote GPU │
│ /analyze-results — interpret experiment outputs │
│ /monitor-experiment — check progress, collect results │
└─────────────────────────────────────────────────────────────┘
Skills involved: auto-review-loop + research-review + novelty-check + run-experiment + analyze-results + monitor-experiment
💡 One-command shortcut: `/auto-review-loop "your paper topic"` runs this entire workflow automatically.
🛡️ Key safety features:
- 🔒 MAX_ROUNDS = 4 — prevents infinite loops; stops early if score threshold is met
- ⏱️ > 4 GPU-hour experiments skipped — won't launch massive jobs; flags them for manual follow-up
- 🧠 Prefer reframing over new experiments — when both can address a weakness, chooses the cheaper path
- 🪞 No hiding weaknesses — explicit rule: "Do NOT hide weaknesses to game a positive score"
- 🔧 Fix before re-review — must actually implement fixes before resubmitting; no empty promises
- 💾 Compact recovery — persists state (`REVIEW_STATE.json`) after each round. If the context window fills up and auto-compacts mid-loop, the workflow reads the state file and resumes from where it left off — no human intervention needed
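The compact-recovery idea can be sketched in a few lines. This is a minimal illustration only — the field names in `REVIEW_STATE.json` here are made up for the example, not the skill's actual schema:

```python
import json
from pathlib import Path

STATE = Path("REVIEW_STATE.json")

def save_state(round_num, score, pending):
    # Persist loop progress after each round (illustrative schema).
    STATE.write_text(json.dumps(
        {"round": round_num, "last_score": score, "pending_fixes": pending}))

def resume():
    # After an auto-compact, read the state file and pick up where we left off.
    if STATE.exists():
        s = json.loads(STATE.read_text())
        return s["round"] + 1, s["pending_fixes"]
    return 1, []  # no state file: fresh start

save_state(2, 6.5, ["add seed study"])
next_round, todo = resume()  # resumes at round 3 with pending fixes intact
```

Because the state lives on disk rather than in the model's context, nothing is lost when the context window is compacted mid-loop.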
⚙️ MAX_ROUNDS, score threshold, and GPU limits are configurable — see Customization.
📝 Blog post: 开源 | 睡觉 Claude 自动跑实验改文
"Turn my research narrative into a submission-ready PDF." Requires a local LaTeX environment — see Prerequisites.
┌─────────────────────────────────────────────────────────────┐
│ Paper Writing Pipeline │
│ │
│ /paper-plan /paper-figure /paper-write │
│ (outline) (plots & tables) (LaTeX draft) │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Claims- │───▶│ Generate │────▶│ Section │──┐ │
│ │ Evidence │ │ figures, │ │ by │ │ │
│ │ Matrix + │ │ tables, │ │ section │ │ │
│ │ Section │ │ LaTeX │ │ LaTeX │ │ │
│ │ Plan │ │ includes │ │ draft │ │ │
│ └──────────┘ └──────────┘ └──────────┘ │ │
│ │ │ │
│ │ /paper-compile │ │
│ │ (build PDF) │ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌──────────────────────────────────────────────────┐ │
│ │ NARRATIVE_REPORT.md ──► PAPER_PLAN.md ──► paper/ │ │
│ │ (input) (outline) (LaTeX+PDF)│ │
│ └──────────────────────────────────────────────────┘ │
│ │
│ Typical flow: │
│ 1. Write NARRATIVE_REPORT.md (from Workflow 2 results) │
│ 2. /paper-plan (claims-evidence matrix + section plan) │
│ 3. /paper-figure (comparison tables, training curves, etc.) │
│ 4. /paper-write (section-by-section LaTeX generation) │
│ 5. /paper-compile (build PDF, fix errors, page check) │
│ 6. /auto-paper-improvement-loop (review ×2 + format check) │
└─────────────────────────────────────────────────────────────┘
Skills involved: paper-plan + paper-figure + paper-write + paper-compile + auto-paper-improvement-loop
One-command shortcut: `/paper-writing "NARRATIVE_REPORT.md"` runs this entire workflow automatically.
Input: A NARRATIVE_REPORT.md describing the research: claims, experiments, results, figures. The more detailed the narrative (especially figure descriptions and quantitative results), the better the output.
Output: A submission-ready paper/ directory with LaTeX source, clean .bib (only cited entries), and compiled PDF.
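For concreteness, a minimal NARRATIVE_REPORT.md skeleton might look like this — the section names are illustrative (the input is free-form narrative), so adapt them to your project:

```markdown
# Narrative Report: <working title>

## Core Claims
1. Method X improves metric Y by Z% over baseline B under setting S.

## Experiments & Results
- Setup: dataset, model, seeds, hardware
- Main table: method vs baseline, mean ± std over 5 seeds

## Figures
- Fig 1: training curves (data in results/curves.json)
- Fig 2: ablation bar chart (data in results/ablation.json)

## Limitations
- Only tested at small scale; no human evaluation
```

The richer the quantitative detail (exact numbers, file paths to plot data, figure descriptions), the less the pipeline has to guess.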
Key features:
- 📐 Claims-Evidence Matrix — every claim maps to evidence, every experiment supports a claim
- 📊 Auto figure generation — line plots, bar charts, comparison tables from JSON data
- 🧹 Clean bib — automated filtering removes uncited entries (948→215 lines in testing)
- 📄 Flexible sections — 5-8 sections depending on paper type (theory papers often need 7)
- 🔍 GPT-5.4 review — each step optionally reviewed by external LLM
- ✂️ De-AI polish — removes AI writing patterns (delve, pivotal, landscape...)
- 🎯 Page verification — `pdftotext`-based precise check that the main body fits the page limit
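The bib-cleaning step above can be sketched roughly like this — a minimal illustration of filtering a `.bib` down to cited keys (regexes simplified), not the skill's actual implementation:

```python
import re

def clean_bib(tex_source: str, bib_source: str) -> str:
    """Keep only @entries whose keys appear in \\cite... commands."""
    # Collect every key cited via \cite, \citep, \citet, etc.
    cited = set()
    for group in re.findall(r"\\cite[a-z]*\{([^}]*)\}", tex_source):
        cited.update(k.strip() for k in group.split(","))
    # Split the .bib on entry boundaries and keep the cited entries.
    kept = []
    for entry in re.split(r"(?=@\w+\{)", bib_source):
        m = re.match(r"@\w+\{([^,]+),", entry)
        if m and m.group(1).strip() in cited:
            kept.append(entry.strip())
    return "\n\n".join(kept)

tex = r"As shown \citep{smith2024} and \cite{lee2023, kim2022}..."
bib = ("@article{smith2024, title={A}}\n"
      "@misc{unused2020, title={B}}\n"
      "@article{lee2023, title={C}}")
print(clean_bib(tex, bib))  # keeps smith2024 and lee2023, drops unused2020
```

A real implementation would also handle `\cite` variants with optional arguments and multi-file projects, but the keep-only-cited-keys principle is the same.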
⚠️ What `/paper-figure` can and cannot do: It auto-generates data-driven plots (training curves, bar charts, heatmaps) and comparison tables (LaTeX) from JSON/CSV data. It cannot generate architecture diagrams, pipeline figures, model diagrams, or grids of generated images — these must be created manually (e.g., draw.io, Figma, TikZ) and placed in `figures/` before running `/paper-write`. In a typical ML paper, ~60% of figures are auto-generated and ~40% are manual.
Tested end-to-end: Generated a 9-page ICLR 2026 theory paper (7 sections, 29 citations, 4 figures, 2 comparison tables) from a single NARRATIVE_REPORT.md — zero compilation errors, zero undefined references.
After Workflow 3 generates the paper, /auto-paper-improvement-loop runs 2 rounds of GPT-5.4 xhigh content review → fix → recompile, plus a final format compliance check, autonomously polishing the paper from rough draft to submission-ready.
Score Progression (Real Test — ICLR 2026 theory paper):
| Round | Score | Key Changes |
|---|---|---|
| Round 0 | 4/10 (content) | Baseline |
| Round 1 | 6/10 (content) | Fixed assumptions, softened claims, renamed notation |
| Round 2 | 7/10 (content) | Added synthetic validation, stronger limitations |
| Round 3 | 5→8.5/10 (format) | Removed hero fig, appendix, compressed conclusion, float spacing |
Final: 8 pages main body (ICLR limit: 9), 0 overfull hbox, ICLR-compliant. +4.5 points across 3 rounds.
Round 1 fixes (6 items)
- CRITICAL — Assumption-model mismatch: A boundedness assumption contradicted the model's distributional family. Replaced with a tail-compatible assumption and added formal truncation bridge.
- CRITICAL — Theory-practice gap: Theory assumes idealized encoders, experiments use learned nonlinear encoders. Softened "validate" → "demonstrate practical relevance" and added explicit disclaimer.
- MAJOR — Missing quantitative metrics: Added parameter count table (latent vs total) with honest accounting of system cost.
- MAJOR — Theorem not self-contained: Added "Interpretation" paragraph listing all dependencies explicitly.
- MAJOR — Overclaim in novelty statement: Scoped a broad "first convergence guarantee" to precise conditions under which it holds.
- MAJOR — Notation confusion: Renamed a symbol that clashed with another key variable. Added Notation paragraph.
Round 2 fixes (4 items)
- MAJOR — Missing theory-aligned experiments: Added a synthetic validation subsection directly testing the two main theoretical predictions under controlled conditions.
- MAJOR — Overclaim softening: Replaced strong equivalence claims with appropriately hedged language across all files.
- MAJOR — Informal theoretical argument: Formalized an informal justification into a proper proposition with explicit error bounds.
- MINOR — Weak limitations: Expanded to explicitly list all assumptions and acknowledge missing standard evaluations.
Round 3 format fixes (8 items)
- Removed hero figure block (saved ~0.7 pages)
- Compressed conclusion from 15→9 lines
- Moved synthetic validation to Appendix A
- Moved comparison tables to Appendix B
- Fixed overfull hbox (85pt) with `\resizebox`
- Added compact float spacing (`\captionsetup`, `\textfloatsep`)
- Inlined centered question block in introduction
- Tightened `itemize` environments
| Skill | Description | Needs Codex MCP? |
|---|---|---|
| 💡 idea-creator | Generate and rank research ideas given a broad direction (brainstorm + filter + validate) | Yes |
| 🔬 research-review | Single-round deep review from external LLM (xhigh reasoning) | Yes |
| 🔁 auto-review-loop | Autonomous multi-round review→fix→re-review loop (max 4 rounds) | Yes |
| 📚 research-lit | Scan Zotero + Obsidian + local PDFs + web search, analyze related work, find gaps | No (Optional: Zotero/Obsidian MCP) |
| 📊 analyze-results | Analyze experiment results, compute statistics, generate insights | No |
| 👀 monitor-experiment | Monitor running experiments, check progress, collect results | No |
| 🔍 novelty-check | Verify research idea novelty against recent literature before implementing | Yes |
| 🚀 run-experiment | Deploy experiments to local (MPS/CUDA) or remote GPU servers | No |
| 🎨 pixel-art | Generate pixel art SVG illustrations for READMEs, docs, or slides | No |
| 🔭 idea-discovery | Workflow 1 pipeline: research-lit → idea-creator → novelty-check → research-review | Yes |
| 🏗️ research-pipeline | Full pipeline: Workflow 1 → implement → Workflow 2 → Workflow 3, from direction to submission | Yes |
| 📐 paper-plan | Generate paper outline with claims-evidence matrix, figure plan, and citation scaffolding | Yes |
| 📊 paper-figure | Publication-quality matplotlib/seaborn plots from experiment data, with LaTeX snippets | Optional |
| ✍️ paper-write | Section-by-section LaTeX generation with ICLR/NeurIPS/ICML templates | Yes |
| 🔨 paper-compile | Compile LaTeX to PDF, auto-fix errors, submission readiness checks | No |
| 🔄 auto-paper-improvement-loop | 2-round content review + format check loop on generated paper (4/10 → 8.5/10) | Yes |
| 📝 paper-writing | Workflow 3 pipeline: paper-plan → paper-figure → paper-write → paper-compile → auto-paper-improvement-loop | Yes |
| 📱 feishu-notify | Feishu/Lark notifications — push (webhook) or interactive (bidirectional). Off by default | No |
| Community Skills | Domain-specific skills contributed by the community | |
| 🏗️ dse-loop | Autonomous design space exploration — iteratively run, analyze, and tune parameters. Built for architecture/EDA (gem5, Yosys), but works for any domain with tunable parameters (comp chem, CFD, bioinformatics, etc.) | No |
- Claude Code installed
- (For review skills) Codex CLI installed and configured as MCP server:
  npm install -g @openai/codex
  claude mcp add codex -s user -- codex mcp-server
- (For Workflow 3: paper writing) LaTeX environment with `latexmk` and `pdfinfo`:
  # macOS
  brew install --cask mactex   # or: brew install basictex
  brew install poppler         # provides pdfinfo
  # Ubuntu/Debian
  sudo apt install texlive-full latexmk poppler-utils
  # Verify
  latexmk --version && pdfinfo -v
If you only need Workflow 1 & 2 (idea discovery + auto review), LaTeX is not required.
git clone https://github.com/wanshuiyin/Auto-claude-code-research-in-sleep.git
cd Auto-claude-code-research-in-sleep
# Install all skills globally
cp -r skills/* ~/.claude/skills/
# Or install specific skills
cp -r skills/auto-review-loop ~/.claude/skills/
cp -r skills/research-lit ~/.claude/skills/

# Workflow 1: Idea Discovery
> /idea-discovery "your research direction" # full pipeline
> /research-lit "topic" # just literature survey (all sources)
> /research-lit "topic" — sources: zotero, web # mix and match sources
> /idea-creator "topic" # just brainstorm
# Workflow 2: Auto Research Loop
> /auto-review-loop "your paper topic" # review → fix → repeat
> /research-review "your paper" # single deep review
# Workflow 3: Paper Writing
> /paper-writing "NARRATIVE_REPORT.md" # full pipeline
> /paper-plan "NARRATIVE_REPORT.md" # just outline
> /paper-compile "paper/" # just compile
# Full Pipeline
> /research-pipeline "your research direction" # Workflow 1 → 2 → 3 end-to-end
# Supporting Skills
> /run-experiment train.py --lr 1e-4 --epochs 100
> /analyze-results figures/*.json
> /monitor-experiment server5
To run the auto-review loop without clicking permission prompts, add to .claude/settings.local.json:
{
"permissions": {
"allow": [
"mcp__codex__codex",
"mcp__codex__codex-reply",
"Write",
"Edit",
"Skill(auto-review-loop)"
]
}
}

When GPT-5.4 says "run an ablation study" or "add a baseline comparison", Claude Code automatically writes the experiment script and deploys it to your GPU server. For this to work, Claude Code needs to know your server environment.
Add your server info to your project's CLAUDE.md:
## Remote Server
- SSH: `ssh my-gpu-server` (key-based auth, no password)
- GPU: 4x A100
- Conda env: `research` (Python 3.10 + PyTorch)
- Activate: `eval "$(/opt/conda/bin/conda shell.bash hook)" && conda activate research`
- Code directory: `/home/user/experiments/`
- Use `screen` for background jobs: `screen -dmS exp0 bash -c '...'`

Claude Code reads this and knows how to SSH in, activate the environment, and launch experiments. GPT-5.4 (the reviewer) only decides what experiments to run — Claude Code figures out how based on your CLAUDE.md.
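Conceptually, the launch command Claude Code composes from such a CLAUDE.md looks like the sketch below. Everything here mirrors the example values above (`my-gpu-server`, the conda activation line, the `exp0` session name); the training command is hypothetical:

```python
# Compose (not execute) a remote launch command from the CLAUDE.md facts.
activate = 'eval "$(/opt/conda/bin/conda shell.bash hook)" && conda activate research'
job = "cd /home/user/experiments/ && python train.py --lr 1e-4"
remote = f"screen -dmS exp0 bash -c '{activate} && {job}'"

# ssh passes the remote string through one layer of remote-shell parsing,
# so hand it over as a single argv element rather than re-quoting by hand:
argv = ["ssh", "my-gpu-server", remote]
print(argv)
# To actually launch: subprocess.run(argv, check=True)
```

Keeping the job inside a detached `screen` session means the experiment survives even if the SSH connection (or the Claude Code session) drops.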
No server? The review and rewriting skills still work without GPU access. Only experiment-related fixes will be skipped (flagged for manual follow-up).
If you use Zotero to manage your paper library, /research-lit can search your collections, read your annotations/highlights, and export BibTeX — all before searching the web.
Recommended: zotero-mcp (1.8k⭐, semantic search, PDF annotations, BibTeX export)
# Install
uv tool install zotero-mcp-server # or: pip install zotero-mcp-server
# Add to Claude Code (Local API — requires Zotero desktop running)
claude mcp add zotero -s user -- zotero-mcp -e ZOTERO_LOCAL=true
# Or use Web API (works without Zotero running)
claude mcp add zotero -s user -- zotero-mcp \
  -e ZOTERO_API_KEY=your_key -e ZOTERO_USER_ID=your_id

Get your API key at https://www.zotero.org/settings/keys
What it enables in /research-lit:
- 🔍 Search your Zotero library by topic (including semantic/vector search)
- 📂 Browse collections and tags
- 📝 Read your PDF annotations and highlights (what you personally found important)
- 📄 Export BibTeX for direct use in paper writing
Not using Zotero? No problem — /research-lit automatically skips Zotero and uses local PDFs + web search instead.
If you use Obsidian for research notes, /research-lit can search your vault for paper summaries, tagged references, and your own insights.
Recommended: mcpvault (760⭐, no Obsidian app needed, 14 tools, BM25 search)
# Add to Claude Code (point to your vault path)
claude mcp add obsidian-vault -s user -- npx @bitbonsai/mcpvault@latest /path/to/your/vault

Optional complement: obsidian-skills (13.6k⭐, by Obsidian CEO) — teaches Claude to understand Obsidian-specific Markdown (wikilinks, callouts, properties). Copy to your vault:
git clone https://github.com/kepano/obsidian-skills.git
cp -r obsidian-skills/.claude /path/to/your/vault/

What it enables in /research-lit:
- 🔍 Search your vault for notes on the research topic
- 🏷️ Find notes by tags (e.g., `#paper-review`, `#diffusion-models`)
- 📝 Read your processed summaries and insights (more valuable than raw papers)
- 🔗 Follow wikilinks to discover related notes
Not using Obsidian? No problem — /research-lit automatically skips Obsidian and works as before.
💡 Zotero + Obsidian together: Many researchers use Zotero for paper storage and Obsidian for notes. Both integrations work simultaneously — `/research-lit` checks Zotero first (raw papers + annotations), then Obsidian (your processed notes), then local PDFs, then web search.
Get mobile notifications when experiments finish, reviews score, or checkpoints need your input — without sitting in front of the terminal.
Screenshots: Push Only mode (group cards) vs Interactive mode (private chat).
Three modes — you choose per-project:
| Mode | What happens | You need |
|---|---|---|
| Off (default) | Nothing. Pure CLI, no Feishu | Nothing |
| Push only | Webhook notifications at key events. Mobile push, no reply | Feishu bot webhook URL |
| Interactive | Full bidirectional. Approve/reject ideas, reply to checkpoints from Feishu | feishu-claude-code running |
Push Only Setup (5 min)
Group notifications with rich cards — experiment done, review scored, pipeline complete. Mobile push, no reply needed.
Step 1: Create a Feishu group bot
- Open your Feishu group (or create a test group)
- Group Settings → Bots → Add Bot → Custom Bot
- Name it (e.g., `ARIS Notifications`), copy the Webhook URL
- Security: add custom keyword `ARIS` (all notifications include this word), or leave unrestricted
Step 2: Create config file
cat > ~/.claude/feishu.json << 'EOF'
{
"mode": "push",
"webhook_url": "https://open.feishu.cn/open-apis/bot/v2/hook/YOUR_WEBHOOK_ID"
}
EOF

Step 3: Test it
curl -s -X POST "YOUR_WEBHOOK_URL" \
-H "Content-Type: application/json" \
-d '{
"msg_type": "interactive",
"card": {
"header": {"title": {"tag": "plain_text", "content": "🧪 ARIS Test"}, "template": "blue"},
"elements": [{"tag": "markdown", "content": "Push mode working! 🎉"}]
}
}'

You should see a blue card in your group. Skills will now automatically send rich cards at key events:
| Event | Card color | Content |
|---|---|---|
| Review scored ≥ 6 | 🟢 Green | Score, verdict, top weaknesses |
| Review scored < 6 | 🟠 Orange | Score, verdict, action items |
| Experiment complete | 🟢 Green | Results table, delta vs baseline |
| Checkpoint waiting | 🟡 Yellow | Question, options, context |
| Error | 🔴 Red | Error message, suggested fix |
| Pipeline done | 🟣 Purple | Score progression, deliverables |
Interactive Setup (15 min)
Everything Push mode does, plus bidirectional private chat with Claude Code via Feishu. Approve/reject ideas, reply to checkpoints, give custom instructions — all from your phone.
How it works: Push cards go to the group (everyone sees status). Interactive conversations happen in private chat with the bot (you reply, Claude Code acts on it).
Step 1: Complete Push setup above first (you'll keep both)
Step 2: Create a Feishu app on open.feishu.cn
- Click Create Enterprise App → name it (e.g., `ARIS Claude Bot`) → create
- Left menu → Add Capabilities → check Bot
- Left menu → Permissions → search and enable these 5 permissions:
| Permission | Scope | Why |
|---|---|---|
| `im:message` | Send & receive messages | Core messaging |
| `im:message:send_as_bot` | Send as bot | Bot replies |
| `im:message.group_at_msg:readonly` | Receive group @mentions | Group messages |
| `im:message.p2p_msg:readonly` | Receive private messages | |
| `im:resource` | Access attachments | Images/files |
- Left menu → Events & Callbacks → select Long Connection mode → add event: `im.message.receive_v1` → save
⚠️ Important: The "Long Connection" page may show "未检测到应用连接信息" ("no app connection info detected") — this is normal. You need to start the bridge first (Step 3), then come back and save.
- Left menu → Version Management → Create Version → fill description → Submit for Review
For personal/test Feishu organizations, approval is usually instant.
Step 3: Deploy the bridge
git clone https://github.com/joewongjc/feishu-claude-code.git
cd feishu-claude-code
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
# Configure
cp .env.example .env

Edit .env:
FEISHU_APP_ID=cli_your_app_id # From app credentials page
FEISHU_APP_SECRET=your_app_secret # From app credentials page
DEFAULT_MODEL=claude-opus-4-6 # ⚠️ Default is sonnet — change to opus for best results
DEFAULT_CWD=/path/to/your/project # Working directory for Claude Code
PERMISSION_MODE=bypassPermissions # Or "default" for safer mode
⚠️ Model matters: The default `claude-sonnet-4-6` works but may struggle with complex project context. `claude-opus-4-6` correctly identified 18 ARIS skills on first try where sonnet could not.
Start the bridge:
python main.py
# Expected output:
# ✅ 连接飞书 WebSocket 长连接(自动重连)...
# [Lark] connected to wss://msg-frontier.feishu.cn/ws/v2?...

For long-running use, put it in a screen session:
screen -dmS feishu-bridge bash -c 'cd /path/to/feishu-claude-code && source .venv/bin/activate && python main.py'

Step 4: Save event config — go back to Feishu Open Platform → Events & Callbacks → the long connection should now show "已检测到连接" ("connection detected") → Save
If you published the app version before the bridge was running, you may need to create a new version (e.g., 1.0.1) and re-publish after saving event config.
Step 5: Test private chat
- In Feishu, find the bot in your contacts (search by app name)
- Send it a message: 你好 ("hello")
- It should reply via Claude Code
If the bot doesn't reply: Send /new to reset the session, then try again. Common issues:
| Symptom | Cause | Fix |
|---|---|---|
| Bot connects but never receives messages | Missing `im:message.p2p_msg:readonly` permission | Add permission → create new version → publish |
| Bot replies but doesn't know your project | `DEFAULT_CWD` points to wrong directory | Edit `.env` → restart bridge |
| Bot replies but seems less capable | Using `claude-sonnet-4-6` | Change to `claude-opus-4-6` in `.env` → restart |
| Old session has stale context | Session cached from before config change | Send /new in chat to start fresh session |
| "未检测到应用连接信息" when saving events | Bridge not running yet | Start bridge first, then save event config |
Step 6: Update ARIS config
cat > ~/.claude/feishu.json << 'EOF'
{
"mode": "interactive",
"webhook_url": "https://open.feishu.cn/open-apis/bot/v2/hook/YOUR_WEBHOOK_ID",
"interactive": {
"bridge_url": "http://localhost:5000",
"timeout_seconds": 300
}
}
EOF

Now skills will:
- Push rich cards to the group (status notifications, everyone sees)
- Private chat you for decisions (checkpoints, approve/reject, custom instructions)
| Skill | Events | Push | Interactive |
|---|---|---|---|
| `/auto-review-loop` | Review scored (each round), loop complete | Score + verdict | + wait for continue/stop |
| `/auto-paper-improvement-loop` | Review scored, all rounds done | Score progression | Score progression |
| `/run-experiment` | Experiments deployed | GPU assignment + ETA | GPU assignment + ETA |
| `/monitor-experiment` | Results collected | Results table | Results table |
| `/idea-discovery` | Phase transitions, final report | Summary at each phase | + approve/reject at checkpoints |
| `/research-pipeline` | Stage transitions, pipeline done | Stage summary | + approve/reject |
Not using Feishu? No problem — without ~/.claude/feishu.json, all skills behave exactly as before. Zero overhead, zero side effects.
💡 Alternative IM platforms: The push-only webhook pattern works with any service that accepts incoming webhooks (Slack, Discord, DingTalk, WeChat Work). Just change the `webhook_url` and card format in `feishu-notify/SKILL.md`. For bidirectional support, see cc-connect (multi-platform bridge) or clawdbot-feishu.
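Porting the push notification to another service is mostly a payload change. A sketch for a Slack-style incoming webhook — the `{"text": ...}` body follows Slack's documented minimal payload; the event name, score, and URL placeholder are illustrative:

```python
import json
import urllib.request

def build_payload(event: str, score: float, verdict: str) -> bytes:
    # Slack incoming webhooks accept a minimal {"text": ...} JSON body;
    # richer formatting (blocks/attachments) is optional.
    msg = f"ARIS | {event}: {score}/10 — {verdict}"
    return json.dumps({"text": msg}).encode("utf-8")

def push(webhook_url: str, payload: bytes) -> None:
    # Fire-and-forget notification to the webhook endpoint.
    req = urllib.request.Request(
        webhook_url, data=payload,
        headers={"Content-Type": "application/json"})
    urllib.request.urlopen(req)

payload = build_payload("Review scored", 7.5, "submission ready")
# push("https://hooks.slack.com/services/XXX/YYY/ZZZ", payload)
```

Discord, DingTalk, and WeChat Work each expect slightly different JSON shapes, so only `build_payload` needs to change per platform.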
Skills are plain Markdown files. Fork and customize:
| Constant | Default | Description |
|---|---|---|
| `MAX_ROUNDS` | 4 | Maximum review→fix→re-review iterations |
| `POSITIVE_THRESHOLD` | 6/10 | Score at which the loop stops (submission-ready) |
| > 4 GPU-hour skip | 4h | Experiments exceeding this are flagged for manual follow-up |
| Constant | Default | Description |
|---|---|---|
| `PILOT_MAX_HOURS` | 2h | Skip any pilot estimated to take longer per GPU |
| `PILOT_TIMEOUT_HOURS` | 3h | Hard timeout — kill runaway pilots, collect partial results |
| `MAX_PILOT_IDEAS` | 3 | Maximum number of ideas to pilot in parallel |
| `MAX_TOTAL_GPU_HOURS` | 8h | Total GPU budget across all pilots |
| `AUTO_PROCEED` | true | Auto-continue with the top-ranked option if the user doesn't respond. Set `false` to always wait for explicit approval |
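How the budget constants interact can be sketched as a simple selection pass (illustrative only — `select_pilots` and the tuple format are hypothetical; the skill applies these limits in its own workflow, not through this function):

```python
PILOT_MAX_HOURS = 2      # skip any pilot estimated above this per GPU
MAX_PILOT_IDEAS = 3      # cap on pilots run in parallel
MAX_TOTAL_GPU_HOURS = 8  # total GPU budget across all pilots

def select_pilots(ideas):
    """Pick which candidate ideas to pilot under the GPU budget.
    `ideas` is a list of (name, estimated_gpu_hours), best-ranked first."""
    chosen, total = [], 0.0
    for name, hours in ideas:
        if hours > PILOT_MAX_HOURS:          # too expensive: flag for manual follow-up
            continue
        if len(chosen) >= MAX_PILOT_IDEAS:   # parallel cap reached
            break
        if total + hours > MAX_TOTAL_GPU_HOURS:
            break                            # would exceed the total budget
        chosen.append(name)
        total += hours
    return chosen, total

chosen, total = select_pilots([("A", 1.5), ("B", 4.0), ("C", 2.0), ("D", 1.0)])
# chosen == ["A", "C", "D"], total == 4.5; "B" is skipped as over the per-pilot cap
```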
Override inline: `/idea-discovery "topic" — pilot budget: 4h per idea, wait for my approval at each step`
| Constant | Default | Description |
|---|---|---|
| `PAPER_LIBRARY` | `papers/`, `literature/` | Local directories to scan for PDFs before searching online |
| `MAX_LOCAL_PAPERS` | 20 | Max local PDFs to scan (first 3 pages each) |
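A minimal sketch of the pre-search scan these constants describe (assumptions: `find_local_pdfs` is a hypothetical helper, and the skill itself then reads the first 3 pages of each hit before falling back to arXiv/Scholar):

```python
from pathlib import Path

PAPER_LIBRARY = ["papers/", "literature/"]  # defaults from the table above
MAX_LOCAL_PAPERS = 20

def find_local_pdfs(dirs=PAPER_LIBRARY, limit=MAX_LOCAL_PAPERS):
    """Collect up to `limit` PDFs from the configured library directories,
    recursing into subfolders. Missing directories are skipped silently."""
    pdfs = []
    for d in dirs:
        root = Path(d).expanduser()
        if not root.is_dir():
            continue
        pdfs.extend(sorted(root.rglob("*.pdf")))
    return pdfs[:limit]
```

Pointing `PAPER_LIBRARY` at something like `~/Zotero/storage/` (as in the inline override below) is just a matter of changing the directory list.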
Override inline: `/research-lit "topic" — paper library: ~/Zotero/storage/`
| Constant | Default | Description |
|---|---|---|
| `REVIEWER_MODEL` | `gpt-5.4` | OpenAI model used via Codex MCP. Options: `gpt-5.4`, `o3`, `gpt-4o`, etc. |
- Prompt templates — tailor the review persona and evaluation criteria
- `allowed-tools` — restrict or expand what each skill can do
Don't have Claude / OpenAI API access? You can swap in other models — same cross-model architecture, different providers.
| Role | Default | Alt A: GLM + GPT | Alt B: GLM + MiniMax |
|---|---|---|---|
| Executor (Claude Code) | Claude Opus/Sonnet | GLM-5 (ZhiPu API) | GLM-5 (ZhiPu API) |
| Reviewer (Codex MCP) | GPT-5.4 | GPT-5.4 (OpenAI API) | MiniMax-M2.5 (MiniMax API) |
| Need OpenAI API? | Yes | Yes | No |
npm install -g @anthropic-ai/claude-code
npm install -g @openai/codex

Open with: `nano ~/.claude/settings.json`
Alt A: GLM (executor) + GPT (reviewer) — Only replace Claude, keep GPT-5.4 as reviewer
{
"env": {
"ANTHROPIC_AUTH_TOKEN": "your_zai_api_key",
"ANTHROPIC_BASE_URL": "https://api.z.ai/api/anthropic",
"API_TIMEOUT_MS": "3000000",
"ANTHROPIC_DEFAULT_HAIKU_MODEL": "glm-4.5-air",
"ANTHROPIC_DEFAULT_SONNET_MODEL": "glm-4.7",
"ANTHROPIC_DEFAULT_OPUS_MODEL": "glm-5"
},
"mcpServers": {
"codex": {
"command": "/opt/homebrew/bin/codex",
"args": [
"mcp-server"
]
}
}
}

Codex CLI uses your existing `OPENAI_API_KEY` (from `~/.codex/config.toml` or the environment) — no extra config needed for the reviewer side.
Alt B: GLM (executor) + MiniMax (reviewer) — No Claude or OpenAI API needed
{
"env": {
"ANTHROPIC_AUTH_TOKEN": "your_zai_api_key",
"ANTHROPIC_BASE_URL": "https://api.z.ai/api/anthropic",
"API_TIMEOUT_MS": "3000000",
"ANTHROPIC_DEFAULT_HAIKU_MODEL": "glm-4.5-air",
"ANTHROPIC_DEFAULT_SONNET_MODEL": "glm-4.7",
"ANTHROPIC_DEFAULT_OPUS_MODEL": "glm-5",
"CODEX_API_KEY": "your_minimax_api_key",
"CODEX_API_BASE": "https://api.minimax.chat/v1/",
"CODEX_MODEL": "MiniMax-M2.5"
},
"mcpServers": {
"codex": {
"command": "/opt/homebrew/bin/codex",
"args": [
"mcp-server"
]
}
}
}

Save: Ctrl+O → Enter → Ctrl+X
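After saving, it is worth confirming the file parses as valid JSON and carries the keys the GLM setup relies on. An optional convenience check (the `check_settings` helper is not part of the project; key names match the example above):

```python
import json
import pathlib

def check_settings(path="~/.claude/settings.json"):
    """Sanity-check the edited settings file: valid JSON, required env keys
    present, and a registered codex MCP server. Returns (missing env keys,
    whether the codex server is configured)."""
    cfg = json.loads(pathlib.Path(path).expanduser().read_text())
    missing = [k for k in ("ANTHROPIC_AUTH_TOKEN", "ANTHROPIC_BASE_URL")
               if k not in cfg.get("env", {})]
    return missing, "codex" in cfg.get("mcpServers", {})
```

If `json.loads` raises, nano left a syntax error (a stray comma is the usual culprit); an empty `missing` list plus `True` for the codex flag means the file matches the template.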
git clone https://github.com/wanshuiyin/Auto-claude-code-research-in-sleep.git
cd Auto-claude-code-research-in-sleep
cp -r skills/* ~/.claude/skills/
# Launch Claude Code (now powered by GLM)
claude

🔴 Do NOT skip this step. GLM's prompt handling differs from Claude's. You must let GLM read through the project once to ensure the skills are correctly parsed.
After launching `claude`, run in the conversation:
Read through this project and verify all skills are working:
/idea-creator, /research-review, /auto-review-loop, /novelty-check,
/idea-discovery, /research-pipeline, /research-lit, /run-experiment,
/analyze-results, /monitor-experiment, /pixel-art
For each skill, confirm: (1) it loads without errors, (2) the frontmatter is parsed correctly.
This lets GLM (acting as Claude Code) familiarize itself with the skill files and catch any compatibility issues upfront — rather than discovering them mid-workflow when it's expensive to fail.
⚠️ Note: Alternative models may behave differently from Claude and GPT-5.4. You may need to adjust `REVIEWER_MODEL` in the skills and tune the prompt templates for best results. The core cross-model architecture remains the same.
- Human-in-the-loop checkpoints — idea-discovery and research-pipeline pause at key decision points for user approval. Configurable via `AUTO_PROCEED` (default: auto-continue; set `false` to always wait)
- Alternative model combinations — GLM + GPT, GLM + MiniMax fully documented with setup guides. No Claude or OpenAI API required
- Workflow 3: Paper Writing Pipeline — full chain: `/paper-plan` → `/paper-figure` → `/paper-write` → `/paper-compile`. ICLR/NeurIPS/ICML templates, claims-evidence matrix, publication-quality figures, latexmk auto-fix. Inspired by claude-scholar, Research-Paper-Writing-Skills, baoyu-skills
Show 6 more completed items
- Configurable REVIEWER_MODEL — all Codex-dependent skills support a custom reviewer model (default `gpt-5.4`, also works with `o3`, `gpt-4o`, etc.)
- Local paper library scanning — `/research-lit` scans local `papers/` and `literature/` directories before external search, leveraging papers you've already read
- Idea Discovery pipeline — `/idea-discovery` orchestrates research-lit → idea-creator → novelty-check → research-review in one command, with pilot experiments on GPU
- Full research pipeline — `/research-pipeline` chains Workflow 1 (idea discovery) → implementation → Workflow 2 (auto-review-loop) end-to-end
- Peer review skill — `/peer-review` for reviewing others' papers as a conference reviewer, with GPT-5.4 meta-review
- Cross-model collaboration — Claude Code (executor) × Codex GPT-5.4 xhigh (reviewer) architecture, avoiding single-model self-play local minima
- Feishu/Lark integration — three modes (off/push/interactive), configurable via `~/.claude/feishu.json`. Push-only needs just a webhook URL; interactive uses feishu-claude-code. Off by default — zero impact on existing workflows. See setup guide
- W&B integration — pull training curves and metrics from Weights & Biases as a feedback signal. Auto-review-loop can read loss/accuracy plots to diagnose training issues and suggest next experiments
  - Related projects: wandb-mcp-server (official W&B MCP, if available), or via the `wandb api` CLI
- Zotero MCP integration — `/research-lit` searches Zotero collections, reads annotations/highlights, exports BibTeX. Recommended: zotero-mcp (1.8k⭐). See setup guide
- Obsidian integration — `/research-lit` searches the Obsidian vault for research notes, tagged references, wikilinks. Recommended: mcpvault (760⭐) + obsidian-skills (13.6k⭐). See setup guide
- More executor × reviewer combinations (Gemini, DeepSeek, etc.)
Domain-specific skills welcome! The core skills cover general research workflows, but every field has its own tools and patterns. We welcome PRs that add new skills for your domain — EDA, bioinformatics, robotics, HPC, or anything else. Just add a `skills/your-skill/SKILL.md` and open a PR. See dse-loop for an example.
Join the WeChat group for discussion on Claude Code + AI-driven research workflows:
This project builds on and integrates with many excellent open-source projects:
Core Infrastructure
- Claude Code — Anthropic's CLI for Claude, the execution backbone
- Codex CLI — OpenAI's CLI, used as MCP server for cross-model review
Zotero Integration (setup guide)
- zotero-mcp — Zotero MCP server with semantic search and PDF annotations
- Zotero — Open-source reference manager
Obsidian Integration (setup guide)
- mcpvault — Obsidian vault MCP server (no app required)
- obsidian-skills — Claude Code skills for Obsidian Markdown by Steph Ango (Obsidian CEO)
Paper Writing Inspiration
- claude-scholar — Academic paper writing with Claude
- Research-Paper-Writing-Skills — Paper writing skill templates
- baoyu-skills — Claude Code skills collection
Feishu/Lark Integration (setup guide)
- feishu-claude-code — Bidirectional Feishu ↔ Claude Code bridge
- clawdbot-feishu — Feishu bot for Claude
- cc-connect — Multi-platform messaging bridge
- lark-openapi-mcp — Official Lark MCP server
Community
- awesome-agent-skills — Curated list of Claude Code skills (featured)
MIT



