Auto-claude-code-research-in-sleep (ARIS ⚔️)


中文版 README (Chinese) | English


🌙 Let Claude Code do research while you sleep. Wake up to find your paper scored, weaknesses identified, experiments run, and narrative rewritten — autonomously.

Featured in awesome-agent-skills · 💬 Join Community

Custom Claude Code skills for autonomous ML research workflows. These skills orchestrate cross-model collaboration — Claude Code drives the research while an external LLM (via Codex MCP) acts as a critical reviewer. 🔀 Also supports alternative model combinations (e.g., GLM + GPT, GLM + MiniMax) — no Claude API required.

💭 Why not self-play with a single model? Using Claude Code subagents or agent teams for both execution and review is technically possible, but tends to fall into local minima — the same model reviewing its own patterns creates blind spots.

Think of it like adversarial vs. stochastic bandits: a single model self-reviewing is the stochastic case (predictable reward noise), while cross-model review is adversarial (the reviewer actively probes weaknesses the executor didn't anticipate) — and adversarial bandits are fundamentally harder to game.

💭 Why two models, not more? Two is the minimum needed to break self-play blind spots, and 2-player games converge to Nash equilibrium far more efficiently than n-player ones. Adding more reviewers increases API cost and coordination overhead with diminishing returns — the biggest gain is going from 1→2, not 2→4.

Claude Code's strength is fast, fluid execution; Codex (GPT-5.4 xhigh) is slower but more deliberate and rigorous in critique. These complementary styles — speed × rigor — produce better outcomes than either model talking to itself.

📢 What's New

  • 2026-03-14 — 📱 Feishu/Lark integration: three modes (off/push/interactive), mobile notifications for experiments, reviews, and checkpoints
  • 2026-03-13 — 🛑 Human-in-the-loop: configurable AUTO_PROCEED checkpoints across all workflows. Full autopilot or step-by-step approval
  • 2026-03-12 — 🔗 Zotero + Obsidian + local PDFs + arXiv/Scholar: multi-source literature search with cross-model novelty verification
  • 2026-03-11 — 🚀 Three end-to-end workflows complete: one prompt → top-venue-style paper. /research-pipeline chains idea discovery → auto review → paper writing autonomously
  • 2026-03-09 — 📝 /paper-writing workflow: narrative report → structured outline → figures → LaTeX → compiled PDF → 2-round auto-improvement (4/10 → 8.5/10)

🚀 Quick Start

# 1. Install skills
git clone https://github.com/wanshuiyin/Auto-claude-code-research-in-sleep.git
cp -r Auto-claude-code-research-in-sleep/skills/* ~/.claude/skills/

# 2. Set up Codex MCP (for review skills)
npm install -g @openai/codex
claude mcp add codex -s user -- codex mcp-server

# 3. Use in Claude Code
claude
> /idea-discovery "your research direction"  # Workflow 1: literature → brainstorm → validate
> /auto-review-loop                          # Workflow 2: review → fix → re-review overnight
> /paper-writing "NARRATIVE_REPORT.md"       # Workflow 3: narrative → polished PDF
> /research-pipeline "your research direction"  # Full pipeline: Workflow 1 → 2 → 3 end-to-end

Tip: Workflows pause at checkpoints for your approval by default. Add AUTO_PROCEED=true to run fully autonomously (great for overnight runs).

See full setup guide for details and alternative model combinations if you don't have Claude/OpenAI API.

✨ Features

  • 📊 18 composable skills — mix and match, or chain into full pipelines (/idea-discovery, /auto-review-loop, /paper-writing, /research-pipeline)

  • 🔍 Literature & novelty — multi-source paper search (Zotero + Obsidian + local PDFs + arXiv/Scholar) + cross-model novelty verification

  • 💡 Idea discovery — literature survey → brainstorm 8-12 ideas → novelty check → GPU pilot experiments → ranked report

  • 🔄 Auto review loop — 4-round autonomous review, 5/10 → 7.5/10 overnight with 20+ GPU experiments

  • 📝 Paper writing — narrative → outline → figures → LaTeX → PDF → auto-review (4/10 → 8.5/10), one command

  • 🤖 Cross-model collaboration — Claude Code executes, GPT-5.4 xhigh reviews. Adversarial, not self-play

  • 📝 Peer review — review others' papers as a conference reviewer, with structured scoring and meta-review

  • 🖥️ GPU deployment — auto rsync, screen sessions, multi-GPU parallel experiments, live monitoring

  • 🔀 Flexible models — default Claude × GPT-5.4, also supports GLM + GPT, GLM + MiniMax — no Claude API required

  • 🛑 Human-in-the-loop — configurable checkpoints at key decisions. AUTO_PROCEED=true for full autopilot, false to approve each step

  • 📱 Feishu/Lark notifications — three modes: off (default, strongly recommended for most users), push-only (webhook, mobile alerts), interactive (approve/reject from Feishu). Zero impact when unconfigured

    Preview: Push cards (group) & Interactive chat (private)

    Push Only — group chat cards (experiment done, checkpoint, error, pipeline complete):

    Interactive — private chat with Claude Code (approve/reject, custom instructions):

  • 🧩 Extensible — domain-specific skills welcome! Add a SKILL.md and open a PR. See community skills like dse-loop (architecture/EDA)


📈 Score Progression (Real Run)

A real overnight 4-round run on an ML research project, from borderline reject to submission-ready:

| Round | Score | What happened |
|---|---|---|
| Initial | 5.0/10 | Borderline reject |
| Round 1 | 6.5/10 | Added standard metrics, discovered metric decoupling |
| Round 2 | 6.8/10 | Key claim failed to reproduce, pivoted narrative |
| Round 3 | 7.0/10 | Large seed study killed main improvement claim |
| Round 4 | 7.5/10 | Diagnostic evidence solidified, submission ready |

The loop autonomously ran 20+ GPU experiments, rewrote the paper's narrative framing, and killed claims that didn't hold up — all without human intervention.

🔄 Workflows

These skills compose into a full research lifecycle. The three workflows can be used independently or chained together:

  • Exploring a new area (e.g., writing a survey)? Start with Workflow 1 → /idea-discovery
  • Already have an idea + initial plan? Jump straight to Workflow 2 → /auto-review-loop
  • Ready to write the paper? Workflow 3 → /paper-writing (or step by step: /paper-plan → /paper-figure → /paper-write → /paper-compile → /auto-paper-improvement-loop)
  • Full pipeline? Workflow 1 → Workflow 2 → Workflow 3 → /research-pipeline — from literature survey all the way to submission

⚠️ Important: These tools accelerate research, but they don't replace your own critical thinking. Always review generated ideas with your domain expertise, question the assumptions, and make the final call yourself. The best research comes from human insight + AI execution, not full autopilot.

Full Pipeline 🚀

/research-lit → /idea-creator → /novelty-check → implement → /run-experiment → /auto-review-loop → /paper-plan → /paper-figure → /paper-write → /auto-paper-improvement-loop → submit
  (survey)      (brainstorm)    (verify novel)    (code)      (deploy & run)    (review & fix)      (outline)     (plots)        (LaTeX+PDF)     (review ×2 + format)     (done!)
  ├──── Workflow 1: Idea Discovery ────┤              ├──── Workflow 2: Auto Loop ────┤   ├──────────────── Workflow 3: Paper Writing ──────────────────┤

📝 Blog post: 梦中科研全流程开源 (open-sourcing the full research-while-you-sleep pipeline)

Workflow 1: Literature & Idea Discovery 🔍

"What's the state of the art? Where are the gaps?"

Don't have a concrete idea yet? Just give a research direction — /idea-creator handles the rest:

  1. 📚 Survey the landscape (recent papers, open problems, recurring limitations)
  2. 🧠 Brainstorm 8-12 concrete ideas via GPT-5.4 xhigh
  3. 🔍 Filter by feasibility, compute cost, and quick novelty search
  4. 🛡️ Validate top ideas with deep novelty check + devil's advocate review
  5. 🧪 Pilot top 2-3 ideas in parallel on different GPUs (30 min - 2 hr each)
  6. 🏆 Rank by empirical signal — ideas with positive pilot results rise to the top

The output is a ranked IDEA_REPORT.md with hypotheses, pilot results, reviewer objections, and a suggested execution order. Ideas that fail are documented too, saving future dead-end exploration.

┌─────────────────────────────────────────────────────────────┐
│                  Idea Discovery                              │
│                                                              │
│   /research-lit     /idea-creator     /novelty-check         │
│   (find papers)     (brainstorm)      (verify novelty)       │
│         │                │                  │                │
│         ▼                ▼                  ▼                │
│   ┌──────────┐     ┌──────────┐       ┌──────────┐         │
│   │ Scan     │────▶│ Generate │──────▶│ Check if │         │
│   │ local    │     │ 8-12     │       │ idea is  │         │
│   │ papers + │     │ ideas    │       │ novel    │         │
│   │ search   │     │ + rank   │       │          │         │
│   └──────────┘     └──────────┘       └──────────┘         │
│                          │                  │                │
│                          ▼                  ▼                │
│                    ┌──────────┐       ┌──────────┐         │
│                    │ Filter   │──────▶│ External │         │
│                    │ by cost, │       │ LLM      │         │
│                    │ novelty  │       │ evaluates│         │
│                    └──────────┘       └──────────┘         │
│                                                              │
│   Typical flow:                                              │
│   1. /research-lit "discrete diffusion models"  (local → online) │
│   2. /idea-creator "DLLMs post training"               │
│   3. Review ranked ideas, pick top 2-3                       │
│   4. /novelty-check "top idea" (deep verification)           │
│   5. /research-review "top idea" (critical feedback)         │
│   6. Implement → /run-experiment → /auto-review-loop         │
└─────────────────────────────────────────────────────────────┘

Skills involved: research-lit + idea-creator + novelty-check + research-review

💡 One-command shortcut: /idea-discovery "your research direction" runs this entire workflow automatically.

🔄 Human-in-the-loop: Each phase presents results and waits for your feedback. Not happy? Tell it what's missing — it refines the prompt and regenerates. Trust the defaults? It auto-proceeds with the top-ranked option. You decide how hands-on to be.

⚙️ Pilot experiment budgets (max hours, timeout, GPU budget) are configurable — see Customization.

📝 Blog post: Claude Code 两月 NeurIPS 指北 (a two-month Claude Code guide to a NeurIPS submission)

Workflow 2: Auto Research Loop 🔁 (sleep & wake up to results)

"Review my paper, fix what's wrong, repeat until it's good."

┌─────────────────────────────────────────────────────────────┐
│                    Auto Review Loop                          │
│                                                              │
│   /research-review          /auto-review-loop                │
│   (single deep review)      (autonomous loop)                │
│         │                         │                          │
│         ▼                         ▼                          │
│   ┌──────────┐   ┌───────────┐   ┌──────────┐              │
│   │ External │──▶│ Implement │──▶│ Monitor  │──▶ repeat    │
│   │ LLM      │   │ fixes &   │   │ results  │    until     │
│   │ reviews  │   │ run exps  │   │          │    score ≥ 6 │
│   └──────────┘   └───────────┘   └──────────┘              │
│                                                              │
│   When reviewer suggests a new method direction:             │
│   /novelty-check — verify idea isn't already published       │
│                                                              │
│   Supporting skills:                                         │
│   /run-experiment    — deploy to local/remote GPU            │
│   /analyze-results   — interpret experiment outputs          │
│   /monitor-experiment — check progress, collect results      │
└─────────────────────────────────────────────────────────────┘

Skills involved: auto-review-loop + research-review + novelty-check + run-experiment + analyze-results + monitor-experiment

💡 One-command shortcut: /auto-review-loop "your paper topic" runs this entire workflow automatically.

🛡️ Key safety features:

  • 🔒 MAX_ROUNDS = 4 — prevents infinite loops; stops early if score threshold is met
  • ⏱️ > 4 GPU-hour experiments skipped — won't launch massive jobs; flags them for manual follow-up
  • 🧠 Prefer reframing over new experiments — when both can address a weakness, chooses the cheaper path
  • 🪞 No hiding weaknesses — explicit rule: "Do NOT hide weaknesses to game a positive score"
  • 🔧 Fix before re-review — must actually implement fixes before resubmitting; no empty promises
  • 💾 Compact recovery — persists state (REVIEW_STATE.json) after each round. If the context window fills up and auto-compacts mid-loop, the workflow reads the state file and resumes from where it left off — no human intervention needed
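
For illustration, a REVIEW_STATE.json checkpoint might look like the following — the field names here are hypothetical, not the skill's actual schema:

```json
{
  "round": 2,
  "max_rounds": 4,
  "current_score": 6.8,
  "score_threshold": 6.0,
  "completed_fixes": ["added standard metrics"],
  "pending_fixes": ["rerun seed study with 10 seeds"],
  "running_experiments": [
    {"name": "ablation_lr", "gpu": 0, "screen_session": "exp0"}
  ]
}
```

Because the file records both the round counter and in-flight experiments, a resumed session can re-attach to running jobs instead of relaunching them.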

⚙️ MAX_ROUNDS, score threshold, and GPU limits are configurable — see Customization.

📝 Blog post: 开源 | 睡觉 Claude 自动跑实验改文 (open source: Claude runs experiments and revises the paper while you sleep)

Workflow 3: Paper Writing Pipeline 📝

"Turn my research narrative into a submission-ready PDF." Requires a local LaTeX environment — see Prerequisites.

┌─────────────────────────────────────────────────────────────┐
│                   Paper Writing Pipeline                      │
│                                                               │
│   /paper-plan      /paper-figure     /paper-write             │
│   (outline)        (plots & tables)  (LaTeX draft)            │
│        │                │                 │                   │
│        ▼                ▼                 ▼                   │
│   ┌──────────┐    ┌──────────┐     ┌──────────┐              │
│   │ Claims-  │───▶│ Generate │────▶│ Section  │──┐           │
│   │ Evidence │    │ figures, │     │ by       │  │           │
│   │ Matrix + │    │ tables,  │     │ section  │  │           │
│   │ Section  │    │ LaTeX    │     │ LaTeX    │  │           │
│   │ Plan     │    │ includes │     │ draft    │  │           │
│   └──────────┘    └──────────┘     └──────────┘  │           │
│        │                                          │           │
│        │         /paper-compile                   │           │
│        │         (build PDF)                      │           │
│        │              │                           │           │
│        ▼              ▼                           ▼           │
│   ┌──────────────────────────────────────────────────┐       │
│   │ NARRATIVE_REPORT.md ──► PAPER_PLAN.md ──► paper/ │       │
│   │    (input)             (outline)      (LaTeX+PDF)│       │
│   └──────────────────────────────────────────────────┘       │
│                                                               │
│   Typical flow:                                               │
│   1. Write NARRATIVE_REPORT.md (from Workflow 2 results)      │
│   2. /paper-plan (claims-evidence matrix + section plan)      │
│   3. /paper-figure (comparison tables, training curves, etc.) │
│   4. /paper-write (section-by-section LaTeX generation)       │
│   5. /paper-compile (build PDF, fix errors, page check)       │
│   6. /auto-paper-improvement-loop (review ×2 + format check)  │
└─────────────────────────────────────────────────────────────┘

Skills involved: paper-plan + paper-figure + paper-write + paper-compile + auto-paper-improvement-loop

One-command shortcut: /paper-writing "NARRATIVE_REPORT.md" runs this entire workflow automatically.

Input: A NARRATIVE_REPORT.md describing the research: claims, experiments, results, figures. The more detailed the narrative (especially figure descriptions and quantitative results), the better the output.
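
As a rough illustration, a minimal NARRATIVE_REPORT.md skeleton might look like this — section names are suggestions, not a required schema:

```markdown
# Narrative Report: <working title>

## Core Claims
1. Claim A (supported by Experiment 1, Figure 2)

## Experiments & Results
- Experiment 1: setup, baseline vs ours, key numbers (e.g., 71.2 → 74.5 accuracy)

## Figures
- Figure 2: training curves, x = steps, y = accuracy (data in results/exp1.json)

## Limitations
- Single dataset; 3 seeds only
```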

Output: A submission-ready paper/ directory with LaTeX source, clean .bib (only cited entries), and compiled PDF.

Key features:

  • 📐 Claims-Evidence Matrix — every claim maps to evidence, every experiment supports a claim
  • 📊 Auto figure generation — line plots, bar charts, comparison tables from JSON data
  • 🧹 Clean bib — automated filtering removes uncited entries (948→215 lines in testing)
  • 📄 Flexible sections — 5-8 sections depending on paper type (theory papers often need 7)
  • 🔍 GPT-5.4 review — each step optionally reviewed by external LLM
  • ✂️ De-AI polish — removes AI writing patterns (delve, pivotal, landscape...)
  • 🎯 Page verification — pdftotext-based precise check that the main body fits the page limit
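
The uncited-entry filtering in the list above can be sketched roughly as follows — a hypothetical simplification, not the skill's actual implementation (which may handle more citation commands and entry formats):

```python
import re

def filter_bib(tex_source: str, bib_source: str) -> str:
    """Keep only BibTeX entries whose keys are cited in the LaTeX source.

    Hypothetical sketch: handles \cite, \citep, \citet (with starred
    variants and comma-separated keys) and standard @type{key,...} entries.
    """
    # Collect every cited key from the LaTeX source.
    cited = set()
    for group in re.findall(r"\\cite[tp]?\*?\{([^}]*)\}", tex_source):
        cited.update(k.strip() for k in group.split(","))

    # Split the .bib file into entries at each "@type{" boundary.
    kept = []
    for entry in re.split(r"(?=@\w+\{)", bib_source):
        m = re.match(r"@\w+\{([^,]+),", entry)
        if m and m.group(1).strip() in cited:
            kept.append(entry.strip())
    return "\n\n".join(kept)
```

Running this over a `.tex` file and its `.bib` yields a bibliography containing only cited entries, which is what shrank the test bibliography so dramatically.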

⚠️ What /paper-figure can and cannot do: It auto-generates data-driven plots (training curves, bar charts, heatmaps) and comparison tables (LaTeX) from JSON/CSV data. It cannot generate architecture diagrams, pipeline figures, model diagrams, or grids of generated images — these must be created manually (e.g., draw.io, Figma, TikZ) and placed in figures/ before running /paper-write. In a typical ML paper, ~60% of figures are auto-generated and ~40% are manual.
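
For the data-driven comparison tables, the JSON-to-LaTeX step can be sketched like this — a hypothetical minimal version; the skill's real column layout and formatting will differ:

```python
import json

def results_to_latex(json_str: str, metric: str = "accuracy") -> str:
    """Render a {method: {metric: value}} results dict as a LaTeX tabular.

    Hypothetical sketch of the kind of snippet /paper-figure emits;
    assumes the booktabs package for \toprule/\midrule/\bottomrule.
    """
    results = json.loads(json_str)
    rows = [f"{name} & {vals[metric]:.1f} \\\\" for name, vals in results.items()]
    return "\n".join(
        ["\\begin{tabular}{lc}", "\\toprule",
         f"Method & {metric.title()} \\\\", "\\midrule"]
        + rows
        + ["\\bottomrule", "\\end{tabular}"]
    )
```

A snippet like this would be written to an `.tex` include file and pulled into the draft with `\input{}` during /paper-write.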

Tested end-to-end: Generated a 9-page ICLR 2026 theory paper (7 sections, 29 citations, 4 figures, 2 comparison tables) from a single NARRATIVE_REPORT.md — zero compilation errors, zero undefined references.

Auto Paper Improvement Loop ✨

After Workflow 3 generates the paper, /auto-paper-improvement-loop runs 2 rounds of GPT-5.4 xhigh content review → fix → recompile, plus a final format compliance check, autonomously polishing the paper from rough draft to submission-ready.

Score Progression (Real Test — ICLR 2026 theory paper):

| Round | Score | Key changes |
|---|---|---|
| Round 0 | 4/10 (content) | Baseline |
| Round 1 | 6/10 (content) | Fixed assumptions, softened claims, renamed notation |
| Round 2 | 7/10 (content) | Added synthetic validation, stronger limitations |
| Round 3 | 5→8.5/10 (format) | Removed hero figure, moved material to appendices, compressed conclusion, tightened float spacing |

Final: 8 pages main body (ICLR limit: 9), 0 overfull hbox, ICLR-compliant. +4.5 points across 3 rounds.

Round 1 fixes (6 items)
  1. CRITICAL — Assumption-model mismatch: A boundedness assumption contradicted the model's distributional family. Replaced with a tail-compatible assumption and added formal truncation bridge.
  2. CRITICAL — Theory-practice gap: Theory assumes idealized encoders, experiments use learned nonlinear encoders. Softened "validate" → "demonstrate practical relevance" and added explicit disclaimer.
  3. MAJOR — Missing quantitative metrics: Added parameter count table (latent vs total) with honest accounting of system cost.
  4. MAJOR — Theorem not self-contained: Added "Interpretation" paragraph listing all dependencies explicitly.
  5. MAJOR — Overclaim in novelty statement: Scoped a broad "first convergence guarantee" to precise conditions under which it holds.
  6. MAJOR — Notation confusion: Renamed a symbol that clashed with another key variable. Added Notation paragraph.
Round 2 fixes (4 items)
  1. MAJOR — Missing theory-aligned experiments: Added a synthetic validation subsection directly testing the two main theoretical predictions under controlled conditions.
  2. MAJOR — Overclaim softening: Replaced strong equivalence claims with appropriately hedged language across all files.
  3. MAJOR — Informal theoretical argument: Formalized an informal justification into a proper proposition with explicit error bounds.
  4. MINOR — Weak limitations: Expanded to explicitly list all assumptions and acknowledge missing standard evaluations.
Round 3 format fixes (8 items)
  1. Removed hero figure block (saved ~0.7 pages)
  2. Compressed conclusion from 15→9 lines
  3. Moved synthetic validation to Appendix A
  4. Moved comparison tables to Appendix B
  5. Fixed overfull hbox (85pt) with \resizebox
  6. Added compact float spacing (\captionsetup, \textfloatsep)
  7. Inlined centered question block in introduction
  8. Tightened itemize environments

🧰 All Skills

| Skill | Description | Needs Codex MCP? |
|---|---|---|
| 💡 idea-creator | Generate and rank research ideas given a broad direction (brainstorm + filter + validate) | Yes |
| 🔬 research-review | Single-round deep review from external LLM (xhigh reasoning) | Yes |
| 🔁 auto-review-loop | Autonomous multi-round review → fix → re-review loop (max 4 rounds) | Yes |
| 📚 research-lit | Scan Zotero + Obsidian + local PDFs + web search, analyze related work, find gaps | No (optional: Zotero/Obsidian MCP) |
| 📊 analyze-results | Analyze experiment results, compute statistics, generate insights | No |
| 👀 monitor-experiment | Monitor running experiments, check progress, collect results | No |
| 🔍 novelty-check | Verify research idea novelty against recent literature before implementing | Yes |
| 🚀 run-experiment | Deploy experiments to local (MPS/CUDA) or remote GPU servers | No |
| 🎨 pixel-art | Generate pixel art SVG illustrations for READMEs, docs, or slides | No |
| 🔭 idea-discovery | Workflow 1 pipeline: research-lit → idea-creator → novelty-check → research-review | Yes |
| 🏗️ research-pipeline | Full pipeline: Workflow 1 → implement → Workflow 2 → Workflow 3, from direction to submission | Yes |
| 📐 paper-plan | Generate paper outline with claims-evidence matrix, figure plan, and citation scaffolding | Yes |
| 📊 paper-figure | Publication-quality matplotlib/seaborn plots from experiment data, with LaTeX snippets | Optional |
| ✍️ paper-write | Section-by-section LaTeX generation with ICLR/NeurIPS/ICML templates | Yes |
| 🔨 paper-compile | Compile LaTeX to PDF, auto-fix errors, submission readiness checks | No |
| 🔄 auto-paper-improvement-loop | 2-round content review + format check loop on generated paper (4/10 → 8.5/10) | Yes |
| 📝 paper-writing | Workflow 3 pipeline: paper-plan → paper-figure → paper-write → paper-compile → auto-paper-improvement-loop | Yes |
| 📱 feishu-notify | Feishu/Lark notifications — push (webhook) or interactive (bidirectional). Off by default | No |

Community skills — domain-specific skills contributed by the community:

| Skill | Description | Needs Codex MCP? |
|---|---|---|
| 🏗️ dse-loop | Autonomous design space exploration — iteratively run, analyze, and tune parameters. Built for architecture/EDA (gem5, Yosys), but works for any domain with tunable parameters (comp chem, CFD, bioinformatics, etc.) | No |

⚙️ Setup

Prerequisites

  1. Claude Code installed
  2. (For review skills) Codex CLI installed and configured as MCP server:
    npm install -g @openai/codex
    claude mcp add codex -s user -- codex mcp-server
  3. (For Workflow 3: paper writing) LaTeX environment with latexmk and pdfinfo:
    # macOS
    brew install --cask mactex    # or: brew install basictex
    brew install poppler          # provides pdfinfo
    
    # Ubuntu/Debian
    sudo apt install texlive-full latexmk poppler-utils
    
    # Verify
    latexmk --version && pdfinfo -v

    If you only need Workflow 1 & 2 (idea discovery + auto review), LaTeX is not required.

Install Skills

git clone https://github.com/wanshuiyin/Auto-claude-code-research-in-sleep.git
cd Auto-claude-code-research-in-sleep

# Install all skills globally
cp -r skills/* ~/.claude/skills/

# Or install specific skills
cp -r skills/auto-review-loop ~/.claude/skills/
cp -r skills/research-lit ~/.claude/skills/

Usage

# Workflow 1: Idea Discovery
> /idea-discovery "your research direction"          # full pipeline
> /research-lit "topic"                              # just literature survey (all sources)
> /research-lit "topic" — sources: zotero, web        # mix and match sources
> /idea-creator "topic"                              # just brainstorm

# Workflow 2: Auto Research Loop
> /auto-review-loop "your paper topic"               # review → fix → repeat
> /research-review "your paper"                      # single deep review

# Workflow 3: Paper Writing
> /paper-writing "NARRATIVE_REPORT.md"               # full pipeline
> /paper-plan "NARRATIVE_REPORT.md"                  # just outline
> /paper-compile "paper/"                            # just compile

# Full Pipeline
> /research-pipeline "your research direction"       # Workflow 1 → 2 → 3 end-to-end

# Supporting Skills
> /run-experiment train.py --lr 1e-4 --epochs 100
> /analyze-results figures/*.json
> /monitor-experiment server5

🌙 Auto-Allow for Overnight Runs (Optional)

To run the auto-review loop without clicking permission prompts, add to .claude/settings.local.json:

{
  "permissions": {
    "allow": [
      "mcp__codex__codex",
      "mcp__codex__codex-reply",
      "Write",
      "Edit",
      "Skill(auto-review-loop)"
    ]
  }
}

🖥️ GPU Server Setup (For Auto-Experiments)

When GPT-5.4 says "run an ablation study" or "add a baseline comparison", Claude Code automatically writes the experiment script and deploys it to your GPU server. For this to work, Claude Code needs to know your server environment.

Add your server info to your project's CLAUDE.md:

## Remote Server

- SSH: `ssh my-gpu-server` (key-based auth, no password)
- GPU: 4x A100
- Conda env: `research` (Python 3.10 + PyTorch)
- Activate: `eval "$(/opt/conda/bin/conda shell.bash hook)" && conda activate research`
- Code directory: `/home/user/experiments/`
- Use `screen` for background jobs: `screen -dmS exp0 bash -c '...'`

Claude Code reads this and knows how to SSH in, activate the environment, and launch experiments. GPT-5.4 (the reviewer) only decides what experiments to run — Claude Code figures out how based on your CLAUDE.md.
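
Under those CLAUDE.md conventions, the remote launch command Claude Code assembles can be sketched as a small helper — hypothetical; the session name, conda env, and paths are placeholders matching the example above:

```shell
# Build the detached-screen launch command described in CLAUDE.md.
# Hypothetical helper: the real skill composes this inline.
build_launch_cmd() {
  local session="$1" script="$2"
  printf "screen -dmS %s bash -c 'eval \"\$(/opt/conda/bin/conda shell.bash hook)\" && conda activate research && python %s'\n" "$session" "$script"
}

# Claude Code would then run something like:
#   rsync -az ./ my-gpu-server:/home/user/experiments/
#   ssh my-gpu-server "$(build_launch_cmd exp0 train.py)"
build_launch_cmd exp0 train.py
```

The `screen -dmS` pattern keeps the job alive after the SSH session ends, which is what lets experiments run overnight unattended.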

No server? The review and rewriting skills still work without GPU access. Only experiment-related fixes will be skipped (flagged for manual follow-up).

📚 Zotero Integration (Optional)

If you use Zotero to manage your paper library, /research-lit can search your collections, read your annotations/highlights, and export BibTeX — all before searching the web.

Recommended: zotero-mcp (1.8k⭐, semantic search, PDF annotations, BibTeX export)

# Install
uv tool install zotero-mcp-server   # or: pip install zotero-mcp-server

# Add to Claude Code (Local API — requires Zotero desktop running)
claude mcp add zotero -s user -- zotero-mcp -e ZOTERO_LOCAL=true

# Or use Web API (works without Zotero running)
claude mcp add zotero -s user -- zotero-mcp \
  -e ZOTERO_API_KEY=your_key -e ZOTERO_USER_ID=your_id

Get your API key at https://www.zotero.org/settings/keys

What it enables in /research-lit:

  • 🔍 Search your Zotero library by topic (including semantic/vector search)
  • 📂 Browse collections and tags
  • 📝 Read your PDF annotations and highlights (what you personally found important)
  • 📄 Export BibTeX for direct use in paper writing

Not using Zotero? No problem — /research-lit automatically skips Zotero and uses local PDFs + web search instead.

📓 Obsidian Integration (Optional)

If you use Obsidian for research notes, /research-lit can search your vault for paper summaries, tagged references, and your own insights.

Recommended: mcpvault (760⭐, no Obsidian app needed, 14 tools, BM25 search)

# Add to Claude Code (point to your vault path)
claude mcp add obsidian-vault -s user -- npx @bitbonsai/mcpvault@latest /path/to/your/vault

Optional complement: obsidian-skills (13.6k⭐, by Obsidian CEO) — teaches Claude to understand Obsidian-specific Markdown (wikilinks, callouts, properties). Copy to your vault:

git clone https://github.com/kepano/obsidian-skills.git
cp -r obsidian-skills/.claude /path/to/your/vault/

What it enables in /research-lit:

  • 🔍 Search your vault for notes on the research topic
  • 🏷️ Find notes by tags (e.g., #paper-review, #diffusion-models)
  • 📝 Read your processed summaries and insights (more valuable than raw papers)
  • 🔗 Follow wikilinks to discover related notes

Not using Obsidian? No problem — /research-lit automatically skips Obsidian and works as before.

💡 Zotero + Obsidian together: Many researchers use Zotero for paper storage and Obsidian for notes. Both integrations work simultaneously — /research-lit checks Zotero first (raw papers + annotations), then Obsidian (your processed notes), then local PDFs, then web search.

📱 Feishu/Lark Integration (Optional)

Get mobile notifications when experiments finish, reviews score, or checkpoints need your input — without sitting in front of the terminal.

Push Only (group cards) · Interactive (private chat)

Three modes — you choose per-project:

| Mode | What happens | You need |
|---|---|---|
| Off (default) | Nothing. Pure CLI, no Feishu | Nothing |
| Push only | Webhook notifications at key events. Mobile push, no reply | Feishu bot webhook URL |
| Interactive | Full bidirectional. Approve/reject ideas, reply to checkpoints from Feishu | feishu-claude-code running |
Push Only Setup (5 min)

Group notifications with rich cards — experiment done, review scored, pipeline complete. Mobile push, no reply needed.

Step 1: Create a Feishu group bot

  1. Open your Feishu group (or create a test group)
  2. Group Settings → Bots → Add Bot → Custom Bot
  3. Name it (e.g., ARIS Notifications), copy the Webhook URL
  4. Security: add custom keyword ARIS (all notifications include this word), or leave unrestricted

Step 2: Create config file

cat > ~/.claude/feishu.json << 'EOF'
{
  "mode": "push",
  "webhook_url": "https://open.feishu.cn/open-apis/bot/v2/hook/YOUR_WEBHOOK_ID"
}
EOF

Step 3: Test it

curl -s -X POST "YOUR_WEBHOOK_URL" \
  -H "Content-Type: application/json" \
  -d '{
    "msg_type": "interactive",
    "card": {
      "header": {"title": {"tag": "plain_text", "content": "🧪 ARIS Test"}, "template": "blue"},
      "elements": [{"tag": "markdown", "content": "Push mode working! 🎉"}]
    }
  }'

You should see a blue card in your group. Skills will now automatically send rich cards at key events:

| Event | Card color | Content |
|---|---|---|
| Review scored ≥ 6 | 🟢 Green | Score, verdict, top weaknesses |
| Review scored < 6 | 🟠 Orange | Score, verdict, action items |
| Experiment complete | 🟢 Green | Results table, delta vs baseline |
| Checkpoint waiting | 🟡 Yellow | Question, options, context |
| Error | 🔴 Red | Error message, suggested fix |
| Pipeline done | 🟣 Purple | Score progression, deliverables |
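
The event-to-card mapping can be sketched in Python — a hypothetical payload builder reusing the card JSON shape from the curl test in Step 3 (the event names and helper itself are assumptions, not the skill's actual code):

```python
# Map event types to Feishu card header colors, mirroring the table above.
CARD_TEMPLATES = {
    "review_pass": "green",
    "review_fail": "orange",
    "experiment_done": "green",
    "checkpoint": "yellow",
    "error": "red",
    "pipeline_done": "purple",
}

def build_card(event: str, title: str, body_md: str) -> dict:
    """Build the JSON body for a Feishu custom-bot webhook POST.

    The 'ARIS' prefix keeps the custom-keyword security check passing.
    """
    return {
        "msg_type": "interactive",
        "card": {
            "header": {
                "title": {"tag": "plain_text", "content": f"ARIS {title}"},
                "template": CARD_TEMPLATES.get(event, "blue"),
            },
            "elements": [{"tag": "markdown", "content": body_md}],
        },
    }
```

POST the returned dict as JSON to your webhook_url (as the curl test does) and the corresponding colored card appears in the group.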
Interactive Setup (15 min)

Everything Push mode does, plus bidirectional private chat with Claude Code via Feishu. Approve/reject ideas, reply to checkpoints, give custom instructions — all from your phone.

How it works: Push cards go to the group (everyone sees status). Interactive conversations happen in private chat with the bot (you reply, Claude Code acts on it).

Step 1: Complete Push setup above first (you'll keep both)

Step 2: Create a Feishu app on open.feishu.cn

  1. Click Create Enterprise App → name it (e.g., ARIS Claude Bot) → create
  2. Left menu → Add Capabilities → check Bot
  3. Left menu → Permissions → search and enable these 5 permissions:
| Permission | Scope | Why |
|---|---|---|
| im:message | Send & receive messages | Core messaging |
| im:message:send_as_bot | Send as bot | Bot replies |
| im:message.group_at_msg:readonly | Receive group @mentions | Group messages |
| im:message.p2p_msg:readonly | Receive private messages | ⚠️ Easy to miss! Without this, the bot connects but never receives your messages |
| im:resource | Access attachments | Images/files |
  4. Left menu → Events & Callbacks → select Long Connection mode → add event: im.message.receive_v1 → save

⚠️ Important: The "Long Connection" page may show "未检测到应用连接信息" ("no app connection information detected") — this is normal. You need to start the bridge first (Step 3), then come back and save.

  5. Left menu → Version Management → Create Version → fill description → Submit for Review

For personal/test Feishu organizations, approval is usually instant.

Step 3: Deploy the bridge

git clone https://github.com/joewongjc/feishu-claude-code.git
cd feishu-claude-code
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

# Configure
cp .env.example .env

Edit .env:

FEISHU_APP_ID=cli_your_app_id          # From app credentials page
FEISHU_APP_SECRET=your_app_secret      # From app credentials page
DEFAULT_MODEL=claude-opus-4-6          # ⚠️ Default is sonnet — change to opus for best results
DEFAULT_CWD=/path/to/your/project      # Working directory for Claude Code
PERMISSION_MODE=bypassPermissions      # Or "default" for safer mode

⚠️ Model matters: The default claude-sonnet-4-6 works but may struggle with complex project context. claude-opus-4-6 correctly identified 18 ARIS skills on first try where sonnet could not.
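Before starting the bridge, a quick sanity check can catch a half-edited .env. A minimal sketch in bash (check_env is a hypothetical helper, not part of the bridge; the key list mirrors the variables set above):

```shell
# Hypothetical helper: verify the .env file defines every key this guide sets.
check_env() {
  local file="$1" missing=0 key
  for key in FEISHU_APP_ID FEISHU_APP_SECRET DEFAULT_MODEL DEFAULT_CWD PERMISSION_MODE; do
    # Each required key must appear at the start of a line as KEY=...
    grep -q "^${key}=" "$file" || { echo "missing: $key"; missing=1; }
  done
  [ "$missing" -eq 0 ] && echo "env ok"
}
# Usage: check_env .env
```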

Start the bridge:

python main.py
# Expected output (the Chinese line reads "connecting to Feishu WebSocket long connection (auto-reconnect)"):
# ✅ 连接飞书 WebSocket 长连接(自动重连)...
# [Lark] connected to wss://msg-frontier.feishu.cn/ws/v2?...

For long-running use, put it in a screen session:

screen -dmS feishu-bridge bash -c 'cd /path/to/feishu-claude-code && source .venv/bin/activate && python main.py'

Step 4: Save event config — Go back to Feishu Open Platform → Events & Callbacks → the long connection should now show "已检测到连接" ("connection detected") → Save

If you published the app version before the bridge was running, you may need to create a new version (e.g., 1.0.1) and re-publish after saving event config.

Step 5: Test private chat

  1. In Feishu, find the bot in your contacts (search by app name)
  2. Send it a message: 你好
  3. It should reply via Claude Code

If the bot doesn't reply: Send /new to reset the session, then try again. Common issues:

| Symptom | Cause | Fix |
|---|---|---|
| Bot connects but never receives messages | Missing im:message.p2p_msg:readonly permission | Add permission → create new version → publish |
| Bot replies but doesn't know your project | DEFAULT_CWD points to wrong directory | Edit .env → restart bridge |
| Bot replies but seems less capable | Using claude-sonnet-4-6 | Change to claude-opus-4-6 in .env → restart |
| Old session has stale context | Session cached from before config change | Send /new in chat to start fresh session |
| "未检测到应用连接信息" ("no app connection detected") when saving events | Bridge not running yet | Start bridge first, then save event config |

Step 6: Update ARIS config

cat > ~/.claude/feishu.json << 'EOF'
{
  "mode": "interactive",
  "webhook_url": "https://open.feishu.cn/open-apis/bot/v2/hook/YOUR_WEBHOOK_ID",
  "interactive": {
    "bridge_url": "http://localhost:5000",
    "timeout_seconds": 300
  }
}
EOF
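Before relying on the config, you can validate it. A minimal sketch (check_feishu_config is a hypothetical helper; the mode values and field names come from the config shown above):

```shell
# Hypothetical helper: parse ~/.claude/feishu.json and check the fields used above.
check_feishu_config() {
  python3 - "$1" <<'PY'
import json, sys

cfg = json.load(open(sys.argv[1]))
# "off" / "push" / "interactive" are the three modes this README documents.
assert cfg.get("mode") in ("off", "push", "interactive"), f"unknown mode: {cfg.get('mode')!r}"
if cfg["mode"] in ("push", "interactive"):
    assert cfg.get("webhook_url", "").startswith("https://"), "webhook_url missing"
if cfg["mode"] == "interactive":
    assert "bridge_url" in cfg.get("interactive", {}), "interactive.bridge_url missing"
print("config ok, mode =", cfg["mode"])
PY
}
# Usage: check_feishu_config ~/.claude/feishu.json
```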

Now skills will:

  • Push rich cards to the group (status notifications, everyone sees)
  • Private chat you for decisions (checkpoints, approve/reject, custom instructions)

Which skills send notifications?

| Skill | Events | Push | Interactive |
|---|---|---|---|
| /auto-review-loop | Review scored (each round), loop complete | Score + verdict | Score + verdict, wait for continue/stop |
| /auto-paper-improvement-loop | Review scored, all rounds done | Score progression | Score progression |
| /run-experiment | Experiments deployed | GPU assignment + ETA | GPU assignment + ETA |
| /monitor-experiment | Results collected | Results table | Results table |
| /idea-discovery | Phase transitions, final report | Summary at each phase | Summary, approve/reject at checkpoints |
| /research-pipeline | Stage transitions, pipeline done | Stage summary | Stage summary, approve/reject |

Not using Feishu? No problem — without ~/.claude/feishu.json, all skills behave exactly as before. Zero overhead, zero side effects.

💡 Alternative IM platforms: The push-only webhook pattern works with any service that accepts incoming webhooks (Slack, Discord, DingTalk, WeChat Work). Just change the webhook_url and card format in feishu-notify/SKILL.md. For bidirectional support, see cc-connect (multi-platform bridge) or clawdbot-feishu.
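As a sketch of that pattern (the payload text is hypothetical; the Feishu body shape is the custom-bot text message, while Slack-style services accept a plain text field — adjust per platform and substitute your real webhook URL):

```shell
# Build a notification payload, validate it locally, then POST it.
# Feishu custom bots expect {"msg_type":"text","content":{"text":...}};
# Slack-style incoming webhooks accept {"text":...}.
payload='{"msg_type":"text","content":{"text":"ARIS: round 2 scored 6/10"}}'
echo "$payload" | python3 -m json.tool > /dev/null && echo "payload ok"
# Then send it (uncomment and set WEBHOOK_URL first):
# curl -s -X POST "$WEBHOOK_URL" -H 'Content-Type: application/json' -d "$payload"
```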

🎛️ Customization

Skills are plain Markdown files. Fork and customize:

Auto Review Loop (auto-review-loop)

| Constant | Default | Description |
|---|---|---|
| MAX_ROUNDS | 4 | Maximum review→fix→re-review iterations |
| POSITIVE_THRESHOLD | 6/10 | Score at which the loop stops (submission-ready) |
| > 4 GPU-hour skip | 4h | Experiments exceeding this are flagged for manual follow-up |

Idea Discovery (idea-discovery / idea-creator)

| Constant | Default | Description |
|---|---|---|
| PILOT_MAX_HOURS | 2h | Skip any pilot estimated to take longer per GPU |
| PILOT_TIMEOUT_HOURS | 3h | Hard timeout — kill runaway pilots, collect partial results |
| MAX_PILOT_IDEAS | 3 | Maximum number of ideas to pilot in parallel |
| MAX_TOTAL_GPU_HOURS | 8h | Total GPU budget across all pilots |
| AUTO_PROCEED | true | Auto-continue with top-ranked option if user doesn't respond. Set false to always wait for explicit approval |

Override inline: /idea-discovery "topic" — pilot budget: 4h per idea, wait for my approval at each step

Literature Search (research-lit)

| Constant | Default | Description |
|---|---|---|
| PAPER_LIBRARY | papers/, literature/ | Local directories to scan for PDFs before searching online |
| MAX_LOCAL_PAPERS | 20 | Max local PDFs to scan (first 3 pages each) |

Override inline: /research-lit "topic" — paper library: ~/Zotero/storage/

General (all skills using Codex MCP)

| Constant | Default | Description |
|---|---|---|
| REVIEWER_MODEL | gpt-5.4 | OpenAI model used via Codex MCP. Options: gpt-5.4, o3, gpt-4o, etc. |
  • Prompt templates — tailor the review persona and evaluation criteria
  • allowed-tools — restrict or expand what each skill can do

🔀 Alternative Model Combinations

Don't have Claude / OpenAI API access? You can swap in other models — same cross-model architecture, different providers.

| Role | Default | Alt A: GLM + GPT | Alt B: GLM + MiniMax |
|---|---|---|---|
| Executor (Claude Code) | Claude Opus/Sonnet | GLM-5 (ZhiPu API) | GLM-5 (ZhiPu API) |
| Reviewer (Codex MCP) | GPT-5.4 | GPT-5.4 (OpenAI API) | MiniMax-M2.5 (MiniMax API) |
| Need OpenAI API? | Yes | Yes | No |

Step 1: Install Claude Code & Codex CLI

npm install -g @anthropic-ai/claude-code
npm install -g @openai/codex

Step 2: Configure ~/.claude/settings.json

Open with: nano ~/.claude/settings.json

Alt A: GLM (executor) + GPT (reviewer) — Only replace Claude, keep GPT-5.4 as reviewer
{
    "env": {
        "ANTHROPIC_AUTH_TOKEN": "your_zai_api_key",
        "ANTHROPIC_BASE_URL": "https://api.z.ai/api/anthropic",
        "API_TIMEOUT_MS": "3000000",
        "ANTHROPIC_DEFAULT_HAIKU_MODEL": "glm-4.5-air",
        "ANTHROPIC_DEFAULT_SONNET_MODEL": "glm-4.7",
        "ANTHROPIC_DEFAULT_OPUS_MODEL": "glm-5"
    },
    "mcpServers": {
        "codex": {
            "command": "/opt/homebrew/bin/codex",
            "args": [
                "mcp-server"
            ]
        }
    }
}

Codex CLI uses your existing OPENAI_API_KEY (from ~/.codex/config.toml or environment) — no extra config needed for the reviewer side.

Alt B: GLM (executor) + MiniMax (reviewer) — No Claude or OpenAI API needed
{
    "env": {
        "ANTHROPIC_AUTH_TOKEN": "your_zai_api_key",
        "ANTHROPIC_BASE_URL": "https://api.z.ai/api/anthropic",
        "API_TIMEOUT_MS": "3000000",
        "ANTHROPIC_DEFAULT_HAIKU_MODEL": "glm-4.5-air",
        "ANTHROPIC_DEFAULT_SONNET_MODEL": "glm-4.7",
        "ANTHROPIC_DEFAULT_OPUS_MODEL": "glm-5",
        "CODEX_API_KEY": "your_minimax_api_key",
        "CODEX_API_BASE": "https://api.minimax.chat/v1/",
        "CODEX_MODEL": "MiniMax-M2.5"
    },
    "mcpServers": {
        "codex": {
            "command": "/opt/homebrew/bin/codex",
            "args": [
                "mcp-server"
            ]
        }
    }
}

Save: Ctrl+OEnterCtrl+X

Step 3: Install Skills & Run

git clone https://github.com/wanshuiyin/Auto-claude-code-research-in-sleep.git
cd Auto-claude-code-research-in-sleep
cp -r skills/* ~/.claude/skills/

# Launch Claude Code (now powered by GLM)
claude
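To confirm the copy worked before relying on the skills, a quick check in bash (check_skills is a hypothetical helper; the assumption, matching the repo layout, is that each skill ships as its own directory containing a SKILL.md):

```shell
# Hypothetical helper: list installed skills and flag any missing SKILL.md.
check_skills() {
  local dir="$1" d found=0
  for d in "$dir"/*/; do
    [ -d "$d" ] || continue          # glob matched nothing; skip the literal pattern
    found=1
    if [ -f "${d}SKILL.md" ]; then
      echo "ok: $(basename "$d")"
    else
      echo "missing SKILL.md: $(basename "$d")"
    fi
  done
  [ "$found" -eq 1 ] || echo "no skills found in $dir"
}
# Usage: check_skills ~/.claude/skills
```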

Step 4: Let GLM Read the Project ⚠️ IMPORTANT

🔴 Do NOT skip this step. GLM's prompt handling differs from Claude's. You must let GLM read through the project once to ensure skills are correctly parsed.

After launching claude, run in the conversation:

Read through this project and verify all skills are working:
/idea-creator, /research-review, /auto-review-loop, /novelty-check,
/idea-discovery, /research-pipeline, /research-lit, /run-experiment,
/analyze-results, /monitor-experiment, /pixel-art

For each skill, confirm: (1) it loads without errors, (2) the frontmatter is parsed correctly.

This lets GLM (acting as Claude Code) familiarize itself with the skill files and catch any compatibility issues upfront — rather than discovering them mid-workflow when it's expensive to fail.

⚠️ Note: Alternative models may behave differently from Claude and GPT-5.4. You may need to adjust REVIEWER_MODEL in the skills and tune prompt templates for best results. The core cross-model architecture remains the same.

📋 Roadmap

Done

  • Human-in-the-loop checkpoints — idea-discovery and research-pipeline pause at key decision points for user approval. Configurable via AUTO_PROCEED (default: auto-continue; set false to always wait)
  • Alternative model combinationsGLM + GPT, GLM + MiniMax fully documented with setup guides. No Claude or OpenAI API required
  • Workflow 3: Paper Writing Pipeline — full chain: /paper-plan/paper-figure/paper-write/paper-compile. ICLR/NeurIPS/ICML templates, claims-evidence matrix, publication-quality figures, latexmk auto-fix. Inspired by claude-scholar, Research-Paper-Writing-Skills, baoyu-skills
  • Configurable REVIEWER_MODEL — all Codex-dependent skills support custom reviewer model (default gpt-5.4, also works with o3, gpt-4o, etc.)

  • Local paper library scanning/research-lit scans local papers/ and literature/ directories before external search, leveraging papers you've already read

  • Idea Discovery pipeline/idea-discovery orchestrates research-lit → idea-creator → novelty-check → research-review in one command, with pilot experiments on GPU

  • Full research pipeline/research-pipeline chains Workflow 1 (idea discovery) → implementation → Workflow 2 (auto-review-loop) end-to-end

  • Peer review skill/peer-review for reviewing others' papers as a conference reviewer, with GPT-5.4 meta-review

  • Cross-model collaboration — Claude Code (executor) × Codex GPT-5.4 xhigh (reviewer) architecture, avoiding single-model self-play local minima

  • Feishu/Lark integration — three modes (off/push/interactive), configurable via ~/.claude/feishu.json. Push-only needs just a webhook URL; interactive uses feishu-claude-code. Off by default — zero impact on existing workflows. See setup guide

Planned

  • W&B integration — pull training curves and metrics from Weights & Biases as feedback signal. Auto-review-loop can read loss/accuracy plots to diagnose training issues and suggest next experiments
    • Related projects: wandb-mcp-server (official W&B MCP, if available), or via wandb api CLI
  • Zotero MCP integration/research-lit searches Zotero collections, reads annotations/highlights, exports BibTeX. Recommended: zotero-mcp (1.8k⭐). See setup guide
  • Obsidian integration/research-lit searches Obsidian vault for research notes, tagged references, wikilinks. Recommended: mcpvault (760⭐) + obsidian-skills (13.6k⭐). See setup guide
  • More executor × reviewer combinations (Gemini, DeepSeek, etc.)

💬 Community

Domain-specific skills welcome! The core skills cover general research workflows, but every field has its own tools and patterns. We welcome PRs that add new skills for your domain — EDA, bioinformatics, robotics, HPC, or anything else. Just add a skills/your-skill/SKILL.md and open a PR. See dse-loop for an example.

Join the WeChat group for discussion on Claude Code + AI-driven research workflows:

WeChat Group QR Code

⭐ Star History

GitHub stars

Star History Chart

🙏 Acknowledgements

This project builds on and integrates with many excellent open-source projects:

Core Infrastructure

  • Claude Code — Anthropic's CLI for Claude, the execution backbone
  • Codex CLI — OpenAI's CLI, used as MCP server for cross-model review

Zotero Integration (setup guide)

  • zotero-mcp — Zotero MCP server with semantic search and PDF annotations
  • Zotero — Open-source reference manager

Obsidian Integration (setup guide)

  • mcpvault — Obsidian vault MCP server (no app required)
  • obsidian-skills — Claude Code skills for Obsidian Markdown by Steph Ango (Obsidian CEO)

Paper Writing Inspiration

Feishu/Lark Integration (setup guide)

Community

License

MIT