Note: Scripts are now organized under scripts/tools/ and scripts/demos/. Legacy paths (e.g., scripts/build_entity_v3.py) still work via thin wrappers.
Project Name: Idea2Paper - Automated Academic Paper Generation System Based on Knowledge Graph
Core Goal: Automatically transform a user's research Idea into a submission-ready paper Story (Narrative Skeleton) that meets top-tier conference (ICLR) standards.
Tech Stack:
- Knowledge Graph: NetworkX
- Vector Retrieval: Embedding (Qwen3-Embedding-4B)
- Large Language Models: Qwen3-14B, Qwen2.5-7B-Instruct
- Data Source: ICLR 2025 Paper Dataset (8,285 papers)
- Idea2Paper Project Summary Document
- 1. System Architecture
- 2. Knowledge Graph Construction
- 3. Three-Way Retrieval System
- 4. Idea2Story Pipeline
- 5. Configuration Overview
- 6. Complete Workflow
- 7. Core Innovations
- 8. System Advantages
- 9. Current Limitations & Future Directions
- 10. Documentation Index
- 11. Code Structure
- 12. Key Metrics
- 13. Usage Recommendations
- 14. Troubleshooting
- 15. Summary
- 16. Acknowledgements
┌─────────────────────────────────────────────────────────────────────────┐
│ Idea2Paper Complete Workflow │
└─────────────────────────────────────────────────────────────────────────┘
User Input Idea
│
├──────────────────────────────────────────────────────────────────────┐
│ Phase 1: Knowledge Graph Construction │
│ (One-time build, reusable subsequently) │
├──────────────────────────────────────────────────────────────────────┤
│ │
│ 1. Load ICLR Paper Data (8,285 papers) │
│ ↓ │
│ 2. Construct 4 Types of Nodes │
│ ├─ Idea Nodes (8,284) │
│ ├─ Pattern Nodes (124, LLM-Enhanced) │
│ ├─ Domain Nodes (98) │
│ └─ Paper Nodes (8,285) │
│ ↓ │
│ 3. Construct Edge Relations (444,872 edges) │
│ ├─ Basic Connection Edges (Paper→Idea/Pattern/Domain) │
│ └─ Retrieval Auxiliary Edges (Idea→Domain, Pattern→Domain) │
│ ↓ │
│ 4. Output Knowledge Graph │
│ │
└──────────────────────────────────────────────────────────────────────┘
│
├──────────────────────────────────────────────────────────────────────┐
│ Phase 2: Three-Way Retrieval │
│ (Per run, approx. 27s) │
├──────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────┬─────────────┬─────────────┐ │
│ │ Path 1 │ Path 2 │ Path 3 │ │
│ │ Similar Idea│ Domain Rel. │Similar Paper│ │
│ │ (Weight 0.4)│ (Weight 0.2)│ (Weight 0.4)│ │
│ └─────────────┴─────────────┴─────────────┘ │
│ │ │ │ │
│ │ │ │ │
│ Coarse: Jaccard Match Domain Coarse: Jaccard │
│ Top-100 Top-5 Top-100 │
│ ↓ ↓ ↓ │
│ Fine: Embedding Find Pattern Fine: Embedding │
│ Top-10 works_well Top-20 │
│ ↓ ↓ ↓ │
│ Get Pattern Get Pattern Get Pattern │
│ Score Score Score │
│ │ │ │ │
│ └──────────────┴──────────────┘ │
│ ↓ │
│ Weighted Fusion & Fine Ranking │
│ ↓ │
│ Top-10 Patterns │
│ │
└──────────────────────────────────────────────────────────────────────┘
│
├──────────────────────────────────────────────────────────────────────┐
│ Phase 3: Story Generation & Refinement │
│ (3-10 minutes) │
├──────────────────────────────────────────────────────────────────────┤
│ │
│ 1. Multi-dimensional Pattern Classification │
│ ├─ Stability │
│ ├─ Novelty │
│ └─ Cross-Domain │
│ ↓ │
│ 2. Select Initial Pattern → Generate Draft Story │
│ ↓ │
│ 3. Multi-Agent Critic Review (Methodology/Novelty/Storyteller) │
│ ↓ │
│ 4. Decision: Score >= 7.0? │
│ ├─[Yes]→ Proceed to Phase 4 │
│ └─[No] → Intelligent Refinement │
│ │ │
│ ├─ Novelty Stagnated? → [Novelty Mode] │
│ │ ├─ Traverse Novelty Patterns │
│ │ ├─ Idea Fusion │
│ │ ├─ Story Reflection (Quality Assessment) │
│ │ ├─ Regenerate Story │
│ │ ├─ Critic Review │
│ │ ├─ Score Dropped? → Rollback │
│ │ └─ Fallback: Select Highest Score Version │
│ │ │
│ └─ Ordinary Refinement → Inject Complementary Tricks │
│ ├─ Lacks Novelty → Tail Injection (Rank 5-10) │
│ ├─ Lacks Stability → Head Injection (Rank 1-3) │
│ └─ Return to Step 2 │
│ │
└──────────────────────────────────────────────────────────────────────┘
│
├──────────────────────────────────────────────────────────────────────┐
│ Phase 4: RAG Novelty Verification │
│ (Approx. 30s) │
├──────────────────────────────────────────────────────────────────────┤
│ │
│ 1. Extract Key Methods → Retrieve Papers from Top Confs (Last 3 Yrs)│
│ ↓ │
│ 2. Decision: Similarity > 0.75? │
│ ├─[No] → Output Final Story │
│ └─[Yes]→ Collision! Pivot Avoidance │
│ ├─ Analyze Collision Points │
│ ├─ Generate Constraints (Disable Tech/Domain Shift) │
│ └─ Return to Phase 3, Step 2 │
│ │
└──────────────────────────────────────────────────────────────────────┘
│
▼
Output Final Story (JSON format)
Workflow Explanation:
- Phase 1: Offline construction, run only once.
- Phase 2: Real-time retrieval, 13x speedup (27 seconds).
- Phase 3: Core generation, intelligent refinement mechanism.
- Phase 4: Deduplication/Novelty verification to avoid collision.
| Layer | Module | File/Script | Function |
|---|---|---|---|
| Data Layer | Knowledge Graph Construction | build_entity_v3.py, build_edges.py | Construct nodes and edges |
| Retrieval Layer | Three-Way Retrieval System | recall_system.py | Retrieve relevant Patterns |
| Generation Layer | Pattern Selection | pattern_selector.py | Multi-dimensional Pattern classification |
| Generation Layer | Idea Fusion | planner.py | Fuse innovative Ideas |
| Generation Layer | Story Generation | story_generator.py | Generate Paper Story |
| Generation Layer | Story Reflection | story_reflector.py | Assess fusion quality |
| Generation Layer | Critic Review | critic.py | Multi-agent review |
| Generation Layer | Intelligent Refinement | refinement.py | Iterative optimization |
| Generation Layer | RAG Verification | verifier.py | Deduplication and avoidance |
| Orchestration Layer | Pipeline Management | manager.py, idea2story_pipeline.py | Workflow orchestration |
Knowledge Graph Statistics:
├─ Total Nodes: 16,791
│ ├─ Idea: 8,284 (100% coverage)
│ ├─ Pattern: 124 (Generated via clustering)
│ ├─ Domain: 98 (Generated via aggregation)
│ └─ Paper: 8,285
└─ Total Edges: 444,872
├─ Basic Connection Edges: ~25,000
└─ Retrieval Auxiliary Edges: ~420,000
Idea Node: The core innovation of the paper
{
"idea_id": "idea_0",
"description": "Core idea description...",
"base_problem": "Base problem...",
"solution_pattern": "Solution pattern...",
"pattern_ids": ["pattern_9", ...]
}
Pattern Node: Writing trope / method-unit template
{
"pattern_id": "pattern_24",
"name": "Reframing Graph Learning Scalability",
"size": 331,
"llm_enhanced_summary": {
"representative_ideas": "Inductive summary...",
"common_tricks": ["Trick 1", "Trick 2"]
}
}
Domain Node: Research domain
{
"domain_id": "domain_0",
"name": "Natural Language Processing",
"paper_count": 1076,
"sub_domains": ["Text Classification", ...]
}
Paper Node: Concrete paper
{
"paper_id": "RUzSobdYy0V",
"title": "Quantifying and Mitigating...",
"domain": "Fairness & Accountability",
"idea": "Core idea...",
"pattern_id": "pattern_9"
}
Basic Connection Edges:
- Paper → Idea (implements): The paper implements this Idea.
- Paper → Pattern (uses_pattern): The paper uses this Pattern.
- Paper → Domain (in_domain): The paper belongs to this Domain.
Retrieval Auxiliary Edges:
- Idea → Domain (belongs_to): The Domain the Idea belongs to; weight = proportion.
- Pattern → Domain (works_well_in): Effectiveness of the Pattern in this Domain; weight = effectiveness.
- Idea → Paper (similar_to_paper): Similarity weight (computed in real time in Path 3).
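Since the stack uses NetworkX, the node and edge schema above can be sketched as a small directed graph. This is an illustrative sketch only: the attribute keys and node IDs here follow the doc's naming, but the real build_edges.py may store them differently.

```python
import networkx as nx

# Illustrative schema following the doc; attribute keys in the real
# build_edges.py may differ.
G = nx.DiGraph()
G.add_node("paper_RUzSobdYy0V", type="paper")
G.add_node("idea_0", type="idea")
G.add_node("pattern_9", type="pattern")
G.add_node("domain_0", type="domain")

# Basic connection edges
G.add_edge("paper_RUzSobdYy0V", "idea_0", relation="implements")
G.add_edge("paper_RUzSobdYy0V", "pattern_9", relation="uses_pattern")
G.add_edge("paper_RUzSobdYy0V", "domain_0", relation="in_domain")

# Retrieval auxiliary edges (weighted; example weights)
G.add_edge("idea_0", "domain_0", relation="belongs_to", weight=0.8)
G.add_edge("pattern_9", "domain_0", relation="works_well_in", weight=0.72)

# Example query: which Patterns work well in a given Domain?
patterns = [u for u, v, d in G.in_edges("domain_0", data=True)
            if d["relation"] == "works_well_in"]
print(patterns)  # ['pattern_9']
```

Storing the relation type as an edge attribute keeps all five edge kinds in one graph while still letting Path 2 filter on `works_well_in` edges only.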
# 1. Build Nodes
python scripts/build_entity_v3.py
# Output: output/nodes_*.json (4 files)
# 2. Build Edges
python scripts/build_edges.py
# Output: output/edges.json, output/knowledge_graph_v2.gpickleExecution Time: Node construction 15 minutes (including LLM enhancement) + Edge construction 3 minutes.
| Path | Matching Object | Capture Dimension | Weight | Retrieval Count |
|---|---|---|---|---|
| Path 1 | Idea Description | Core idea similarity | 0.4 | Top-10 Pattern |
| Path 2 | Domain & Sub-domains | Domain generalization | 0.2 | Top-5 Pattern |
| Path 3 | Paper Title | Research theme similarity | 0.4 | Top-10 Pattern |
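The weighted fusion implied by this table can be sketched as follows — each path returns per-Pattern scores, and the final score is their weighted sum. The function and variable names are illustrative, not the actual recall_system.py API.

```python
from collections import defaultdict

# Path weights from the table above.
PATH_WEIGHTS = {"path1": 0.4, "path2": 0.2, "path3": 0.4}

def fuse(path_scores, top_k=10):
    """Weighted fusion: sum each Pattern's per-path scores, weighted by path."""
    fused = defaultdict(float)
    for path, scores in path_scores.items():
        for pattern_id, score in scores.items():
            fused[pattern_id] += PATH_WEIGHTS[path] * score
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

ranked = fuse({
    "path1": {"pattern_9": 0.9, "pattern_24": 0.6},  # similar-Idea scores
    "path2": {"pattern_24": 0.8},                    # domain-relevance scores
    "path3": {"pattern_9": 0.7},                     # similar-Paper scores
})
print(ranked[0][0])  # pattern_9 (0.4*0.9 + 0.4*0.7 ≈ 0.64 vs ≈ 0.40)
```

A Pattern surfaced by several paths accumulates score from each, which is how the fusion rewards cross-path agreement.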
Performance Comparison:
Full Embedding: ~7 minutes (8,284 API calls)
Two-Stage Retrieval: ~27 seconds (100 API calls)
Speedup Ratio: 13x
Process:
Coarse Ranking: Jaccard fast filtering Top-100 (Milliseconds)
↓
Fine Ranking: Embedding precise sorting Top-10/20 (~27 seconds)
Jaccard Similarity (Coarse Ranking):
Jaccard(A, B) = |A ∩ B| / |A ∪ B|
Embedding Similarity (Fine Ranking):
Cosine(A, B) = dot(emb_A, emb_B) / (norm(emb_A) * norm(emb_B))
# Run independently
python scripts/simple_recall_demo.py "Your Research Idea"
# Use as a class
from recall_system import RecallSystem
system = RecallSystem()
results = system.recall(user_idea, verbose=True)
Output: List of Top-10 Patterns, each containing (pattern_id, pattern_info, score).
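The two similarity measures behind the coarse and fine stages can be sketched directly from the formulas above. Pure Python for illustration; the real system obtains vectors from the Qwen3-Embedding-4B API rather than computing them locally.

```python
import math

def jaccard(a: str, b: str) -> float:
    """Coarse ranking: token-set overlap, milliseconds per pair."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if (sa | sb) else 0.0

def cosine(emb_a, emb_b) -> float:
    """Fine ranking: cosine similarity between embedding vectors."""
    dot = sum(x * y for x, y in zip(emb_a, emb_b))
    norm_a = math.sqrt(sum(x * x for x in emb_a))
    norm_b = math.sqrt(sum(y * y for y in emb_b))
    return dot / (norm_a * norm_b)

print(jaccard("curriculum learning for llms", "curriculum learning methods"))  # 0.4
print(round(cosine([1.0, 0.0], [1.0, 1.0]), 3))  # 0.707
```

Jaccard needs no API call, which is why it can prune 8,284 candidates down to 100 before the expensive embedding stage runs.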
Goal: Ensure Pattern diversity.
Dimensions:
- Stability: Rank Top-3 + Cluster Size ≥ 15.
- Novelty: Cluster Size < 10.
- Cross-Domain: From Path 2/3 + Different Domain.
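A hedged sketch of these three classification rules; the field names (`rank`, `cluster_size`, `source_path`, `cross_domain`) are illustrative, and the actual pattern_selector.py logic may differ in detail.

```python
def classify(pattern):
    """Tag a retrieved Pattern along the three dimensions above.
    Rules transcribed from the doc; field names are illustrative."""
    tags = []
    if pattern["rank"] <= 3 and pattern["cluster_size"] >= 15:
        tags.append("stability")      # Rank Top-3 + Cluster Size >= 15
    if pattern["cluster_size"] < 10:
        tags.append("novelty")        # small, unusual cluster
    if pattern["source_path"] in (2, 3) and pattern["cross_domain"]:
        tags.append("cross_domain")   # from Path 2/3 + different Domain
    return tags

print(classify({"rank": 1, "cluster_size": 331, "source_path": 1, "cross_domain": False}))
# ['stability']
print(classify({"rank": 7, "cluster_size": 6, "source_path": 3, "cross_domain": True}))
# ['novelty', 'cross_domain']
```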
Goal: Organic fusion at the conceptual level, not just technical stacking.
Process:
Original Idea + New Pattern → LLM Generated Fused Idea
↓
Fused Idea contains:
- fused_core_idea: Core idea after fusion
- conceptual_bridge: Conceptual bridge
- reframed_problem: Reframed problem
- innovation_angle: Unique innovation angle
Example:
Original Idea: Use LLM for data augmentation
New Pattern: Curriculum Learning
Fused Idea: Difficulty-adaptive curriculum learning framework generated based on LLM
Goal: Assess fusion quality and ensure conceptual unity.
Scoring:
fusion_quality = 0.4 × Coherence + 0.4 × Fusion Richness + 0.2 × Fusion Idea Reward
Threshold: fusion_quality >= 0.65 is considered a successful fusion.
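The scoring formula above, transcribed directly. Only the aggregation is shown here; the component scores (Coherence, Fusion Richness, Fusion Idea Reward) come from an LLM judge in story_reflector.py.

```python
FUSION_QUALITY_THRESHOLD = 0.65  # from PipelineConfig

def fusion_quality(coherence, richness, reward):
    """fusion_quality = 0.4*Coherence + 0.4*Fusion Richness + 0.2*Fusion Idea Reward."""
    return 0.4 * coherence + 0.4 * richness + 0.2 * reward

q = fusion_quality(coherence=0.7, richness=0.8, reward=0.5)
print(round(q, 2), q >= FUSION_QUALITY_THRESHOLD)  # 0.7 True
```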
Roles:
- Reviewer A (Methodology): Technical soundness.
- Reviewer B (Novelty): Innovation.
- Reviewer C (Storyteller): Narrative completeness.
Pass Standard: Average Score >= 7.0.
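The pass check reduces to a simple average against PASS_SCORE. A sketch; the reviewer keys are illustrative names, not the actual critic.py schema.

```python
PASS_SCORE = 7.0  # from PipelineConfig

def critic_passes(scores):
    """Average the three reviewers' scores against the pass threshold."""
    return sum(scores.values()) / len(scores) >= PASS_SCORE

print(critic_passes({"methodology": 7.5, "novelty": 6.8, "storyteller": 7.2}))  # True
```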
Novelty Mode:
- Trigger: Novelty score stagnation (≤ Previous Round + 0.5).
- Process: Traverse all Novelty Patterns, each undergoing Fusion → Reflection → Generation → Critic.
- Fallback: Select the version with the highest score.
Score Degradation Rollback:
- Trigger: Any dimension score drops > 0.1.
- Process: Restore Story + Mark failure + Delete Tricks + Continue iteration.
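The degradation trigger can be sketched as below (dimension names are illustrative; refinement.py may structure the scores differently).

```python
SCORE_DEGRADATION_THRESHOLD = 0.1  # from PipelineConfig

def should_rollback(prev, curr):
    """Roll back if any dimension's score drops by more than the threshold."""
    return any(prev[k] - curr[k] > SCORE_DEGRADATION_THRESHOLD for k in prev)

prev = {"methodology": 7.2, "novelty": 6.5, "storyteller": 7.0}
curr = {"methodology": 7.3, "novelty": 6.2, "storyteller": 7.0}
print(should_rollback(prev, curr))  # True: novelty dropped by 0.3
```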
Ordinary Refinement:
- Tail Injection: Lacks novelty → Inject unpopular Patterns (Rank 5-10).
- Head Injection: Lacks stability → Inject mature Patterns (Rank 1-3).
Verification: Retrieve top conference papers from the last 3 years; Similarity > 0.75 is considered a collision.
Avoidance: Pivot strategy to generate constraints (Domain shift, setting limitations, etc.), then regenerate Story.
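A sketch of the collision decision, assuming cosine similarities against the retrieved papers have already been computed; names here are illustrative, not the verifier.py API.

```python
COLLISION_THRESHOLD = 0.75  # from PipelineConfig

def find_collisions(similarities):
    """similarities maps retrieved paper_id -> cosine similarity with the
    Story's key methods; any hit above threshold triggers pivot avoidance."""
    return [pid for pid, sim in similarities.items() if sim > COLLISION_THRESHOLD]

hits = find_collisions({"paper_a": 0.81, "paper_b": 0.60})
print(hits)  # ['paper_a'] → analyze collision points, generate constraints
```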
python scripts/idea2story_pipeline.py "Your Research Idea"
Output:
output/
├── final_story.json # Final Paper Story
├── pipeline_result.json # Complete Pipeline Result
└── log.json # Detailed Log
Execution Time: 3-10 minutes (depending on iteration count).
# scripts/build_entity_v3.py
# Data source paths
DATA_DIR = PROJECT_ROOT / "data" / "ICLR_25"
ASSIGNMENTS_FILE = DATA_DIR / "assignments.jsonl"
CLUSTER_LIBRARY_FILE = DATA_DIR / "cluster_library_sorted.jsonl"
PATTERN_DETAILS_FILE = DATA_DIR / "iclr_patterns_full.jsonl"
# LLM API Config
SILICONFLOW_API_KEY = os.getenv("SILICONFLOW_API_KEY")
LLM_API_URL = "https://api.siliconflow.cn/v1/chat/completions"
LLM_MODEL = "Qwen/Qwen2.5-7B-Instruct"
# scripts/recall_system.py
class RecallConfig:
# Path Weights
PATH1_WEIGHT = 0.4 # Similar Idea
PATH2_WEIGHT = 0.2 # Domain Relevance
PATH3_WEIGHT = 0.4 # Similar Paper
# Retrieval Counts
PATH1_TOP_K_IDEAS = 10
PATH1_FINAL_TOP_K = 10
PATH2_TOP_K_DOMAINS = 5
PATH2_FINAL_TOP_K = 5
PATH3_TOP_K_PAPERS = 20
PATH3_FINAL_TOP_K = 10
FINAL_TOP_K = 10
# Two-Stage Retrieval
USE_EMBEDDING = True
TWO_STAGE_RECALL = True
COARSE_RECALL_SIZE = 100
FINE_RECALL_SIZE = 20
# scripts/pipeline/config.py
class PipelineConfig:
# Pattern Selection
SELECT_PATTERN_COUNT = 3
CONSERVATIVE_RANK_RANGE = (0, 2)
INNOVATIVE_CLUSTER_SIZE_THRESHOLD = 10
# Critic Threshold
PASS_SCORE = 7.0
MAX_REFINE_ITERATIONS = 3
# Novelty Mode
NOVELTY_MODE_MAX_PATTERNS = 10
NOVELTY_SCORE_THRESHOLD = 6.0
NOVELTY_STAGNATION_DELTA = 0.5
# Reflection
FUSION_QUALITY_THRESHOLD = 0.65
# Rollback
SCORE_DEGRADATION_THRESHOLD = 0.1
# RAG Verification
COLLISION_THRESHOLD = 0.75
# Refinement Strategy
TAIL_INJECTION_RANK_RANGE = (4, 9)
HEAD_INJECTION_RANK_RANGE = (0, 2)
HEAD_INJECTION_CLUSTER_THRESHOLD = 15
# LLM Config
LLM_API_KEY = os.getenv("SILICONFLOW_API_KEY")
LLM_API_URL = "https://api.siliconflow.cn/v1/chat/completions"
LLM_MODEL = "Qwen/Qwen3-14B"
# 1. Enter the project directory
cd /Users/gaoge/code/mycode/Idea2Paper/Paper-KG-Pipeline
# 2. Install Dependencies
pip install -r requirements.txt
# 3. Set Environment Variable
export SILICONFLOW_API_KEY="your_api_key_here"
# Build Knowledge Graph (Run only once)
python scripts/build_entity_v3.py # 15 minutes
python scripts/build_edges.py       # 3 minutes
# Generate Paper Story
python scripts/idea2story_pipeline.py "Your Research Idea Description"
# Example
python scripts/idea2story_pipeline.py "Optimizing Large Model Inference Efficiency with Reinforcement Learning"
# View Final Story
cat output/final_story.json
# View Complete Pipeline
cat output/pipeline_result.json
# View Detailed Log
cat output/log.json | jq '.'
✅ LLM-Enhanced Pattern: Generate inductive summaries for each Pattern cluster.
✅ Dual-Layer Description: Concrete examples + Global summary, enabling both learning and understanding.
✅ Quality-Oriented Edge Weights: Calculate edge weights based on paper quality and Pattern effectiveness.
✅ Three-Way Complementary Retrieval: Capture relevance from Idea, Domain, and Paper dimensions.
✅ Two-Stage Optimization: Jaccard coarse ranking + Embedding fine ranking, 13x speedup.
✅ Real-Time Path 3 Calculation: Avoid pre-building redundant edges, ensuring complementarity.
✅ Idea Fusion: Organic fusion at the conceptual level rather than technical stacking.
✅ Story Reflection: Reflect on fusion quality to assess conceptual unity.
✅ Novelty-First Mode: Automatically upgrade to systemically improve innovation when stagnated.
✅ Intelligent Rollback: Avoid ineffective refinement to improve iteration efficiency.
✅ Fallback Strategy: Guarantee output quality by selecting the highest-scoring version.
- ✅ Fully automated process, no manual intervention required.
- ✅ Intelligent decision mechanisms (Novelty Mode, Rollback, Fallback).
- ✅ Adaptive parameter adjustment.
- Pattern Layer: LLM-enhanced high-quality Pattern library.
- Retrieval Layer: Three-way complementary retrieval, comprehensive coverage.
- Fusion Layer: Idea Fusion ensures conceptual unity.
- Reflection Layer: Story Reflection assesses fusion quality.
- Review Layer: Three-role Critic for comprehensive evaluation.
- Verification Layer: RAG avoids collision.
- ✅ Two-stage retrieval speeds up by 13x (7 mins → 27 secs).
- ✅ Intelligent rollback avoids ineffective iterations.
- ✅ Pattern failure marking avoids repeated attempts.
- ✅ LLM response caching reduces API calls.
- ✅ Modular design, easy to add new features.
- ✅ Supports incremental updates to the knowledge graph.
- ✅ Adaptable to other conference data sources.
- ✅ Can add new retrieval paths.
Current Limitation:
⚠️ Domain granularity is too coarse; 98 Domains cover 8,285 papers.
Future Direction:
- 📌 Introduce Domain hierarchy (Main Domain → Sub-domain).
- 📌 Use sub_domains for fine-grained matching.
- 📌 Extend to Review data from more conferences.
Current Limitation:
⚠️ Path 2 Domain matching is based on keywords, which may not be precise.
⚠️ Retrieval speed still has room for optimization (27 seconds).
Future Direction:
- 📌 Use Embedding to calculate semantic similarity between Idea and Domain.
- 📌 Introduce vector database (Faiss/Milvus), speed up to 1-3 seconds.
- 📌 Pre-compute and cache all Embeddings.
Current Limitation:
⚠️ Fusion quality scoring relies on an LLM and may be unstable.
⚠️ Novelty Mode traverses up to 10 Patterns, which can be time-consuming.
Future Direction:
- 📌 Introduce a learnable fusion quality scoring model.
- 📌 Optimize Pattern selection order based on historical data.
- 📌 Generate multiple Story candidates in parallel.
Current Limitation:
⚠️ Critic scoring relies on an LLM and may fluctuate.
⚠️ No user feedback mechanism.
Future Direction:
- 📌 Collect real review data to train dedicated Critic models.
- 📌 Introduce user feedback for online learning and weight adjustment.
- 📌 A/B test effects of different strategies.
| Document | Path | Content |
|---|---|---|
| Project Summary | docs/00_PROJECT_OVERVIEW.md | This document, overall overview |
| KG Construction | docs/01_KG_CONSTRUCTION.md | Data source, nodes, edges, execution method |
| Retrieval System | docs/02_RECALL_SYSTEM.md | Three-way retrieval, similarity calculation, config |
| Idea2Story Pipeline | docs/03_IDEA2STORY_PIPELINE.md | Pattern selection, Fusion, Reflection, Critic |

| Document | Path | Content |
|---|---|---|
| Edge Types | docs/EDGE_TYPES.md | Detailed edge definitions and weight calculations |
| Pattern Scoring | docs/PATTERN_SCORING_EXPLAINED.md | Pattern score calculation logic |
| Two-Stage Retrieval | docs/TWO_STAGE_RECALL_OPTIMIZATION.md | Retrieval performance optimization details |
| Data Format | docs/Data_Format_Comparison.md | V2 vs V3 data format changes |
The following documents record system evolution history, but core content has been integrated into the 4 main documents above:
- NOVELTY_MODE_FIX.md
- REFLECTION_REGENERATION_FIX.md
- WORKFLOW_CORRECTION_2025-01-25.md
- REFINE_SYSTEM_UPGRADE.md
- RECALL_USAGE_V3.md
- etc.
Paper-KG-Pipeline/
├── data/ # Data Sources
│ └── ICLR_25/
│ ├── assignments.jsonl
│ ├── cluster_library_sorted.jsonl
│ └── iclr_patterns_full.jsonl
│
├── output/ # Output Files
│ ├── nodes_*.json # 4 types of nodes
│ ├── edges.json # Edge data
│ ├── knowledge_graph_v2.gpickle # NetworkX graph
│ ├── final_story.json # Final Story
│ └── pipeline_result.json # Pipeline results
│
├── scripts/ # Core Scripts
│ ├── build_entity_v3.py # Build nodes
│ ├── build_edges.py # Build edges
│ ├── recall_system.py # Retrieval system (Class encapsulation)
│ ├── simple_recall_demo.py # Retrieval Demo
│ ├── idea2story_pipeline.py # Pipeline Main Entry
│ │
│ └── pipeline/ # Pipeline Modules
│ ├── config.py # Configuration parameters
│ ├── manager.py # Workflow orchestration
│ ├── pattern_selector.py # Pattern classification
│ ├── planner.py # Idea Fusion
│ ├── story_generator.py # Story generation
│ ├── story_reflector.py # Story reflection
│ ├── critic.py # Critic review
│ ├── refinement.py # Intelligent refinement
│ ├── verifier.py # RAG verification
│ └── utils.py # Utility functions
│
├── docs/ # Documentation
│ ├── 00_PROJECT_OVERVIEW.md # Project Summary (This file)
│ ├── 01_KG_CONSTRUCTION.md # KG Construction
│ ├── 02_RECALL_SYSTEM.md # Retrieval System
│ └── 03_IDEA2STORY_PIPELINE.md # Idea2Story Pipeline
│
└── requirements.txt # Dependencies
Knowledge Graph:
- Nodes: 16,791
- Edges: 444,872
- Pattern: 124 (all LLM-enhanced)
- Idea Coverage: ~100% (8,284/8,285)
Retrieval Speed:
- Full Embedding: ~7 minutes
- Two-Stage Retrieval: ~27 seconds
- Speedup Ratio: 13x
Pipeline Execution Time:
- Fastest: 3 minutes (First pass)
- Typical: 5-7 minutes (2-3 refinement rounds)
- Slowest: 10 minutes (Novelty Mode)
Critic Review:
- Pass Standard: Average Score >= 7.0
- Dimensions: Methodology, Novelty, Storyteller
- Novelty Mode Boost: 0.5-1.5 points
Fusion Quality:
- Threshold: >= 0.65
- Typical Value: 0.68-0.75
- Scoring Dimensions: Coherence (40%) + Fusion Richness (40%) + Fusion Idea Reward (20%)
# 1. First Run (Build Knowledge Graph)
python scripts/build_entity_v3.py
python scripts/build_edges.py
# 2. Generate Paper Story
python scripts/idea2story_pipeline.py "Your Research Idea"
# 3. View Results
cat output/final_story.json
Improve Novelty:
# Increase Novelty Mode attempts
PipelineConfig.NOVELTY_MODE_MAX_PATTERNS = 15 # Default 10
# Increase Novelty weight
RecallConfig.PATH1_WEIGHT = 0.5          # Default 0.4, increase Similar Idea weight
Improve Stability:
# Lower Fusion Quality Threshold
PipelineConfig.FUSION_QUALITY_THRESHOLD = 0.60 # Default 0.65
# Increase Head Pattern weight
RecallConfig.PATH3_WEIGHT = 0.5          # Default 0.4, increase High-Quality Paper weight
Accelerate Retrieval:
# Reduce Retrieval Count
RecallConfig.PATH1_TOP_K_IDEAS = 5 # Default 10
RecallConfig.PATH3_TOP_K_PAPERS = 10     # Default 20
# ✅ Novelty mode activated
grep "激活【新颖性模式】" output/log.json
# 📊 Fusion quality evaluation
grep "融合质量评分" output/log.json
# 🔁 Rollback triggered
grep "【ROLLBACK TRIGGERED】" output/log.json
# 🎉 Final Pass
grep "🎉 Critic 评审通过" output/log.json
Q: API key invalid
# Check Environment Variable
echo $SILICONFLOW_API_KEY
# Set Environment Variable
export SILICONFLOW_API_KEY="your_key_here"
Q: Missing dependencies
# Reinstall dependencies
pip install -r requirements.txt --upgrade
Q: Node files do not exist
# Rebuild Knowledge Graph
python scripts/build_entity_v3.py
python scripts/build_edges.py
Q: Retrieval result is empty
# Check if Knowledge Graph is built successfully
ls -lh output/nodes_*.json
ls -lh output/knowledge_graph_v2.gpickle
Q: Fusion quality always below threshold
# Lower threshold or improve Fusion Prompt
PipelineConfig.FUSION_QUALITY_THRESHOLD = 0.60
Q: Novelty Mode traversed all but still did not pass
# Check fallback strategy in log
grep "兜底策略" output/log.json
# System automatically selects the highest scoring version to output
✅ Complete Knowledge Graph System: 16,791 nodes, 444,872 edges.
✅ Efficient Retrieval System: 13x speedup, second-level response.
✅ Intelligent Generation Pipeline: Fusion + Reflection + Critic + Intelligent Refinement.
✅ Quality Assurance Mechanism: Multi-layer checks, automatic rollback, fallback strategy.
✅ Complete Documentation System: 4 core documents covering construction, retrieval, generation.
✅ Conceptual Level Fusion: Idea Fusion achieves organic unity rather than technical stacking.
✅ Fusion Quality Reflection: Story Reflector assesses fusion effectiveness.
✅ Novelty First: Automatically upgrades to Novelty Mode when stagnated.
✅ Intelligent Rollback: Avoids ineffective refinement, improving efficiency.
✅ LLM-Enhanced Pattern: Dual-layer description improves usability.
✅ Research Assistance: Helps researchers quickly generate paper frameworks.
✅ Innovation Exploration: Discovers new research directions through Pattern fusion.
✅ Writing Guidance: Provides structured paper organization suggestions.
✅ Literature Survey: Quickly locates relevant work based on Knowledge Graph.
📌 Data Expansion: Integrate data from more conferences (CVPR, NeurIPS, ACL, etc.).
📌 Model Optimization: Train dedicated Fusion and Critic models.
📌 User Interaction: Introduce user feedback for online learning and optimization.
📌 Multi-modal Support: Integrate charts, formulas, code, and other multi-modal information.
Thanks to the ICLR 2025 paper dataset for support, and SiliconFlow for providing LLM API services.
Generated Date: 2026-01-25 Version: V1.0 Author: Idea2Paper Team
Contact: Refer to core documents for detailed technical support.