Summary
Automatically link semantically similar concepts across different sources — both during ingestion and offline. Expand ingestion beyond wiki pages to include podcast transcripts and technical books on relevant topics. The graph should continuously discover cross-source relationships, surfacing connections the user might never have noticed.
Motivation
Knowledge doesn't live in silos. A concept from an Anthropic podcast episode might directly relate to a chapter in a distributed systems textbook, which connects to a wiki page on CRDT sync. Today, these connections only exist if they happen to be in the same ingestion batch. Semantic linking would make the knowledge graph a true web of understanding across all sources.
Key Ideas
Cross-source semantic linking
- During ingestion: compare new concepts against the entire existing graph using embeddings, link semantically similar nodes even if they come from completely different sources
- Offline/background: periodically re-scan the graph for semantic similarities that weren't caught at ingestion time (e.g., after new concepts shift the embedding space)
- Use cosine similarity on Claude-computed concept embeddings to suggest links, with a confidence threshold
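The ingestion-time comparison above can be sketched as follows. This is a minimal sketch, assuming the graph already stores Claude-computed embeddings keyed by node ID; `graph_embeddings`, `suggest_links`, and the threshold value are illustrative names, not existing APIs.

```python
import numpy as np

# Tunable confidence cutoff (illustrative value, not empirically chosen).
SIMILARITY_THRESHOLD = 0.80

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def suggest_links(new_embedding: np.ndarray,
                  graph_embeddings: dict[str, np.ndarray],
                  threshold: float = SIMILARITY_THRESHOLD) -> list[tuple[str, float]]:
    """Compare a newly ingested concept against every existing node and
    return (node_id, score) pairs above the threshold, strongest first."""
    scores = [(node_id, cosine_similarity(new_embedding, emb))
              for node_id, emb in graph_embeddings.items()]
    return sorted((s for s in scores if s[1] >= threshold),
                  key=lambda s: s[1], reverse=True)
```

The offline/background pass would reuse the same function, iterating over all node pairs (or an approximate-nearest-neighbor index once the graph grows) rather than only the newly ingested concepts.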
Expanded source types
- Podcast transcripts: Ingest transcripts (e.g., from Anthropic, Lex Fridman, technical podcasts) — either user-provided or fetched from podcast RSS feeds / transcript APIs
- Technical books: Ingest chapters or sections from relevant books (PDF, EPUB, or pasted text) — e.g., DDIA, SICP, or domain-specific references the user is studying
- Periodic ingestion: Schedule or prompt for re-ingestion of podcast feeds to pick up new episodes automatically
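For the periodic podcast ingestion above, a background job could diff a feed against the GUIDs already in the graph. A stdlib-only sketch, assuming the caller fetches the RSS XML and persists `seen_guids` between runs (both hypothetical; real feeds may also expose transcript URLs, e.g. via the Podcasting 2.0 `<podcast:transcript>` tag, which is not handled here):

```python
import xml.etree.ElementTree as ET

def new_episodes(feed_xml: str, seen_guids: set[str]) -> list[dict]:
    """Parse a podcast RSS feed and return episodes not yet ingested."""
    root = ET.fromstring(feed_xml)
    episodes = []
    for item in root.iter("item"):
        # Fall back to the link when a feed omits <guid>.
        guid = item.findtext("guid") or item.findtext("link") or ""
        if guid and guid not in seen_guids:
            episodes.append({
                "guid": guid,
                "title": item.findtext("title", default=""),
            })
    return episodes
```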
Relationship quality
- Semantic links should include an explanation of why two concepts are related (analogy, shared mechanism, contrast, etc.)
- Distinguish between explicit relationships (stated in source material) and inferred relationships (discovered via embedding similarity)
- Let users confirm, reject, or refine inferred links
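The three relationship-quality requirements above suggest a link record that carries an explanation, an explicit/inferred provenance flag, and a user-review state. A sketch under those assumptions (all names here are hypothetical, not an existing schema):

```python
from dataclasses import dataclass
from enum import Enum

class Provenance(Enum):
    EXPLICIT = "explicit"   # stated in the source material
    INFERRED = "inferred"   # discovered via embedding similarity

class ReviewStatus(Enum):
    PENDING = "pending"
    CONFIRMED = "confirmed"
    REJECTED = "rejected"

@dataclass
class SemanticLink:
    source_id: str
    target_id: str
    relation_type: str        # e.g. "analogy", "contrast", "shared-mechanism"
    explanation: str          # why the two concepts are related
    provenance: Provenance
    confidence: float = 1.0   # cosine similarity, for inferred links
    status: ReviewStatus = ReviewStatus.PENDING

    def review(self, accept: bool) -> None:
        """Record the user's confirm/reject decision on an inferred link."""
        self.status = ReviewStatus.CONFIRMED if accept else ReviewStatus.REJECTED
```

"Refine" could then be modeled as rewriting `relation_type` or `explanation` on a confirmed link rather than as a separate state.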
Research
Transfer Learning: The Far Transfer Problem
Transfer of learning — applying knowledge from one context to another — is one of the most studied and most elusive goals in education. Barnett & Ceci (2002) formalized the near/far transfer taxonomy.
The sobering finding: Sala & Gobet (2019) found in a second-order meta-analysis that when controlling for placebo effects and publication bias, far-transfer effects are small or null. Spontaneous far transfer essentially doesn't happen.
However, analogical encoding changes this picture dramatically — the research consensus is not that far transfer is impossible, but that it requires specific instructional conditions that most learning environments fail to provide.
Barnett, S. M., & Ceci, S. J. (2002). When and where do we apply what we learn? A taxonomy for far transfer. Psychological Bulletin, 128(4), 612-637.
Sala, G., & Gobet, F. (2019). Near and far transfer in cognitive training. Collabra: Psychology, 5(1), 18.
Analogical Reasoning: Gentner and Holyoak
Structure-Mapping Theory (Gentner, 1983): Analogy works by mapping relational structure from a source domain to a target domain. Successful analogies preserve relational structure, not surface features — two concepts from different wiki collections may share relational structure even when their surface content is entirely different.
Analogical Encoding (Gentner, Loewenstein & Thompson, 2003): The landmark finding — comparing two cases side-by-side produces far more transfer than studying them separately. Graduate students who drew an analogy from two cases were nearly three times more likely to incorporate strategies from training cases into real negotiations. This directly supports a feature that surfaces cross-source connections and prompts learners to compare them.
Multiconstraint Theory (Holyoak & Thagard, 1989): Analogical mapping is governed by three interacting constraints: (1) structural consistency, (2) semantic similarity, and (3) pragmatic centrality. These map directly to an implementation: cosine similarity captures semantic similarity, knowledge graph structure captures structural consistency, and learner context provides pragmatic centrality.
Gentner, D., Loewenstein, J., & Thompson, L. (2003). Learning and transfer: A general role for analogical encoding. Journal of Educational Psychology, 95(2), 393-408.
Holyoak, K. J., & Thagard, P. (1989). Analogical mapping by constraint satisfaction. Cognitive Science, 13(3), 295-355.
LLMs and Analogical Reasoning
Recent studies (2024-2025) show that advanced LLMs match human performance on analogical reasoning tasks, validating the use of Claude for detecting cross-domain structural parallels at extraction time — aligning with the existing extraction pipeline.
Embedding-Based Semantic Discovery in Education
- Cosine similarity for educational content (MDPI Information, 2023): Knowledge graphs combined with cosine similarity of concept embeddings generate personalized educational recommendations.
- Prerequisite discovery via embeddings (JEDM, 2024): AI-assisted construction of educational knowledge graphs uses cosine similarity between concept embeddings to detect semantic references between concepts across different documents and courses.
- Contextual knowledge graphs (arXiv, 2024): Combining visual graph structures with semantic analysis can reveal novel intersections between fields — connections invisible to keyword searches but potentially transformative.
Cross-Disciplinary Knowledge Integration
- Cross-disciplinary learning (arXiv, 2020): Properly scaffolded cross-disciplinary connections lead to deeper understanding, while unscaffolded exposure to multiple disciplines does not. Scaffolding is essential.
- Knowledge integration from distant fields (ScienceDirect, 2022): Integrating knowledge from seemingly distant fields is positively associated with uniqueness in contribution when properly supported.
Key Takeaways for Implementation
- Far transfer is hard but achievable with analogical encoding: Explicitly surfacing cross-domain structural parallels (rather than hoping learners discover them) makes transfer 3x more likely (Gentner et al., 2003).
- Holyoak's three constraints map to a scoring function: Semantic similarity (cosine similarity of embeddings), structural consistency (graph topology), and pragmatic centrality (learner context/mastery) — all computable.
- Embeddings are proven for educational concept discovery: Multiple 2023-2024 studies demonstrate transformer-based embeddings reliably identify semantic relationships across documents and disciplines.
- Scaffolding is essential: The system must actively scaffold comparison — present concepts side by side, highlight structural parallels, and prompt reflection. A cross-source quiz item could ask "How does concept X from Collection A relate to concept Y from Collection B?"
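The Holyoak-constraint scoring function named in the takeaways could be a simple weighted combination. A sketch assuming all three inputs are normalized to [0, 1]; the weights and the Jaccard proxy for structural consistency are illustrative assumptions, not tuned or prescribed values:

```python
def jaccard(neighbors_a: set[str], neighbors_b: set[str]) -> float:
    """One possible structural-consistency proxy: neighbor-set overlap
    of the two concepts in the knowledge graph."""
    if not neighbors_a and not neighbors_b:
        return 0.0
    return len(neighbors_a & neighbors_b) / len(neighbors_a | neighbors_b)

def link_score(semantic: float, structural: float, pragmatic: float,
               weights: tuple[float, float, float] = (0.5, 0.3, 0.2)) -> float:
    """Weighted combination of Holyoak's three constraints:
      semantic   - cosine similarity of concept embeddings
      structural - graph-topology overlap (e.g. jaccard above)
      pragmatic  - relevance to the learner's current context/mastery
    """
    w_sem, w_str, w_prag = weights
    return w_sem * semantic + w_str * structural + w_prag * pragmatic
```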
Related
- Feature: typed relationships for cross-discipline semantic connections #38 — Typed relationships (semantic links need relationship types: analogy, contrast, shared-mechanism, etc.)
- Feature: Claude-computed concept embeddings for semantic discovery #39 — Concept embeddings (foundation for cosine similarity discovery)
- feat: video-synchronized knowledge graph highlighting with relationship explanations #74 — Video-synchronized graph highlighting (another expanded source type)
- Extraction skill in `.claude/skills/extracting-knowledge-graph/` (needs to support new source formats)