This insight emerged during a Feb 15 2026 session where we:
- Ingested an 11-video Anthropic course on Agent Skills into Outline wiki
- Created an `extracting-knowledge-graph` skill in `.claude/skills/` — encoding Engram's extraction workflow as a portable, progressively-disclosed skill
- Noticed a comment in the skill: "SM-2 scheduling state is applied AFTER extraction because Claude has no knowledge of SM-2 when extracting"
- Asked: what if we used FSRS instead?
That question broke open something fundamental.
With SM-2, every new quiz card starts identically:
```
easeFactor: 2.5
interval: 0
repetitions: 0
```
There is nothing Claude can contribute at extraction time because the initial state is a constant. The extraction service and the scheduling engine are completely decoupled — not by design choice, but because SM-2 provides no mechanism for coupling them.
The extraction service produces (question, answer). The scheduler wraps it in (question, answer, easeFactor=2.5, interval=0, repetitions=0). Claude's understanding of the content's difficulty, complexity, and pedagogical weight is discarded.
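The decoupling can be made concrete with a short sketch (Python for illustration; the field names are hypothetical, not Engram's actual schema):

```python
# Minimal sketch of why SM-2 extraction and scheduling are decoupled:
# the initial scheduling state is a constant, so card content cannot
# influence it. Field names are illustrative.

def sm2_initial_state(question: str, answer: str) -> dict:
    """Every card gets the same constant state -- the content is unused."""
    return {
        "question": question,
        "answer": answer,
        "easeFactor": 2.5,  # SM-2's fixed starting ease
        "interval": 0,
        "repetitions": 0,
    }

# Two cards of wildly different difficulty start identically:
easy = sm2_initial_state("What is 2 + 2?", "4")
hard = sm2_initial_state("State the spectral theorem.", "...")
scheduling = ("easeFactor", "interval", "repetitions")
assert {k: easy[k] for k in scheduling} == {k: hard[k] for k in scheduling}
```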
FSRS (Free Spaced Repetition Scheduler) operates on three memory variables:
- Difficulty (D): Inherent complexity of the card (1-10). Affects how fast stability grows after review.
- Stability (S): Time in days for retrievability to drop from 100% to 90%.
- Retrievability (R): Probability of successful recall at a given moment.
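These three variables are tied together by FSRS's power-law forgetting curve. A Python sketch using the FSRS-4.5 constants (newer FSRS versions make the decay a learnable weight, but the shape is the same) shows that the definition of stability above holds exactly: after S days, retrievability is 90%:

```python
# FSRS-4.5 forgetting curve: R(t, S) = (1 + FACTOR * t / S) ** DECAY
# With DECAY = -0.5 and FACTOR = 19/81, R(S, S) == 0.9 exactly,
# matching the definition of stability above.
DECAY = -0.5
FACTOR = 19 / 81

def retrievability(t_days: float, stability: float) -> float:
    """Probability of successful recall t_days after the last review."""
    return (1 + FACTOR * t_days / stability) ** DECAY

assert retrievability(0, 10) == 1.0                 # right after review
assert abs(retrievability(10, 10) - 0.9) < 1e-9     # after S days, R = 90%
```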
The critical insight: Difficulty is a property of the card, not the learner's history.
In standard FSRS, initial difficulty is determined by the first rating: D₀(G) = w₄ - (G-3) · w₅. But Claude already has the information needed to predict that difficulty at extraction time:
- How many prerequisite concepts the answer depends on
- Whether the concept is abstract or concrete
- Whether the answer requires synthesis vs recall
- How information-dense the answer is
- How similar the concept is to commonly confused ones
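As a purely hypothetical sketch, the signals above could be combined into a 1-10 score like this (in Engram the prediction is made by Claude from prompt guidelines, not by a formula; the weights here are invented for illustration):

```python
# Hypothetical sketch: folding the extraction-time signals above into a
# 1-10 difficulty prediction. Weights are illustrative only.

def predict_difficulty(n_prerequisites: int, is_abstract: bool,
                       needs_synthesis: bool, info_density: float,
                       n_confusable_neighbors: int) -> float:
    score = 1.0
    score += min(n_prerequisites, 4) * 0.8        # prerequisite chains
    score += 1.5 if is_abstract else 0.0          # abstract vs concrete
    score += 1.5 if needs_synthesis else 0.0      # synthesis vs recall
    score += info_density * 2.0                   # info_density in [0, 1]
    score += min(n_confusable_neighbors, 3) * 0.6 # confusable neighbors
    return min(score, 10.0)                       # clamp to FSRS's D range

# A concrete recall fact vs an abstract, synthesis-heavy concept:
assert predict_difficulty(0, False, False, 0.2, 0) < 3
assert predict_difficulty(4, True, True, 0.9, 3) > 8
```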
Instead of extracting (question, answer), Claude extracts (question, answer, predicted_difficulty).
```
BEFORE (SM-2):
Extraction → (question, answer) → Fixed scheduling state
Claude's understanding of difficulty: DISCARDED

AFTER (FSRS):
Extraction → (question, answer, difficulty) → Informed scheduling state
Claude's understanding of difficulty: PRESERVED AND USED
```
This means the extraction skill we built teaches Claude not just how to extract knowledge, but how to predict how hard that knowledge is to learn. The scheduler trusts those predictions. The loop is closed.
FSRS introduces desired_retention — the target probability of recall when a card is scheduled. This is a knob SM-2 doesn't have.
Combined with our dependency-aware knowledge graph, we can set desired retention per-concept based on structural importance:
| Graph Position | Desired Retention | Reasoning |
|---|---|---|
| Hub concepts (many dependents) | 0.95 | Forgetting a hub blocks many downstream concepts |
| Standard concepts | 0.90 | Default FSRS target |
| Leaf concepts (no dependents) | 0.85 | Lower stakes — nothing downstream is blocked |
| Guardian-protected concepts | 0.97 | Game mechanic: guardians ensure their cluster stays strong |
| Repair mission targets | 0.95 | Elevated retention ensures tighter review scheduling for damaged concepts |
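The table above can be sketched as a small retention provider (Python for illustration; the node fields and the hub threshold of 3 dependents are assumptions, not Engram's actual model):

```python
# Sketch of a per-concept desired-retention provider implementing the
# table above. Field names and the hub threshold are hypothetical.

def desired_retention(n_dependents: int, guardian_protected: bool,
                      repair_target: bool) -> float:
    if guardian_protected:
        return 0.97   # game mechanic: guardians keep their cluster strong
    if repair_target:
        return 0.95   # tighter scheduling for damaged concepts
    if n_dependents >= 3:
        return 0.95   # hub: forgetting it blocks downstream concepts
    if n_dependents == 0:
        return 0.85   # leaf: nothing downstream is blocked
    return 0.90       # standard FSRS target

assert desired_retention(5, False, False) == 0.95   # hub concept
assert desired_retention(0, False, False) == 0.85   # leaf concept
```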
Repair missions use elevated desired_retention (0.95) instead of interval multiplication — telling the scheduler what retention level you actually want rather than hacking the output interval.
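Why raising `desired_retention` tightens scheduling falls straight out of the forgetting curve: solving FSRS-4.5's R(t, S) for t gives the next interval, which shrinks as the retention target rises (a Python sketch using the fixed FSRS-4.5 constants):

```python
# Inverting the FSRS-4.5 forgetting curve R(t,S) = (1 + FACTOR*t/S)**DECAY
# gives the interval at which retrievability falls to the desired retention.
DECAY = -0.5
FACTOR = 19 / 81

def next_interval(stability: float, desired_retention: float) -> float:
    return stability / FACTOR * (desired_retention ** (1 / DECAY) - 1)

# At the default 0.90 target the interval equals the stability itself;
# the repair-mission target of 0.95 cuts it by more than half.
assert abs(next_interval(20, 0.90) - 20) < 1e-6
assert next_interval(20, 0.95) < 0.5 * next_interval(20, 0.90)
```

Unlike multiplying the output interval by a constant, this keeps the scheduler's model of memory intact: the review lands exactly when the predicted recall probability hits the target.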
SM-2's easeFactor can drop to 1.3 and stay there permanently — the infamous "ease hell" where cards get stuck in daily review cycles with no escape.
FSRS prevents this with mean reversion in the difficulty update:
D′(D,G) = w₇ · D₀(3) + (1 - w₇) · (D - w₆ · (G - 3))
This pulls difficulty toward a midpoint after each review. Even if Claude's initial difficulty prediction is wrong, FSRS self-corrects without trapping the learner.
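A small simulation of the update formula above makes the self-correction visible (Python sketch; w₄, w₆, w₇ are placeholder values, not fitted FSRS weights):

```python
# Sketch of the mean-reverting difficulty update above. Weights are
# placeholders for illustration; D0(3) = W4 is the reversion midpoint.
W4, W6, W7 = 5.0, 1.0, 0.1

def update_difficulty(d: float, grade: int) -> float:
    d_new = W7 * W4 + (1 - W7) * (d - W6 * (grade - 3))
    return max(1.0, min(10.0, d_new))  # FSRS clamps D to [1, 10]

# Even from a badly wrong initial prediction, repeated "Good" (G=3)
# reviews pull difficulty back toward the midpoint instead of trapping it:
d = 10.0
for _ in range(50):
    d = update_difficulty(d, 3)
assert abs(d - W4) < 0.05   # converged near the midpoint
```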
**Failure mode 1: the card is always rated easy**

SM-2:
- Users always rate 5 → ease factor climbs → intervals grow fast → item barely reviewed
- Problem: Trivial questions don't build lasting knowledge
- SM-2 response: Nothing. The card just drifts away.

FSRS:
- Low initial difficulty + high ratings → stability grows naturally
- FSRS response: Card is scheduled further out, which is correct behavior
- But: If Claude predicted high difficulty and the user rates it easy, mean reversion adjusts. Self-correcting.

**Failure mode 2: the card is always rated hard**

SM-2:
- Users always rate 0-2 → ease factor drops to 1.3 → interval stuck at 1 day → frustrating
- Problem: "Ease hell" — no escape without manual intervention
- User must: Split into sub-concepts or reset the card manually

FSRS:
- High initial difficulty + low ratings → stability grows slowly but consistently
- FSRS response: Mean reversion prevents the death spiral. The card adapts.
- And: Claude's predicted difficulty can trigger automatic sub-concept suggestions at extraction time: "This concept has predicted difficulty 9/10 — consider splitting"
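The SM-2 side of failure mode 2 can be reproduced directly from the standard SM-2 ease-factor update (a Python sketch):

```python
# Sketch of SM-2's "ease hell": the standard SM-2 ease-factor update,
# applied to repeated barely-passing reviews (quality q = 3).

def sm2_ease_update(ef: float, q: int) -> float:
    ef += 0.1 - (5 - q) * (0.08 + (5 - q) * 0.02)
    return max(ef, 1.3)   # SM-2's hard floor

ef = 2.5
for _ in range(20):
    ef = sm2_ease_update(ef, 3)   # each q=3 review subtracts 0.14
assert ef == 1.3   # stuck at the floor -- nothing pulls it back up

# A later perfect review (q=5) adds only +0.1, which the next
# barely-passing review more than undoes:
assert abs(sm2_ease_update(ef, 5) - 1.4) < 1e-9
```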
This migration emerged from a session where we:
- Ingested a course about agent skills into Outline
- Built an extraction skill for Engram (a skill about how to extract knowledge)
- Realized the skill could be improved by the scheduling algorithm it references
- Discovered that FSRS closes a loop that SM-2 couldn't
Engram is a tool that learns from wikis. We used it to ingest a course about skills. That course taught us to build a skill that makes Engram's extraction better. And the skill revealed that the scheduling algorithm should change — which in turn changes the skill itself.
The tool is learning how to learn, and we're learning alongside it.
fsrs on pub.dev — pure Dart, v2.0.1, 160/160 pub points, MIT license.
- 21 model weights (FSRS-6)
- Configurable `desired_retention` (0-1)
- Configurable learning steps
- No native dependencies — pure Dart
Alternative: fsrs-rs-dart — Rust implementation with Flutter bindings via flutter_rust_bridge. Higher performance but adds native dependency.
Recommendation: Start with pure Dart fsrs package. Migrate to Rust bindings only if performance becomes an issue (unlikely for quiz scheduling).
- Add `fsrs` package to `pubspec.yaml`
- Add `difficulty` field to `QuizItem` (nullable, defaults to null for existing cards)
- Update extraction tool schema — add `predictedDifficulty` (1-10) to quiz item output
- Update extraction system prompt — add difficulty prediction guidelines
- Write FSRS engine — pure function mirroring the SM-2 pattern, consuming the `fsrs` package
- Tests: existing SM-2 tests continue passing; new FSRS tests cover difficulty-informed scheduling

Note: The extraction skill's `references/sm2-constraints.md` was already renamed to `references/scheduling-constraints.md` and updated with FSRS content in PR #52.
- Scheduler selects engine based on card state:
  - Cards with `difficulty != null` → FSRS engine
  - Cards with `difficulty == null` (legacy) → SM-2 engine (or migrate with a `difficulty = 5.0` default)
- Quiz screen rating: SM-2 uses 0-5; FSRS uses Again/Hard/Good/Easy (4 grades). The rating UI needs updating.
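The dual-engine dispatch can be sketched as follows (Python for illustration; Engram itself is Dart, and the function names are hypothetical):

```python
# Sketch of dual-engine dispatch during migration: new cards carry a
# predicted difficulty and go to FSRS; legacy cards fall back to SM-2.
# The engine functions are stubs standing in for the real schedulers.

def fsrs_schedule(card: dict, rating: str) -> dict:
    return {**card, "engine": "fsrs"}

def sm2_schedule(card: dict, rating: int) -> dict:
    return {**card, "engine": "sm2"}

def schedule(card: dict, rating) -> dict:
    if card.get("difficulty") is not None:
        return fsrs_schedule(card, rating)   # difficulty-informed path
    return sm2_schedule(card, rating)        # legacy path

assert schedule({"difficulty": 7.0}, "good")["engine"] == "fsrs"
assert schedule({"difficulty": None}, 4)["engine"] == "sm2"
```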
- Desired retention provider — computes per-concept retention based on graph position
- Update mastery visualization — FSRS retrievability (0-1) maps more naturally to mastery colors than SM-2's binary "mastered/not" heuristic
- Auto-migrate legacy cards — `fromJson()` bootstraps FSRS state (D=5.0 default) for any card missing `stability`/`fsrsState`
- Remove SM-2 engine — deleted `sm2.dart`, `review_rating.dart`, `quality_rating_bar.dart` and their tests
- Replace 1.5x interval hack — mission concepts use elevated `desired_retention` (0.95) instead of interval multiplication
- Simplify mastery/analysis — `masteryStateOf` uses FSRS retrievability only, `isConceptMastered` uses `fsrsState >= 2`, the challenge dialog uses `isMasteredForUnlock`
- Test migration — shared `testQuizItem()` helper, all 18 test files updated to FSRS-only assertions
- Preserve original prediction — write-once `predictedDifficulty` and `reviewCount` fields on `QuizItem`, surviving FSRS mean reversion
- Difficulty prediction evaluation — pure-function `evaluatePredictions()` computes MAE and per-band accuracy (low/medium/high) after 5+ reviews; a dashboard stats card shows the results
- Calibration feedback loop — the extraction service accepts a `DifficultyEvaluationResult` and appends a calibration note with past prediction accuracy to Claude's prompt
- Auto sub-concept splitting — quiz items with `predictedDifficulty > 8` are automatically split via `generateSubConcepts()` during ingestion (capped at 3 per document, non-fatal failures)
| Feature | Impact |
|---|---|
| #38 Typed relationships | Relationship types inform difficulty prediction — "depends on" chains increase predicted difficulty |
| #39 Concept embeddings | Embedding similarity could predict confusion-based difficulty (similar concepts = harder to distinguish) |
| #40 Local-first Drift/SQLite | Schema should account for FSRS D, S, R fields |
| #41 CRDT sync | FSRS card state (D, S, R) needs CRDT treatment — LWW-Register per field with lastReview as timestamp |
| Guardian system | desired_retention per cluster replaces crude interval multipliers |
| Network health | Retrievability (R) feeds NetworkHealthScorer directly as freshness |
Migrate from SM-2 to FSRS. The closed extraction↔scheduling loop is the primary motivation, but ease-hell prevention, per-concept desired retention, and principled game mechanic integration are strong secondary reasons. The pure Dart fsrs package makes this a clean replacement.
- FSRS Algorithm
- ABC of FSRS
- dart-fsrs package
- FSRS GitHub
- Anthropic Agent Skills course — ingested into Outline, catalyst for this investigation