Commit adb60ad

jeremymanning and claude committed

Add answer option randomization and question generation skill

Randomize answer option display order (Fisher-Yates shuffle) each time a question is shown. Display keys map back to original keys so that answer checking, response storage, and exports are unaffected. Add `.claude/skills/generate-questions.md` — a structured skill for generating high-quality domain questions using Claude agents with TodoWrite checkpointing for resumability.

Co-Authored-By: Claude (claude-opus-4-6) <noreply@anthropic.com>

1 parent e5f52e7 commit adb60ad

File tree

12 files changed: +278 −12 lines changed

.claude/skills/generate-questions.md (new file)

Lines changed: 198 additions & 0 deletions
# Skill: Generate Domain Questions

Generate high-quality multiple-choice questions for the Knowledge Mapper application.

## When to Use

Use this skill when asked to generate or regenerate questions for a domain (e.g., "generate questions for biology", "regenerate the physics question set").

## Context

Knowledge Mapper is a GP-based knowledge estimation app. Users answer multiple-choice questions positioned on a 2D map of Wikipedia articles. Question quality directly impacts the usefulness of knowledge estimation.
### Current Problems with Questions

- Questions are too long and verbose (avg ~190 chars, options up to 1000 chars)
- Many questions can be answered by logic alone rather than actual knowledge
- Difficulty levels don't clearly separate vocabulary knowledge from deep understanding
- Distractors are often obviously wrong or implausibly long compared to the correct answer
### Target Question Format

```json
{
  "id": "<16-char hex>",
  "question_text": "...",
  "options": { "A": "...", "B": "...", "C": "...", "D": "..." },
  "correct_answer": "A",
  "difficulty": 3,
  "x": 0.224806,
  "y": 0.56408,
  "z": 0.0,
  "source_article": "photosynthesis",
  "domain_ids": ["biology"],
  "concepts_tested": ["photosynthesis", "cellular respiration"]
}
```
### Length Targets

- **Question text**: 50-100 words (roughly 250-600 characters). Concise but specific.
- **Each answer option**: 25-50 words (roughly 125-300 characters). All four options should be similar in length and style.
- **All options must be plausible** to someone without domain expertise.
### Difficulty Levels (1-4)

Assign each of the 50 concepts a difficulty level (roughly equal distribution: ~12-13 per level):

- **Level 1 — High-level vocabulary**: Can someone identify what this concept IS? Tests recognition of major terms and their basic definitions. Someone who has heard of the field can likely answer these.
- **Level 2 — Low-level vocabulary**: Can someone identify specific technical terms, sub-components, or named results within this concept? Tests familiarity with the detailed terminology that practitioners use.
- **Level 3 — Basic working knowledge**: Can someone apply or reason about this concept? Tests understanding that goes beyond definitions — requires knowing how things relate, why they matter, or what happens when you combine them. Cannot be answered through logic alone or rote memorization alone.
- **Level 4 — Deep knowledge**: Can someone handle nuance, edge cases, or cross-cutting implications of this concept? Tests expert-level understanding — subtle distinctions, historical context of discoveries, common misconceptions among practitioners, or non-obvious connections to other concepts. Cannot be answered through logic alone or rote memorization alone.
### Critical Quality Rules

1. **Logic-proof**: A smart person with NO domain knowledge should NOT be able to answer correctly through reasoning alone. Avoid options that are self-contradictory, obviously absurd, or eliminable by logic.
2. **Uniform option length**: All 4 options for a given question must be approximately the same length (within ~30% of each other). The correct answer must NOT be systematically longer or shorter.
3. **Uniform option style**: All 4 options should use the same grammatical structure, level of specificity, and tone. No option should stand out stylistically.
4. **Plausible distractors**: Each distractor must sound reasonable to a non-expert. It should contain real terminology from the domain, not made-up terms.
5. **LaTeX formatting**: Use `$...$` for inline math expressions. Escape dollar signs in non-math contexts. Use proper LaTeX for equations, variables, and mathematical notation.
6. **No giveaways**: Avoid "all of the above", "none of the above", absolute qualifiers ("always", "never") that signal incorrectness, or hedging language ("sometimes", "may") that signals correctness.
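Rules 2 and 3 are the most mechanically checkable. A minimal sketch of an automated length check (the word counts and ~30% tolerance come from this skill's targets; the function name is illustrative):

```python
def check_options(options):
    """Validate the four answer options against the length rules.

    options: dict mapping "A".."D" to option text.
    Returns a list of rule violations (empty list = pass).
    """
    problems = []
    word_counts = {k: len(v.split()) for k, v in options.items()}
    for key, n in word_counts.items():
        if not 25 <= n <= 50:
            problems.append(f"option {key} has {n} words (target 25-50)")
    # Uniformity: longest option within ~30% of the shortest
    shortest, longest = min(word_counts.values()), max(word_counts.values())
    if shortest and longest > shortest * 1.3:
        problems.append(f"length spread too wide ({shortest} vs {longest} words)")
    return problems
```

Style uniformity (rule 3) still needs a judgment pass; only the length constraints are automatable this way.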
## Procedure

IMPORTANT: Use the TodoWrite tool throughout this entire process to track every step and every question. This allows resuming if context runs out.
### Phase 1: Concept Generation

**Step 1.1**: Create the master todo list:

```
TodoWrite([
  { content: "Phase 1: Generate 50 core concepts for {domain}", status: "in_progress", activeForm: "Generating core concepts" },
  { content: "Phase 2: Curate and deduplicate concept list", status: "pending", activeForm: "Curating concept list" },
  { content: "Phase 3: Generate questions (0/50 complete)", status: "pending", activeForm: "Generating questions" },
  { content: "Phase 4: Quality review and validation", status: "pending", activeForm: "Reviewing question quality" },
  { content: "Phase 5: Assemble domain JSON file", status: "pending", activeForm: "Assembling domain JSON" },
])
```

**Step 1.2**: Generate 60 candidate concepts that are central to the domain. For each concept, note:
- The concept name
- A 1-sentence description of why it's central to this domain
- The most relevant Wikipedia article title

**Step 1.3**: Write the candidate list to a working file: `data/domains/.working/{domain-id}-concepts.json`
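Writing the working file can be a plain JSON dump; a sketch following the path convention above (the helper name is illustrative):

```python
import json
from pathlib import Path

def write_checkpoint(domain_id, concepts, working_dir="data/domains/.working"):
    """Write the candidate concept list so a later agent can resume."""
    out_dir = Path(working_dir)
    out_dir.mkdir(parents=True, exist_ok=True)  # .working/ may not exist yet
    path = out_dir / f"{domain_id}-concepts.json"
    path.write_text(json.dumps(concepts, indent=2))
    return path
```

The same pattern applies to the per-question checkpoint file used in Phase 3.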
### Phase 2: Concept Curation

**Step 2.1**: Review the 60 candidates. Remove:
- Duplicates or near-duplicates (e.g., "DNA replication" and "replication of DNA")
- Concepts that are too broad (e.g., "science") or too narrow (e.g., "Figure 3 in Smith et al. 2019")
- Concepts that heavily overlap (keep the more central one)

**Step 2.2**: Rank remaining concepts by centrality to the domain. Keep the top 50.

**Step 2.3**: Assign difficulty levels to the 50 concepts:
- ~12-13 concepts at Level 1 (high-level vocabulary)
- ~12-13 concepts at Level 2 (low-level vocabulary)
- ~12-13 concepts at Level 3 (basic working knowledge)
- ~12-13 concepts at Level 4 (deep knowledge)

Level assignment should reflect how specialized the concept is, NOT how hard the question will be to write. Central, well-known concepts get Level 1-2. Specialized, nuanced concepts get Level 3-4.
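Since Step 2.2 already ranks concepts by centrality, the level split can be computed directly from that ranking; a sketch, assuming most-central-first order (function name illustrative):

```python
def assign_levels(ranked_concepts):
    """Assign difficulty levels 1-4 to 50 concepts ranked most-central first.

    Most central concepts get Level 1, most specialized get Level 4.
    Returns a list of (concept, level) pairs with a 13/13/12/12 split.
    """
    sizes = [13, 13, 12, 12]  # roughly equal distribution over 50 concepts
    leveled, start = [], 0
    for level, size in enumerate(sizes, start=1):
        for concept in ranked_concepts[start:start + size]:
            leveled.append((concept, level))
        start += size
    return leveled
```

Treat the output as a starting point; individual concepts can still be moved between adjacent levels where the ranking misjudges specialization.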
**Step 2.4**: Update the working file with the curated, ranked, leveled list.
### Phase 3: Question Generation (per concept)

For EACH of the 50 concepts, follow this sub-procedure. Update TodoWrite after EVERY question:

**Step 3.1 — Research**: Use WebFetch to read the Wikipedia article for this concept:

```
WebFetch({ url: "https://en.wikipedia.org/wiki/{article_title}", prompt: "Summarize the key facts, definitions, relationships, and nuances of {concept}. Focus on what distinguishes expert knowledge from surface knowledge." })
```

**Step 3.2 — Generate question**: Based on the Wikipedia content and the assigned difficulty level, write the question text. Follow the level definitions:
- Level 1: Test recognition of what this concept IS
- Level 2: Test knowledge of specific technical terms within the concept
- Level 3: Test ability to reason about or apply the concept
- Level 4: Test expert-level nuance, edge cases, or cross-cutting connections

The question must be 50-100 words. It must be impossible to answer through logic alone.

**Step 3.3 — Generate correct answer**: Write the correct answer (25-50 words). It must be factually accurate per the Wikipedia article.

**Step 3.4 — Generate distractors**: Generate 3 incorrect options. For EACH distractor:
1. Start from the correct answer
2. Change ONE specific factual claim to make it incorrect
3. Verify the distractor:
   - Uses real domain terminology (not made-up words)
   - Is approximately the same length as the correct answer (within ~30%)
   - Uses the same grammatical structure as the correct answer
   - Would sound plausible to a non-expert
   - Cannot be eliminated through logic alone
4. If the distractor fails any check, regenerate it

**Step 3.5 — Assign option slots**: Randomly assign the correct answer and 3 distractors to slots A, B, C, D. Use a different random arrangement for each question (do NOT always put the correct answer in slot A). Record which slot contains the correct answer.
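Step 3.5 amounts to drawing one random permutation per question; a sketch (function name illustrative):

```python
import random

def assign_slots(correct, distractors):
    """Randomly place the correct answer and three distractors in slots A-D.

    Returns (options_dict, correct_slot).
    """
    texts = [correct] + list(distractors)
    random.shuffle(texts)  # fresh arrangement for every question
    options = dict(zip("ABCD", texts))
    correct_slot = next(k for k, v in options.items() if v == correct)
    return options, correct_slot
```

Record `correct_slot` as the question's `correct_answer` field in the target format above.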
**Step 3.6 — Self-check**: Before finalizing, verify:
- [ ] Question is 50-100 words
- [ ] Each option is 25-50 words
- [ ] All options are within ~30% length of each other
- [ ] No option is eliminable through logic alone
- [ ] LaTeX is properly formatted (if applicable)
- [ ] The correct answer is factually accurate
- [ ] Each distractor contains exactly one changed factual claim
- [ ] Correct answer slot varies across questions

**Step 3.7 — Update progress**: Update the TodoWrite with current progress:

```
{ content: "Phase 3: Generate questions ({N}/50 complete)", ... }
```

Also write each completed question to the working file incrementally:
`data/domains/.working/{domain-id}-questions.json`
### Phase 4: Quality Review

**Step 4.1**: Read through ALL 50 questions and check for:
- Option length uniformity within each question
- Correct answer position distribution (should be ~even across A/B/C/D)
- No repeated patterns in distractor construction
- LaTeX consistency
- No two questions testing the exact same knowledge

**Step 4.2**: Fix any issues found. Log fixes in TodoWrite.

**Step 4.3**: Verify the correct answer position distribution:
- Count how many times each slot (A/B/C/D) contains the correct answer
- If any slot has fewer than 9 or more than 16, reassign some correct answer slots to balance
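The Step 4.3 balance check is a straightforward count; a sketch using the 9-16 bounds above (function name illustrative):

```python
from collections import Counter

def slot_distribution_ok(questions):
    """Check that each slot A-D holds the correct answer 9-16 times
    across the 50-question set."""
    counts = Counter(q["correct_answer"] for q in questions)
    return all(9 <= counts.get(slot, 0) <= 16 for slot in "ABCD")
```

If the check fails, swap the correct answer into a different slot for a few questions (updating both `options` and `correct_answer`) and re-run it.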
### Phase 5: Assembly

**Step 5.1**: Read the existing domain file to preserve the `domain`, `labels`, and `articles` sections:

```
Read({ file_path: "data/domains/{domain-id}.json" })
```

**Step 5.2**: Determine spatial coordinates for each question. Each question needs `x`, `y` coordinates within the domain's region (from index.json). Use the source article's coordinates if available in the articles array, otherwise distribute evenly within the region.

**Step 5.3**: Generate a unique 16-character hex ID for each question:

```python
import hashlib

question_id = hashlib.md5(f"{domain_id}:{concept}:{question_text[:50]}".encode()).hexdigest()[:16]
```

**Step 5.4**: Assemble the complete domain JSON file with the new questions replacing the old ones. Write to `data/domains/{domain-id}.json`.

**Step 5.5**: Update `data/domains/all.json` — replace the old questions for this domain with the new ones, preserving questions from other domains.
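Step 5.5 must not disturb other domains' questions. A sketch, assuming all.json holds a flat `questions` list whose entries carry `domain_ids` as in the target format above (the exact top-level layout of all.json is an assumption):

```python
def replace_domain_questions(all_data, domain_id, new_questions):
    """Replace one domain's questions in the combined file, keeping the rest.

    all_data: parsed all.json with a top-level "questions" list (assumed layout).
    """
    kept = [q for q in all_data["questions"]
            if domain_id not in q.get("domain_ids", [])]
    all_data["questions"] = kept + list(new_questions)
    return all_data
```

Note that a question tagged with multiple `domain_ids` would be removed here if any of them matches; verify that behavior is what you want for cross-domain questions.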
**Step 5.6**: Update `data/domains/index.json` — update the `question_count` for this domain if it changed.

**Step 5.7**: Clean up working files in `data/domains/.working/`.
## Important Notes

- **Checkpointing**: Write progress to `data/domains/.working/` after EVERY question. If context runs out, the next agent can resume from the working files.
- **Model quality**: This skill should be run with Claude Opus for highest question quality. Do NOT use a smaller model.
- **One domain at a time**: Generate questions for one domain per invocation. The caller can invoke this skill in parallel for multiple domains.
- **Preserve coordinates**: If the existing questions have well-placed coordinates near their source articles, try to reuse those coordinates for questions about the same articles.
- **TodoWrite is mandatory**: Every phase transition and every completed question MUST be reflected in TodoWrite. This is non-negotiable for resumability.
src/ui/quiz.js

Lines changed: 45 additions & 12 deletions

```diff
@@ -7,6 +7,31 @@ let nextCallback = null;
 let currentQuestion = null;
 let uiElements = {};
 
+// Randomization: maps display key → original key (e.g., { A: "C", B: "A", C: "D", D: "B" })
+let displayToOriginal = { A: 'A', B: 'B', C: 'C', D: 'D' };
+let originalToDisplay = { A: 'A', B: 'B', C: 'C', D: 'D' };
+
+/** Fisher-Yates shuffle (in place). */
+function shuffle(arr) {
+  for (let i = arr.length - 1; i > 0; i--) {
+    const j = Math.floor(Math.random() * (i + 1));
+    [arr[i], arr[j]] = [arr[j], arr[i]];
+  }
+  return arr;
+}
+
+/** Build random key mappings for option display. */
+function buildShuffleMaps() {
+  const keys = ['A', 'B', 'C', 'D'];
+  const shuffled = shuffle([...keys]);
+  displayToOriginal = {};
+  originalToDisplay = {};
+  keys.forEach((displayKey, i) => {
+    displayToOriginal[displayKey] = shuffled[i];
+    originalToDisplay[shuffled[i]] = displayKey;
+  });
+}
+
 export function init(container) {
   const styleId = 'quiz-panel-styles';
@@ -268,13 +293,16 @@ export function showQuestion(question) {
   }
 
   if (uiElements.options) {
+    // Randomize option order each time a question is displayed
+    buildShuffleMaps();
     uiElements.options.forEach(btn => {
-      const key = btn.dataset.key;
-      const text = question.options ? question.options[key] : '';
+      const displayKey = btn.dataset.key; // A, B, C, D (button slot)
+      const originalKey = displayToOriginal[displayKey]; // which original option to show here
+      const text = question.options ? question.options[originalKey] : '';
       const renderedText = renderLatex(text || '');
       btn.innerHTML = renderedText;
       // T049: Update ARIA label with option text
-      btn.setAttribute('aria-label', `Option ${key}: ${stripLatex(text || '')}`);
+      btn.setAttribute('aria-label', `Option ${displayKey}: ${stripLatex(text || '')}`);
       btn.disabled = false;
       btn.className = 'quiz-option';
     });
@@ -295,23 +323,28 @@ export function showQuestion(question) {
   }
 }
 
-function handleOptionClick(selectedKey) {
+function handleOptionClick(selectedDisplayKey) {
   if (!currentQuestion || !uiElements.options) return;
 
   uiElements.options.forEach(btn => {
     btn.disabled = true;
   });
 
-  const correctKey = currentQuestion.correct_answer;
-  const isCorrect = selectedKey === correctKey;
+  // Translate display keys back to original keys for correctness check
+  const originalSelectedKey = displayToOriginal[selectedDisplayKey];
+  const originalCorrectKey = currentQuestion.correct_answer;
+  const isCorrect = originalSelectedKey === originalCorrectKey;
+
+  // Highlight uses display keys (which button slot to color)
+  const correctDisplayKey = originalToDisplay[originalCorrectKey];
 
   uiElements.options.forEach(btn => {
     const key = btn.dataset.key;
-    if (key === selectedKey) {
+    if (key === selectedDisplayKey) {
       btn.classList.add('selected');
       btn.classList.add(isCorrect ? 'correct' : 'incorrect');
     }
-    if (key === correctKey) {
+    if (key === correctDisplayKey) {
       btn.classList.add('correct-highlight');
     }
   });
@@ -364,10 +397,10 @@ function handleOptionClick(selectedKey) {
     }
   }
 
-  // Fire the answer callback immediately (records the response and updates estimator)
-  // but do NOT auto-advance to next question — user clicks "Next" when ready
+  // Fire the answer callback with the ORIGINAL key (not the display key)
+  // so that app.js records the correct option letter regardless of shuffle
   if (answerCallback) {
-    answerCallback(selectedKey, currentQuestion);
+    answerCallback(originalSelectedKey, currentQuestion);
   }
 }
```
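The key invariant of the diff above is that `displayToOriginal` and `originalToDisplay` are inverse permutations of the same four keys, so translating a key through one map and back through the other is the identity. Mirrored in Python as a quick sanity check (the JS in the diff is the actual implementation; this is an illustration only):

```python
import random

def build_shuffle_maps():
    """Mirror of buildShuffleMaps: returns (display_to_original, original_to_display)."""
    keys = ["A", "B", "C", "D"]
    shuffled = keys[:]
    random.shuffle(shuffled)  # stand-in for the JS Fisher-Yates shuffle
    display_to_original = dict(zip(keys, shuffled))
    # Invert the mapping so original keys can be located on screen
    original_to_display = {orig: disp for disp, orig in display_to_original.items()}
    return display_to_original, original_to_display
```

This roundtrip property is exactly why answer checking, response storage, and exports are unaffected by the on-screen shuffle: the callback always receives the original key.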