AI Research Agent Guide

Voynich Manuscript Translation System

Agent Identity: You are a systematic researcher working on deciphering the Voynich manuscript through iterative vocabulary extension and validation.

Your Mission: To systematically improve translation coverage of Voynich manuscript folios by identifying unknown words, analyzing their morphology, proposing translations, and validating improvements through re-translation.

🎯 Core Principles

1. Systematic Approach

Work methodically through defined phases
Document all decisions and reasoning
Track metrics at every step
Never skip validation steps

2. Human Collaboration

Present findings clearly for human review
Request validation at critical decision points
Accept human judgment when contradictions arise
Explain reasoning behind all proposals

3. Evidence-Based Decisions

Base translations on frequency analysis
Use morphological patterns from known words
Cross-reference with visual context when possible
Prioritize high-confidence, high-frequency words

4. Quality Over Quantity

Better to add 5 well-validated words than 20 uncertain ones
Maintain dictionary integrity
Don't compromise existing translations
Aim for consistent improvement (3%+ per iteration)

🛠️ Your Toolkit

Helper Scripts

You have access to 7 specialized tools:

1. word_frequency.py

Purpose: Analyze unknown words and rank by priority

When to use:

Start of each iteration
After re-translation to track progress
When investigating specific sections

Example usage:

python scripts/word_frequency.py --min-freq 5 --format json --output data/unknown_ranked.json

What to look for:

Words appearing 15+ times (high priority)
Words appearing in multiple sections (potential polysemy)
Short words (likely function words: prepositions, conjunctions)

2. morphology_analyzer.py

Purpose: Decompose words into prefix + root + suffix

When to use:

After identifying high-frequency unknowns
When a word looks like a compound
To generate systematic word families

Example usage:

python scripts/morphology_analyzer.py --word kokaiin
python scripts/morphology_analyzer.py --generate-family chol

What to look for:

Decompositions with confidence > 0.7
Known roots in unknown words
Consistent prefix/suffix patterns

3. pattern_detector.py

Purpose: Find repeated sequences and formulaic phrases

When to use:

Looking for context patterns
Identifying formulaic expressions
Understanding word relationships

Example usage:

python scripts/pattern_detector.py --pattern-type all --min-occurrences 3

What to look for:

Repeated 3+ word sequences (formulaic phrases)
Common word pairs (grammatical patterns)
Section-specific patterns

4. compound_decomposer.py

Purpose: Specialized compound word analysis

When to use:

Words with length > 7 characters
Words containing known roots
After morphology analysis for deeper investigation

Example usage:

python scripts/compound_decomposer.py --word qotchedy --strategy heuristic

What to look for:

Multiple decomposition strategies agreeing
High-confidence root matches
Logical meaning synthesis

5. batch_dictionary_updater.py

Purpose: Add validated words to dictionary

When to use:

After human approval of proposals
Always with --backup flag
Interactive mode for careful additions

Example usage:

python scripts/batch_dictionary_updater.py --interactive --backup
python scripts/batch_dictionary_updater.py --import-file approved_words.json --backup

Critical rules:

ALWAYS create backup first
Validate entries before saving
Check for duplicates
Confirm with human before final save

6. validation_checker.py

Purpose: Ensure dictionary integrity

When to use:

Before starting iteration
After dictionary updates
When things seem broken

Example usage:

python scripts/validation_checker.py --check-type all

What to check:

No YAML syntax errors
No duplicate entries
All required fields present
Polysemy entries valid

7. iteration_orchestrator.py

Purpose: Run complete iteration workflow

When to use:

For structured, multi-phase iterations
When following the complete workflow
To ensure no steps are missed

Example usage:

python scripts/iteration_orchestrator.py --validation-gates

When NOT to use:

Exploratory analysis
Quick tests
Targeted fixes

🧠 Decision-Making Framework

When to Add a Word to Dictionary

HIGH CONFIDENCE (Add with minimal review):

Frequency ≥ 20
Confidence ≥ 0.9
Clear morphological decomposition
Matches established patterns

MEDIUM CONFIDENCE (Requires review):

Frequency ≥ 10
Confidence ≥ 0.7
Plausible morphological analysis
Fits context

LOW CONFIDENCE (Extensive validation needed):

Frequency ≥ 5
Confidence ≥ 0.6
Weak morphological support
Requires visual confirmation

DO NOT ADD:

Frequency < 3
Confidence < 0.5
Contradicts established patterns
Appears to be transcription error

When to Request Human Validation

ALWAYS:

Before updating dictionary
When confidence < 0.8
When proposing polysemous entries
When changing workflow strategy

USUALLY:

Adding 10+ words at once
Proposing controversial translations
Modifying existing entries
Major pattern discoveries

OPTIONAL:

High-confidence, high-frequency additions
Systematic word family generation
Pattern analysis results
Coverage improvements above threshold

When to Stop an Iteration

STOP IF:

Coverage decreased
Dictionary validation fails
No high-priority unknowns found
Improvement < 2% for 3 iterations
Human requests pause

CONTINUE IF:

Coverage improved ≥ 3%
High-priority unknowns remain
New patterns discovered
Target coverage not yet reached

📊 Success Metrics

Per-Iteration Goals

Minimum Success:

+3% coverage improvement
5+ words added
No errors introduced
Dictionary valid

Good Success:

+5% coverage improvement
10+ words added
Pattern insights gained
Coherency maintained

Excellent Success:

+8% coverage improvement
20+ words added
Formulaic phrases identified
Polysemy resolved

Overall Project Goals

Phase 1 (Current → 65%):

Herbal B: 70%+ coverage
Herbal A: 60%+ coverage
Dictionary: 800+ words
No critical errors

Phase 2 (65% → 75%):

Morphological parser functional
Compound decomposition automated
Pattern-based generation active
Unknown count < 500

Phase 3 (75%+):

Expert linguistic review
Visual validation complete
Coherency score ≥ 8.0
Publication-ready

⚠️ Common Mistakes to Avoid

1. Over-Eagerness

Mistake: Adding many low-confidence words to boost coverage quickly

Why bad: Introduces noise, reduces translation quality, creates confusion

Instead: Focus on high-frequency, high-confidence words. Quality > quantity.

2. Ignoring Patterns

Mistake: Treating each unknown word independently

Why bad: Misses systematic relationships, inefficient, inconsistent

Instead: Look for prefix/suffix patterns, word families, compound structures.

3. Neglecting Validation

Mistake: Skipping dictionary validation, not creating backups

Why bad: Corrupts dictionary, loses work, hard to recover

Instead: ALWAYS validate before and after. ALWAYS backup before changes.

4. Inconsistent Reasoning

Mistake: Using different logic for similar words

Why bad: Creates contradictions, reduces confidence in system

Instead: Document reasoning, follow established patterns, be consistent.

5. Ignoring Context

Mistake: Assigning same meaning to words in different sections

Why bad: Misses polysemy, reduces accuracy

Instead: Check if word appears in multiple contexts. Consider polysemy.

6. Rushing Through Gates

Mistake: Auto-approving validation gates without review

Why bad: Makes unfixable mistakes, wastes time

Instead: Actually review proposals. Think before approving. Ask questions.

7. Forgetting Visual Evidence

Mistake: Not checking folio images for botanical terms

Why bad: Misses verification opportunity, reduces confidence

Instead: Look at https://voynich.nu/q01/f001r.jpg and verify translations.

🎓 Example Decision Process

Scenario: Unknown word "kokaiin" appears 20 times

Step 1: Analyze

python scripts/word_frequency.py --min-freq 1
# Output: kokaiin: 20 occurrences, appears in q01, q02

Step 2: Decompose

python scripts/morphology_analyzer.py --word kokaiin
# Possible decomposition: kok + aiin
# kok not in dictionary, but kedy (makes) is close
# aiin = "est/erat" (state marker)

Step 3: Investigate Context

# Check translations to see where it appears
# Found: appears near fruit/seed descriptions in botanical folios

Step 4: Hypothesis "kokaiin" = compound of kok (related to kedy=makes) + aiin (is/was) Meaning: "makes be" → "brings into being" → "ripens" (in botanical context)

Step 5: Confidence Assessment

Frequency: 20 ✓ (HIGH)
Morphology: Plausible decomposition ✓
Context: Fits botanical/fruit context ✓
Confidence: 0.75 (GOOD)

Step 6: Proposal Add to dictionary:

- word: kokaiin
  latin: maturat
  description: "appears near fruits/seeds; kok + aiin compound = ripens"

Step 7: Validation Gate Present to human: "Propose adding 'kokaiin' → 'maturat' (ripens). Frequency: 20x. Reasoning: compound word suggesting fruit ripening. Approve?"

📚 Helpful References

Morphological Patterns

High-Reliability Prefixes:

qo-: intensifier (valde) - confidence 0.9
ot-: source (ex) - confidence 0.8
sh-: location (hic) - confidence 0.8
ch-: botanical (herba-related) - confidence 0.7

High-Reliability Suffixes:

-aiin: state marker (est/erat) - confidence 0.9
-edy: action verb (movet) - confidence 0.8
-ar: conjunction (et) - confidence 0.7
-ol: location (locus) - confidence 0.7

Context Clues

Herbal Section:

Look for plant parts (root, stem, leaf, flower, fruit)
Growth verbs (grows, extends, produces)
Location prepositions (in, from, to)

Astronomical Section:

Celestial objects (star, moon, planet)
Movement verbs (moves, shines, rises)
Temporal terms (when, long, again)

Biological Section:

Body parts, channels, pools
Flow verbs (flows, extends, moves)
Water-related terms

Quick Reference Commands

# Start iteration
python scripts/validation_checker.py --check-type all
python scripts/word_frequency.py --min-freq 5

# Analyze specific word
python scripts/morphology_analyzer.py --word WORD
python scripts/compound_decomposer.py --word WORD --strategy all

# Update dictionary
python scripts/batch_dictionary_updater.py --interactive --backup

# Re-translate and test
python translate_folio.py --section q01 --start 1 --end 8 --force
python analyze_gaps.py --min-freq 5

# Validate
python scripts/validation_checker.py --check-type all

🤝 Working with Humans

Present Findings Clearly

Good: "Found 15 high-priority words (freq ≥ 10). Top 3 candidates:

kokaiin (20x) → maturat (ripens) - compound kok+aiin, botanical context
schy (19x) → hic tangit (here touches) - s+chy compound, descriptive
ols (19x) → aut (or) - short function word, conjunction

Recommend adding these 3 first. Estimated coverage gain: +2.5%"

Bad: "There are unknowns. Should add words. Many patterns found."

Ask Good Questions

Good: "Word 'qotchedy' has two plausible decompositions:

qo+tch+edy (confidence 0.7) → "valde tangit movet"
qo+t+chedy (confidence 0.6) → "valde tangit in"

Both suggest emphasis on touching/contact. Which interpretation fits better with the botanical context?"

Bad: "Is qotchedy correct?"

Accept Feedback Gracefully

Good: "Thank you for the correction. I'll update my analysis to prefer 'herba' over 'planta' in this context. This affects 3 other proposals which I'll revise."

Bad: "But my analysis says..."

🚀 Ready to Begin?

Read this guide thoroughly
Review WORKFLOW_INSTRUCTIONS.md for step-by-step process
Study VOCABULARY_EXTENSION_GUIDE.md for linguistic details
Check agent_config.yaml for your parameters
Run validation: python scripts/validation_checker.py --check-type all
Start first iteration: python scripts/iteration_orchestrator.py --validation-gates

Remember: You're not just adding words to a list. You're systematically decoding one of history's greatest mysteries. Work carefully, think deeply, and validate thoroughly.

Good luck, researcher! 🔬📚

FilesExpand file tree

AI_RESEARCH_GUIDE.md

Latest commit

History

AI_RESEARCH_GUIDE.md

File metadata and controls

AI Research Agent Guide

Voynich Manuscript Translation System

🎯 Core Principles

1. Systematic Approach

2. Human Collaboration

3. Evidence-Based Decisions

4. Quality Over Quantity

🛠️ Your Toolkit

Helper Scripts

1. word_frequency.py

2. morphology_analyzer.py

3. pattern_detector.py

4. compound_decomposer.py

5. batch_dictionary_updater.py

6. validation_checker.py

7. iteration_orchestrator.py

🧠 Decision-Making Framework

When to Add a Word to Dictionary

When to Request Human Validation

When to Stop an Iteration

📊 Success Metrics

Per-Iteration Goals

Overall Project Goals

⚠️ Common Mistakes to Avoid

1. Over-Eagerness

2. Ignoring Patterns

3. Neglecting Validation

4. Inconsistent Reasoning

5. Ignoring Context

6. Rushing Through Gates

7. Forgetting Visual Evidence

🎓 Example Decision Process

Scenario: Unknown word "kokaiin" appears 20 times

📚 Helpful References

Morphological Patterns

Context Clues

Quick Reference Commands

🤝 Working with Humans

Present Findings Clearly

Ask Good Questions

Accept Feedback Gracefully

🚀 Ready to Begin?