This guide consolidates all practical information for using, extending, and improving the Voynich translation system.
You now have a fully functional pipeline for:
- Downloading Voynich manuscript folios from voynich.nu
- Translating them deterministically (Latin + English)
- Analyzing gaps and identifying vocabulary priorities
- Extending the dictionary systematically
- Validating improvements through metrics
| Script | Purpose | Example |
|---|---|---|
| download_folios.py | Download and cache folios | python download_folios.py --section q02 --start 1 --end 20 |
| translator.py | Core translation engine | Used by other scripts (library) |
| translate_folio.py | Translate cached folios | python translate_folio.py --section q02 --folio 014v |
| analyze_gaps.py | Find unknown words | python analyze_gaps.py --min-freq 2 |
| review_and_update.py | Update dictionary | python review_and_update.py --add-examples |

| Script | Purpose | Example |
|---|---|---|
| word_frequency.py | Frequency analysis | python scripts/word_frequency.py --min-freq 5 |
| morphology_analyzer.py | Morphological decomposition | python scripts/morphology_analyzer.py --word kokaiin |
| pattern_detector.py | Pattern detection | python scripts/pattern_detector.py --pattern-type all |
| compound_decomposer.py | Compound word analysis | python scripts/compound_decomposer.py --word qotchedy |
| batch_dictionary_updater.py | Dictionary updates | python scripts/batch_dictionary_updater.py --interactive |
| validation_checker.py | Integrity checks | python scripts/validation_checker.py --check-type all |
| iteration_orchestrator.py | Full workflow automation | python scripts/iteration_orchestrator.py --validation-gates |

# 1. Download some folios
python download_folios.py --section q02 --start 14 --end 16
# 2. Translate them
python translate_folio.py --section q02 --start 14 --end 16
# 3. Analyze what's missing
python analyze_gaps.py --min-freq 2
# 4. Review suggestions and update dictionary
python scripts/batch_dictionary_updater.py --interactive
# 5. Re-translate with updated dictionary
python translate_folio.py --section q02 --folio 014v --force

# Translate a single folio
python translate_folio.py --section q02 --folio 014r
# View the results
python translate_folio.py --section q02 --show 014r

pip install httpx pyyaml

# Check system status
python scripts/validation_checker.py --check-type all
# Should output: "✅ VALIDATION PASSED"

    voynich/
    ├── Core Scripts (download, translate, analyze)
    ├── scripts/ (helper utilities)
    ├── voynich.yaml (708-word dictionary)
    ├── data/
    │   ├── folios/ (downloaded transcriptions)
    │   ├── translations/ (JSON outputs)
    │   └── dictionary_suggestions.json (gap analysis)
    └── Documentation (this file and others)

Download untested folios:
# Herbal sections (well-illustrated, easier to validate)
python download_folios.py --section q01 --start 1 --end 8
python download_folios.py --section q02 --start 14 --end 16
# Other sections (when ready)
python download_folios.py --section q04 --start 67 --end 72 # Astronomical
python download_folios.py --section q05 --start 101 --end 105 # Pharmaceutical

Translate them all:
python translate_folio.py --section q01 --start 1 --end 8
python translate_folio.py --section q02 --start 14 --end 16

Find high-frequency unknown words:
# Standard analysis (words appearing 3+ times)
python scripts/word_frequency.py --min-freq 3 --format json --output data/unknown_ranked.json
# Generate markdown report for easier reading
python scripts/word_frequency.py --min-freq 5 --format md --output data/frequency_report.md

Detect patterns:
# All pattern types
python scripts/pattern_detector.py --pattern-type all --min-occurrences 3
# Formulaic phrases only
python scripts/pattern_detector.py --pattern-type formulaic --min-occurrences 5

Comprehensive gap analysis:
python analyze_gaps.py --min-freq 3 --max-suggestions 50
# Creates: data/dictionary_suggestions.json

Analyze single word:
python scripts/morphology_analyzer.py --word kokaiin

Batch analysis:
# Create a file with high-priority words (one per line)
echo "kokaiin
schy
otchody" > priority_words.txt
python scripts/morphology_analyzer.py --batch-file priority_words.txt --output data/morphology_analysis.json

Generate word families:
# Generate systematic variations of a known root
python scripts/morphology_analyzer.py --generate-family chol

Compound decomposition:
# Try all strategies
python scripts/compound_decomposer.py --word qotchedy --strategy all
# Heuristic strategy (usually best)
python scripts/compound_decomposer.py --word qotchedy --strategy heuristic

python scripts/batch_dictionary_updater.py --interactive --backup
# Follow prompts:
# Format: word|latin|description
# Example: kokaiin|maturat|appears near fruits/seeds; kok + aiin compound

Edit voynich.yaml:

    vocab:
      - word: kokaiin
        latin: maturat
        description: "appears near fruits/seeds; kok + aiin compound"
      - word: schy
        latin: hic tangit
        description: "s + chy compound, describing touch"

Create approved_words.json:

    [
      {
        "word": "kokaiin",
        "latin": "maturat",
        "description": "appears near fruits/seeds; kok + aiin compound"
      },
      {
        "word": "schy",
        "latin": "hic tangit",
        "description": "s + chy compound, describing touch"
      }
    ]

Then import:
python scripts/batch_dictionary_updater.py --import-file approved_words.json --backup

Validate dictionary:
python scripts/validation_checker.py --check-type all
# Should output:
# ✓ YAML syntax valid
# ✓ No duplicates found
# ✓ All entries have required fields

Re-translate with updated dictionary:
# Force re-translation
python translate_folio.py --section q01 --start 1 --end 8 --force
python translate_folio.py --section q02 --start 14 --end 16 --force

Check improvements:
# Re-analyze gaps
python analyze_gaps.py --min-freq 5 --max-suggestions 20
# View specific folio
python translate_folio.py --section q02 --show 014r
# Look for coverage increase in statistics

Track metrics:
# Count vocabulary entries (the pattern tolerates any leading indentation)
grep -c "^ *- word:" voynich.yaml
# Check for duplicates
python scripts/validation_checker.py --check-type dictionary

Generate reports:
# Create iteration report
# (Manual: document what was added, coverage improvements, next priorities)

# Single section with auto-detected range
python download_folios.py --section q01
# Custom range
python download_folios.py --section q01 --start 1 --end 5
# Force re-download with cleaning
python download_folios.py --section q01 --force
# List cached folios
python download_folios.py --list

# Single folio
python translate_folio.py --section q02 --folio 014v
# Batch translate
python translate_folio.py --section q01 --start 1 --end 8
# Force re-translate
python translate_folio.py --section q01 --start 1 --end 8 --force
# Show existing translation
python translate_folio.py --section q01 --show 001r
# Custom context
python translate_folio.py --section q04 --folio 067r --context astronomical

# Word frequency analysis
python scripts/word_frequency.py --min-freq 5 --format json
python scripts/word_frequency.py --min-freq 10 --top 20 --format md
# Gap analysis
python analyze_gaps.py --min-freq 3 --max-suggestions 30
python analyze_gaps.py --min-freq 10 --max-suggestions 10 # High-priority only
# Pattern detection
python scripts/pattern_detector.py --pattern-type all --min-occurrences 3
python scripts/pattern_detector.py --pattern-type pairs --section q01
# Morphological analysis
python scripts/morphology_analyzer.py --word kokaiin
python scripts/morphology_analyzer.py --batch-file words.txt --output analysis.json
# Compound decomposition
python scripts/compound_decomposer.py --word qotchedy --strategy all
python scripts/compound_decomposer.py --batch-file unknowns.txt

# Interactive update
python scripts/batch_dictionary_updater.py --interactive --backup
# Batch import
python scripts/batch_dictionary_updater.py --import-file words.json --backup
# Add single word
python scripts/batch_dictionary_updater.py --add-word kokaiin --latin maturat --description "ripens"
# Validate only (no changes)
python scripts/batch_dictionary_updater.py --import-file words.json --validate-only

# Validate all
python scripts/validation_checker.py --check-type all
# Validate dictionary only
python scripts/validation_checker.py --check-type dictionary
# Validate translations
python scripts/validation_checker.py --check-type translations
# Generate report
python scripts/validation_checker.py --check-type all --report-file validation_report.json

# Full iteration with validation gates
python scripts/iteration_orchestrator.py --validation-gates
# Auto mode (bypass some gates)
python scripts/iteration_orchestrator.py --auto-mode
# Target specific coverage
python scripts/iteration_orchestrator.py --target-coverage 0.65 --validation-gates
# Multiple iterations
python scripts/iteration_orchestrator.py --iterations 3 --auto-mode

Why: Maximum impact per word added
How:
- Use --min-freq 10 or --min-freq 15 to focus on top priorities
- Words appearing 20+ times are critical
- Words appearing 10-19 times are high priority
Example:
python scripts/word_frequency.py --min-freq 15 --top 20

Common Patterns:
- qo- prefix: Intensifier ("very") or specific verb
- -edy suffix: Verb marker (action/movement)
- -aiin suffix: State marker (is/was)
- ch- prefix: Common in herbal terms
How to use:
- Identify patterns in unknown words
- Look for known roots within compounds
- Build word families systematically
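
As a rough illustration of building a word family, the sketch below combines a known root with a few of the affixes listed above and keeps only the forms that occur in a word list. The affix subset, the idea that affixes combine freely, and the helper names `generate_family` and `attested` are assumptions made for this example; they are not taken from scripts/morphology_analyzer.py.

```python
from itertools import product

# Affixes taken from the pattern list above; treating them as freely
# combinable is an assumption made for this sketch.
PREFIXES = ["", "qo", "ch"]
SUFFIXES = ["", "edy", "aiin"]


def generate_family(root, prefixes=PREFIXES, suffixes=SUFFIXES):
    """Return candidate forms built as prefix + root + suffix."""
    return [f"{p}{root}{s}" for p, s in product(prefixes, suffixes)]


def attested(candidates, corpus_words):
    """Keep only the candidates that actually occur in the corpus."""
    seen = set(corpus_words)
    return [w for w in candidates if w in seen]


# "kok" is the root used in the kokaiin example in this guide.
family = generate_family("kok")
print(family)
# The corpus list here is a tiny stand-in, not real transcription data.
print(attested(family, ["kokaiin", "schy", "kok"]))
```

Filtering against the corpus keeps the family limited to forms that are actually present in the downloaded transcriptions, so generated forms are never added to the dictionary on pattern alone.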
Check for polysemy:
- Same word in different sections → may have different meanings
- Example: qokedy = "grows" (herbal) / "shines" (astronomical)
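
One way to picture the polysemy case above is a glossary keyed by context. This is only a sketch: the `POLYSEMY` layout and the `gloss` helper are hypothetical and do not describe how translator.py or voynich.yaml actually store multiple senses. The two glosses are the ones given in the example.

```python
# Hypothetical context-keyed glossary; the two glosses for "qokedy" are the
# ones given in the example above, the data layout is an assumption.
POLYSEMY = {
    "qokedy": {"herbal": "grows", "astronomical": "shines"},
}


def gloss(word, context, default_context="herbal"):
    """Look up a gloss for `word`, preferring the folio's context."""
    senses = POLYSEMY.get(word)
    if senses is None:
        return None  # word has no recorded senses
    if context in senses:
        return senses[context]
    # Fall back to the default context when this one has no sense recorded.
    return senses.get(default_context)


print(gloss("qokedy", "herbal"))
print(gloss("qokedy", "astronomical"))
```

Falling back to a default context is a design choice for the sketch; returning None for unseen contexts would be the stricter alternative.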
How:
# Check where word appears
grep -r "wordname" data/folios/

Download folio images:
- Visit https://voynich.nu/q01/f001r.jpg (replace with relevant folio)
- Match unknown words with visible elements
- Near plants? → botanical term
- Near stars? → astronomical term
Example:
- If "kedy" = "facit" (makes)
- Then "qokedy" likely = "valde facit" (makes greatly)
- Add as dictionary entry or polysemy
In descriptions:

    - word: kokaiin
      latin: maturat
      description: "kok (makes) + aiin (is/was) = ripens; appears near fruits/seeds"

Why:
- Future reference
- Pattern identification
- Easier refinement

    {
      "folio_id": "014v",
      "section": "Herbal B",
      "context": "herbal",
      "voynich_text": "fachys ykal ar shy daiin...",
      "latin_text": "folium altum et hic ad...",
      "english_text": "leaf tall and here to...",
      "word_translations": [
        {
          "original": "fachys",
          "latin": "folium",
          "english": "leaf",
          "confidence": 0.9,
          "method": "dictionary",
          "notes": "near plants"
        }
      ],
      "statistics": {
        "total_words": 267,
        "known_words": 122,
        "unknown_words": 145,
        "coverage": 0.457,
        "avg_confidence": 0.82
      },
      "unknown_words": ["word1", "word2", ...]
    }

    {
      "word": "kokaiin",
      "frequency": 20,
      "priority_score": 285.0,
      "sections": ["q01", "q02"],
      "contexts": ["herbal"],
      "length": 7,
      "analysis": {
        "structure": {
          "prefixes": [],
          "suffixes": ["aiin"],
          "potential_roots": ["kok"]
        }
      },
      "suggested_latin": "maturat",
      "reasoning": "Appears near fruits/seeds; kok + aiin compound"
    }

These 10 words appear 175 times total. Adding them should boost coverage by ~3-5%.
- kokaiin (20x) → "maturat" (ripens)
  - Pattern: kok + aiin = makes + is = ripens
  - Context: Near fruits/seeds
- schy (19x) → "hic tangit" (here touches)
  - Pattern: s + chy
  - Context: Physical plant features
- ols (19x) → "aut" (or)
  - Pattern: Short function word
  - Context: Conjunction
- otchody (18x) → "variat extensum" (varies extended)
  - Pattern: ot + chod + y
  - Context: Growth patterns
- dan (17x) → "de" (from/of)
  - Pattern: Preposition
  - Context: Source/possession
- qokchor (16x) → "valde ramulus" (very branch)
  - Pattern: qo + kchor
  - Context: Branching structure
- qotchey (16x) → "valde tangit" (very touches)
  - Pattern: qo + tchey
  - Context: Physical contact
- qoy (16x) → "valde ad" (very to)
  - Pattern: qo + y
  - Context: Directional movement
- yty (16x) → "transit" (passes through)
  - Pattern: y-t-y
  - Context: Flow or passage
- charod (16x) → "plantae variatio" (plant variation)
  - Pattern: char + od
  - Context: Plant differences
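
The expected gain from this list can be checked with a short tally. The sketch below sums the per-word counts given above (they add up to 173, slightly under the quoted total of 175) and converts added instances into a percentage-point gain. The `coverage_gain` helper and the `total_tokens=4000` figure are placeholders for illustration; the real corpus size has to come from your own translation statistics.

```python
# Frequencies copied from the priority list above.
PRIORITY_WORDS = {
    "kokaiin": 20, "schy": 19, "ols": 19, "otchody": 18, "dan": 17,
    "qokchor": 16, "qotchey": 16, "qoy": 16, "yty": 16, "charod": 16,
}


def coverage_gain(added_instances, total_tokens):
    """Percentage-point coverage gain if `added_instances` tokens become known."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    return 100.0 * added_instances / total_tokens


total_instances = sum(PRIORITY_WORDS.values())
print(total_instances)

# total_tokens=4000 is a placeholder, not a measured corpus size.
print(coverage_gain(total_instances, 4000))
```

With a corpus in the range of roughly 3,500 to 5,800 tokens, this formula gives a gain inside the 3-5% band quoted above.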
When ready for a major leap in coverage (~10-15%), build a morphological parser:

    class MorphologicalAnalyzer:
        """Split unknown words into prefix + root + suffix using known roots."""

        def __init__(self, dictionary):
            # dictionary maps each known root to an entry with a 'latin' field
            self.dict = dictionary
            self.prefixes = ['qo', 'ot', 'dy', 'ch', 's', 'y']
            self.suffixes = ['ain', 'aiin', 'edy', 'ody', 'idy', 'ar', 'or', 'ol']

        def decompose(self, word):
            """Return the components of `word`, or None if no known root fits."""
            # Try: prefix + root + suffix
            for prefix in self.prefixes:
                if word.startswith(prefix):
                    remainder = word[len(prefix):]
                    for suffix in self.suffixes:
                        if remainder.endswith(suffix):
                            root = remainder[:-len(suffix)]
                            if root in self.dict:
                                return {
                                    'prefix': prefix,
                                    'root': root,
                                    'suffix': suffix,
                                    'confidence': 0.8,
                                }
            # Try: root + suffix (no prefix)
            for suffix in self.suffixes:
                if word.endswith(suffix):
                    root = word[:-len(suffix)]
                    if root in self.dict:
                        return {
                            'root': root,
                            'suffix': suffix,
                            'confidence': 0.7,
                        }
            return None

        def synthesize_meaning(self, components):
            """Build a Latin gloss from the components returned by decompose()."""
            prefix = components.get('prefix')
            root = components['root']
            suffix = components.get('suffix')
            meaning = self.dict[root]['latin']
            # Apply prefix modifier
            if prefix == 'qo':
                meaning = f"valde {meaning}"  # intensifier
            elif prefix == 'ot':
                meaning = f"extendit {meaning}"  # extends
            # Apply suffix modifier
            if suffix == 'edy':
                meaning = f"{meaning} agit"  # action verb
            elif suffix in ['ain', 'aiin']:
                meaning = f"{meaning} erat"  # past state
            return meaning

The morphology analyzer is already available as scripts/morphology_analyzer.py. Use it!
Solution:
# Download it first
python download_folios.py --section q02 --folio 014v

Solution:
# Normal for first pass! Add more vocabulary
python analyze_gaps.py --min-freq 2
python scripts/batch_dictionary_updater.py --interactive

Solution:
# Install dependencies
pip install httpx pyyaml
# On macOS, if SSL issues persist:
pip install --upgrade certifi

Solution:
# Validate dictionary
python scripts/validation_checker.py --check-type all
# Common issues:
# - Missing quotes around special characters
# - Incorrect indentation (use 2 spaces)
# - Missing hyphen before "word:"

Solution:
# Check for duplicates
python scripts/validation_checker.py --check-type dictionary
# The validator will identify duplicates

Solution:
# Check what changed
# Restore from backup if needed
cp voynich.yaml.backup-[timestamp] voynich.yaml
# Re-translate
python translate_folio.py --section q01 --start 1 --end 8 --force

Track these metrics to measure success:

| Metric | Start | Goal | Current |
|---|---|---|---|
| Vocabulary size | ~50 | 800+ | 708 ✓ |
| Average coverage | ~10% | 65%+ | 56.6% |
| Herbal B coverage | ~45% | 65%+ | 67.0% ✓ |
| Herbal A coverage | ~35% | 60%+ | 52.8% |
| Best folio | ~45% | 75%+ | 73.1% |
| Unknown words | ~90% | <35% | 44.0% |
After Adding Top 10 Words (+175 instances):
- Target: 60% average coverage
- Herbal A: ~56% (+3%)
- Herbal B: ~70% (+3%)
After Adding Top 20 Words (+290 instances):
- Target: 62-63% average coverage
- Herbal A: ~58% (+5%)
- Herbal B: ~72% (+5%)
After Building Morphological Parser:
- Target: 70-75% average coverage
- Automatically decompose compounds
- Handle prefix/suffix variations
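
To compare results against these targets, coverage can be recomputed from the `statistics` block of a translation JSON. The sketch below uses the `total_words` and `known_words` fields shown in the sample output in this guide. The `summarize` helper is hypothetical, and rounding to three decimals is an assumption chosen to match the sample's `coverage` value.

```python
def summarize(statistics):
    """Recompute coverage and the unknown share from a statistics block."""
    total = statistics["total_words"]
    known = statistics["known_words"]
    if total == 0:
        # An empty folio has no meaningful coverage; report zeros.
        return {"coverage": 0.0, "unknown_share": 0.0}
    coverage = known / total
    return {
        "coverage": round(coverage, 3),
        "unknown_share": round(1 - coverage, 3),
    }


# Values taken from the sample translation output for folio 014v.
stats = {"total_words": 267, "known_words": 122, "unknown_words": 145}
print(summarize(stats))
```

For the sample folio this reproduces the 0.457 coverage reported in the JSON, since 122 / 267 ≈ 0.457.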
You'll know you're making progress when:
- ✓ Coverage increases after adding words
- ✓ Translations read more naturally in English
- ✓ Unknown word count decreases steadily
- ✓ High-frequency unknowns appear in gap analysis
- ✓ Patterns become clearer across folios
- ✓ Confidence scores improve
- ✓ Coherency increases in translations
- AI_RESEARCH_GUIDE.md - Complete AI agent instructions
- WORKFLOW_INSTRUCTIONS.md - Detailed step-by-step workflow
- VOCABULARY_EXTENSION_GUIDE.md - Linguistic methodology
- SYSTEM_ARCHITECTURE.md - Technical architecture
- RESEARCH_RESULTS.md - Performance and analysis
- MASTER_INDEX.md - Navigation hub
- voynich.nu - EVA transcriptions and folio images
- Wikipedia - Voynich Manuscript overview
- Yale Beinecke - High-resolution manuscript scans
- Look at folio images
- Match words to visual elements
- Confirms translation makes sense
- High frequency (>15x) = core vocabulary
- Medium frequency (5-15x) = important
- Low frequency (<5x) = wait for patterns
- Many unknowns are compounds
- Break down: kokaiin = kok + aiin
- Guides meaning
- Compare Herbal A vs Herbal B
- Different meaning β polysemy
- Same meaning β main vocabulary
- Future you will thank you
- Helps identify patterns
- Easier to refine
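
The frequency tiers in the tips above (more than 15 occurrences, 5 to 15, fewer than 5) can be applied mechanically when triaging unknown words. This is a minimal sketch: the `frequency_tier` helper and its tier labels are hypothetical and are not part of word_frequency.py. The sample counts come from the priority list in this guide.

```python
def frequency_tier(count):
    """Map an occurrence count to the triage tier described above."""
    if count > 15:
        return "core"        # high frequency: core vocabulary
    if count >= 5:
        return "important"   # medium frequency: 5 to 15 occurrences
    return "wait"            # low frequency: wait for patterns


# Counts for kokaiin and dan are taken from the priority list in this guide;
# "rareword" is a made-up entry to show the lowest tier.
unknowns = {"kokaiin": 20, "dan": 17, "rareword": 3}
for word, count in unknowns.items():
    print(word, count, frequency_tier(count))
```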
# 1. Validate system
python scripts/validation_checker.py --check-type all
# 2. Download folios
python download_folios.py --section q02 --start 14 --end 16
# 3. Translate
python translate_folio.py --section q02 --start 14 --end 16
# 4. Analyze
python analyze_gaps.py --min-freq 5
# 5. Start adding words!
python scripts/batch_dictionary_updater.py --interactive --backup

# === ESSENTIAL COMMANDS ===
# Validate system
python scripts/validation_checker.py --check-type all
# Translate folio
python translate_folio.py --section q02 --folio 014r
# Analyze gaps
python analyze_gaps.py --min-freq 5
# Update dictionary
python scripts/batch_dictionary_updater.py --interactive --backup
# Re-translate after updates
python translate_folio.py --section q02 --start 14 --end 16 --force
# Check word frequency
python scripts/word_frequency.py --min-freq 10 --top 20
# Analyze morphology
python scripts/morphology_analyzer.py --word kokaiin
# Full iteration
python scripts/iteration_orchestrator.py --validation-gates

System Status: OPERATIONAL ✅
Documentation: COMPLETE ✅
Ready for: Iterative Improvement 🚀
For technical details, see SYSTEM_ARCHITECTURE.md.
For research results, see RESEARCH_RESULTS.md.
For AI agent use, see AI_RESEARCH_GUIDE.md.