This document provides a comprehensive technical overview of the Voynich manuscript translation system, including its architecture, components, and underlying methodology.
The Voynich Translation System is a deterministic, automated pipeline for translating Voynichese (the unknown script of the Voynich Manuscript) into Latin and English using a rule-based approach with polysemy support and morphological analysis.
```
voynich.nu → Download → Parse → Translate → Analyze → Update → Iterate
    ↓           ↓         ↓         ↓          ↓        ↓         ↓
  Source      Folio    Cleaned   Latin +      Gap  Dictionary   Loop
              Text      Words    English   Analysis  Updates
```
- Automated folio downloading from voynich.nu
- Deterministic translation using rule-based dictionary
- Context-aware polysemy (same word, different meanings by section)
- Morphological processing (prefixes, suffixes, compounds)
- Dual-language output (Latin + English)
- Gap analysis and vocabulary extension
- Coverage metrics and confidence scoring
```
┌─────────────────────────────────────────────────────────────┐
│                 Voynich Translation System                  │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌──────────────┐        ┌──────────────┐                   │
│  │   Download   │───────▶│    Parser    │                   │
│  │    Folios    │        │    (EVA)     │                   │
│  └──────────────┘        └──────────────┘                   │
│                                 │                           │
│                                 ▼                           │
│                          ┌──────────────┐                   │
│                          │  Translator  │◀─── voynich.yaml  │
│                          │    Engine    │                   │
│                          └──────────────┘                   │
│                                 │                           │
│                     ┌───────────┴───────────┐               │
│                     ▼                       ▼               │
│              ┌──────────────┐        ┌──────────────┐       │
│              │    Latin     │        │   English    │       │
│              │ Translation  │        │ Translation  │       │
│              └──────────────┘        └──────────────┘       │
│                     │                       │               │
│                     └───────────┬───────────┘               │
│                                 ▼                           │
│                          ┌──────────────┐                   │
│                          │     Gap      │                   │
│                          │   Analysis   │                   │
│                          └──────────────┘                   │
│                                 │                           │
│                                 ▼                           │
│                          ┌──────────────┐                   │
│                          │  Dictionary  │                   │
│                          │   Updates    │                   │
│                          └──────────────┘                   │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```
- Downloads transcriptions from voynich.nu
- Caches locally to avoid redundant downloads
- Parses EVA (European Voynich Alphabet) format
- Cleans transcription artifacts
- Tracks metadata (section, word count)
- Core deterministic translation logic
- Loads dictionary from voynich.yaml
- Processes morphological rules
- Resolves polysemy based on context
- Handles repetition patterns
- Tracks confidence scores
- Maintains unknown word lists
- Command-line interface for translations
- Single folio or batch mode
- Auto-detects context from section
- Saves results as JSON
- Displays coverage statistics
- Force re-translation option
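
A minimal sketch of how a command-line interface with these features could be wired up using `argparse`. The flag names (`--all`, `--context`, `--force`) and the `build_parser` helper are assumptions made for illustration; the actual options of `translate_folio.py` are not listed in this document.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for a hypothetical translate_folio-style CLI."""
    parser = argparse.ArgumentParser(description="Translate Voynich folios")
    # Folio ID such as q02_f014v; optional so that batch mode can omit it.
    parser.add_argument("folio", nargs="?", help="folio ID, e.g. q02_f014v")
    # Hypothetical flags mirroring the features listed above.
    parser.add_argument("--all", action="store_true",
                        help="batch mode: translate every cached folio")
    parser.add_argument("--context", default=None,
                        help="override the context auto-detected from the section")
    parser.add_argument("--force", action="store_true",
                        help="re-translate even if a saved result exists")
    return parser


# Parse an explicit argument list instead of sys.argv so the example is repeatable.
args = build_parser().parse_args(["q02_f014v", "--force"])
print(args.folio, args.force, args.context)
```

Leaving `--context` unset (`None`) is one way to signal that the context should be auto-detected from the folio's section.
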
- Analyzes unknown words across translations
- Ranks by frequency and priority
- Identifies structural patterns (prefixes, suffixes)
- Suggests dictionary entries with reasoning
- Exports suggestions as JSON
- word_frequency.py: Frequency analysis tool
- morphology_analyzer.py: Morphological decomposition
- pattern_detector.py: Pattern and formula detection
- compound_decomposer.py: Compound word analysis
- batch_dictionary_updater.py: Dictionary management
- validation_checker.py: Dictionary integrity checks
- iteration_orchestrator.py: Full workflow automation
1. User requests folio (e.g., q02_f014v)
2. download_folios.py checks cache
3. If not cached:
   - Fetch from voynich.nu
   - Parse EVA transcription
   - Clean artifacts (!, *, {}, etc.)
   - Save to data/folios/
4. Update metadata.json
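
The cache check and cleaning steps above can be sketched as follows. This is an illustration under stated assumptions, not the code of `download_folios.py`: the function names are hypothetical, the network call is abstracted behind a `fetch` callable, treating `.` as a word separator is an assumption about the transcription format, and the `metadata.json` update is omitted.

```python
import re
from pathlib import Path
from typing import Callable


def clean_eva_line(line: str) -> str:
    """Strip transcription artifacts from one line of EVA text."""
    line = re.sub(r"\{[^}]*\}", "", line)  # drop {...} annotations
    line = re.sub(r"[!*]", "", line)       # drop '!' and '*' markers
    line = line.replace(".", " ")          # assumption: '.' separates words
    return " ".join(line.split())          # normalise whitespace


def load_folio(folio_id: str, cache_dir: Path, fetch: Callable[[str], str]) -> str:
    """Return cleaned folio text, calling `fetch` only on a cache miss."""
    cached = cache_dir / f"{folio_id}.txt"
    if cached.exists():
        return cached.read_text(encoding="utf-8")
    raw = fetch(folio_id)
    cleaned = "\n".join(
        clean_eva_line(line) for line in raw.splitlines() if line.strip()
    )
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached.write_text(cleaned, encoding="utf-8")
    return cleaned
```

Passing the downloader in as a callable keeps the caching logic testable without network access.
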
1. Load folio from cache
2. Extract word sequence
3. For each word:
   a. Preprocess (remove nulls)
   b. Check dictionary
   c. Handle morphology (prefixes/suffixes)
   d. Resolve polysemy (context-based)
   e. Assign confidence score
4. Generate Latin text
5. Translate Latin → English
6. Calculate statistics
7. Save to data/translations/
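
Step 6 can be illustrated with a short sketch. The `WordResult` type and `folio_statistics` function are hypothetical names, and averaging confidence over known words only is an assumption (it is consistent with a sample folio that reports 0.457 coverage alongside 0.82 average confidence).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class WordResult:
    original: str
    latin: str
    confidence: float
    method: str  # e.g. "dictionary" or "unknown"


def folio_statistics(results: List[WordResult]) -> Dict[str, float]:
    """Compute coverage and average confidence for one translated folio."""
    total = len(results)
    known = [r for r in results if r.method != "unknown"]
    coverage = len(known) / total if total else 0.0
    # Assumption: unknown words are excluded from the confidence average.
    avg_conf = sum(r.confidence for r in known) / len(known) if known else 0.0
    return {
        "total_words": total,
        "known_words": len(known),
        "unknown_words": total - len(known),
        "coverage": round(coverage, 3),
        "avg_confidence": round(avg_conf, 2),
    }


stats = folio_statistics([
    WordResult("fachys", "folium", 0.9, "dictionary"),
    WordResult("ykal", "[unknown]", 0.0, "unknown"),
])
print(stats)
```
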
1. Load all translation files
2. Collect unknown words
3. Count frequencies
4. Analyze word structure:
   - Identify potential prefixes
   - Identify potential suffixes
   - Find embedded known roots
5. Calculate priority scores
6. Generate suggestions
7. Export to dictionary_suggestions.json
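
A minimal sketch of the gap-analysis steps above. The `analyze_gaps` name is hypothetical, and the priority formula (frequency boosted by the number of recognised affixes) is an assumption chosen only for illustration; the scoring actually used by `analyze_gaps.py` is not specified in this document.

```python
import json
from collections import Counter
from typing import Dict, Iterable, List

# Affixes taken from the prefix and suffix rules listed in this document.
KNOWN_PREFIXES = ("qo", "ot", "sh", "ch")
KNOWN_SUFFIXES = ("aiin", "edy", "ar", "ol")


def analyze_gaps(unknown_lists: Iterable[List[str]], top_n: int = 10) -> List[Dict]:
    """Rank unknown words by a hypothetical priority score."""
    counts = Counter(w for words in unknown_lists for w in words)
    suggestions = []
    for word, freq in counts.items():
        prefixes = [p for p in KNOWN_PREFIXES
                    if word.startswith(p) and len(word) > len(p)]
        suffixes = [s for s in KNOWN_SUFFIXES
                    if word.endswith(s) and len(word) > len(s)]
        suggestions.append({
            "word": word,
            "frequency": freq,
            # Hypothetical score: frequency, boosted when affixes are recognised.
            "priority_score": float(freq * (1 + len(prefixes) + len(suffixes))),
            "analysis": {"structure": {"prefixes": prefixes, "suffixes": suffixes}},
        })
    # Highest score first; ties broken alphabetically to stay deterministic.
    suggestions.sort(key=lambda s: (-s["priority_score"], s["word"]))
    return suggestions[:top_n]


suggestions = analyze_gaps([["kokaiin", "kokaiin", "xyz"], ["kokaiin", "otar"]])
print(json.dumps(suggestions, indent=2))
```
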
```
voynich/
├── Core Scripts
│   ├── download_folios.py              # Folio downloader
│   ├── translator.py                   # Translation engine
│   ├── translate_folio.py              # CLI interface
│   ├── analyze_gaps.py                 # Gap analyzer
│   └── review_and_update.py            # Dictionary updater
│
├── Helper Scripts
│   └── scripts/
│       ├── word_frequency.py           # Frequency analysis
│       ├── morphology_analyzer.py      # Morphological decomposition
│       ├── pattern_detector.py         # Pattern detection
│       ├── compound_decomposer.py      # Compound analysis
│       ├── batch_dictionary_updater.py # Dictionary management
│       ├── validation_checker.py       # Integrity checks
│       └── iteration_orchestrator.py   # Workflow automation
│
├── Configuration
│   ├── voynich.yaml                    # Master dictionary (708 words)
│   ├── agent_config.yaml               # AI agent configuration
│   ├── research_workflow.yaml          # Workflow definition
│   └── vocabulary_rules.yaml           # Morphological rules
│
├── Data
│   ├── data/
│   │   ├── folios/                     # Downloaded folios
│   │   │   ├── q01_f001r.txt
│   │   │   ├── q02_f014v.txt
│   │   │   └── metadata.json
│   │   ├── translations/               # Translation outputs
│   │   │   ├── q01_f001r_translation.json
│   │   │   └── q02_f014v_translation.json
│   │   └── dictionary_suggestions.json # Gap analysis
│   └── reports/                        # Iteration reports
│
└── Documentation
    ├── AI_RESEARCH_GUIDE.md            # AI agent instructions
    ├── WORKFLOW_INSTRUCTIONS.md        # Step-by-step workflow
    ├── VOCABULARY_EXTENSION_GUIDE.md   # Linguistic methodology
    ├── SYSTEM_ARCHITECTURE.md          # This file
    ├── RESEARCH_RESULTS.md             # Performance and results
    ├── DEVELOPMENT_GUIDE.md            # Usage guide
    ├── MASTER_INDEX.md                 # Navigation hub
    └── README.md                       # Project overview
```
```yaml
voynich_decipherment_rules:
  rules:
    # Preprocessing rules
    glyph_mapping:
      # EVA glyphs to phonemes
  vocab:
    - word: fachys
      latin: folium
      description: "leaf; appears near plant leaves"
    - word: chol
      latin: caulis
      description: "stem/stalk; botanical term"
  polysemy:
    - word: qokedy
      meanings:
        - latin: crescit
          context: "herbal section, near plants"
        - latin: lucet
          context: "astronomical section, near stars"
        - latin: fluit
          context: "biological section, near water"
      base: crescit
```

```json
{
  "folio_id": "014v",
  "section": "Herbal B",
  "context": "herbal",
  "voynich_text": "fachys ykal ar shy daiin...",
  "latin_text": "folium altum et hic ad...",
  "english_text": "leaf tall and here to...",
  "word_translations": [
    {
      "original": "fachys",
      "latin": "folium",
      "english": "leaf",
      "confidence": 0.9,
      "method": "dictionary",
      "notes": "near plants"
    }
  ],
  "statistics": {
    "total_words": 267,
    "known_words": 122,
    "unknown_words": 145,
    "coverage": 0.457,
    "avg_confidence": 0.82
  },
  "unknown_words": ["word1", "word2", ...]
}
```

```json
{
  "word": "kokaiin",
  "frequency": 20,
  "priority_score": 285.0,
  "sections": ["q01", "q02"],
  "contexts": ["herbal"],
  "length": 7,
  "analysis": {
    "structure": {
      "prefixes": [],
      "suffixes": ["aiin"],
      "potential_roots": ["kok"]
    },
    "patterns": ["contains_aiin_suffix"]
  },
  "suggested_latin": "maturat",
  "reasoning": "Appears near fruits/seeds; compound kok + aiin (makes + is/was)"
}
```

```python
# Excerpt: helpers such as preprocess_word() and has_prefix(), and the
# dictionary / polysemy_dict tables, are defined elsewhere in the engine.
def translate_word(word: str, context: str) -> TranslationResult:
    """Core translation algorithm."""
    # 1. Preprocess
    word = preprocess_word(word)  # Remove nulls, clean

    # 2. Direct dictionary lookup
    if word in dictionary:
        return dictionary[word]

    # 3. Check polysemy
    if word in polysemy_dict:
        return resolve_polysemy(word, context)

    # 4. Handle prefixes (qo-, ot-, sh-, etc.)
    if has_prefix(word):
        prefix, root = extract_prefix(word)
        if root in dictionary:
            return apply_prefix_meaning(prefix, dictionary[root])

    # 5. Handle suffixes (-aiin, -edy, -ar, etc.)
    if has_suffix(word):
        root, suffix = extract_suffix(word)
        if root in dictionary:
            return apply_suffix_meaning(dictionary[root], suffix)

    # 6. Handle repetition (valde)
    if is_repeated(word):
        return apply_intensifier(word)

    # 7. Unknown
    return TranslationResult(
        original=word,
        latin="[unknown]",
        confidence=0.0,
        method="unknown",
    )
```

```python
def resolve_polysemy(word: str, context: str) -> str:
    """Resolve word meaning based on context."""
    polysemy_entry = polysemy_dict[word]

    # Try to match context (case-insensitive on both sides)
    for meaning in polysemy_entry['meanings']:
        if context.lower() in meaning['context'].lower():
            return meaning['latin']

    # Fall back to base meaning
    return polysemy_entry['base']
```

```python
from typing import Optional


def handle_qo_prefix(word: str, context: str) -> Optional[str]:
    """Handle the qo- intensifier prefix; return None if it does not apply."""
    if word.startswith('qo'):
        root = word[2:]  # Remove 'qo'
        if root in dictionary:
            base_latin = dictionary[root]['latin']
            # Add intensifier
            return f"valde {base_latin}"
    return None
```

- Dictionary lookup: O(1) average (hash table)
- Morphological analysis: O(k) where k = number of prefix/suffix patterns
- Polysemy resolution: O(m) where m = number of meanings per word
- Full folio translation: O(n) where n = number of words
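
To make the polysemy cost concrete, here is a self-contained sketch that runs the context match against the `qokedy` entry. The in-memory `POLYSEMY` table and the `resolve` function are illustrative stand-ins, not the engine's actual data structures; the empty-context guard is an added assumption.

```python
POLYSEMY = {
    "qokedy": {
        "base": "crescit",
        "meanings": [
            {"latin": "crescit", "context": "herbal section, near plants"},
            {"latin": "lucet", "context": "astronomical section, near stars"},
            {"latin": "fluit", "context": "biological section, near water"},
        ],
    }
}


def resolve(word: str, context: str) -> str:
    """Pick the meaning whose context description mentions `context`."""
    entry = POLYSEMY[word]        # O(1) average hash lookup
    wanted = context.lower()
    for meaning in entry["meanings"]:  # O(m) scan over the word's meanings
        # Guard against an empty context, which would match every description.
        if wanted and wanted in meaning["context"].lower():
            return meaning["latin"]
    return entry["base"]          # fall back to the base meaning


print(resolve("qokedy", "astronomical"))    # matches the second meaning
print(resolve("qokedy", "pharmaceutical"))  # no match, falls back to base
```
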
- Current: 22 folios, 708-word dictionary
- Tested: Up to 6,655 words per batch
- Memory: < 50 MB for full system
- Speed: ~100-200 words/second
- Herbal B: 65.2% average coverage
- Herbal A: 52.0% average coverage
- Combined: 55.6% average coverage
- Best folio: 73.1% (q02_f014r)
The system works from the hypothesis that the Voynich Manuscript is written in an encoded form of Medieval Latin using:
- Substitution cipher: Voynich glyphs → Latin phonemes
- Null glyphs: 'o' as filler to obscure patterns
- Abbreviations: Medieval shorthand (e.g., aiin = erat)
- Morphological consistency: Systematic prefix/suffix patterns
Based on EVA (European Voynich Alphabet):
| EVA | Phoneme | Latin | Notes |
|---|---|---|---|
| f | /f/ | f | Common in herbal (folium) |
| p | /p/ | p | Plant prefix (planta) |
| ch | /k/ | c | Hard consonant (caulis) |
| o | /o/ or null | o/! | Null or exclamation |
| l | /l/ | l | Liquid consonant |
| y | /j/ | i/y | Semivowel |
| k | /k/ | c/k | Hard consonant |
| t | /t/ | t | Stop consonant |
| e | /e/ | e | Vowel |
| d | /d/ | d | Stop, verb marker |
| q | /kw/ | qu | Intensifier prefix |
| ai | /ai/ | ae/e | Diphthong |
| sh | /ʃ/ | sh | Location marker |
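
Because the table contains two-letter glyphs (`ch`, `sh`, `ai`), a transliterator has to try digraphs before single letters. The sketch below shows one way to do that with a greedy longest match. The function names are hypothetical, only the glyphs from the table are covered, the first Latin alternative is chosen where the table lists two, and null handling for `o` is left out.

```python
from typing import List

# Subset of the mapping table above; unknown glyphs are rendered as "?".
GLYPH_TO_LATIN = {
    "ch": "c", "sh": "sh", "ai": "ae",
    "f": "f", "p": "p", "o": "o", "l": "l", "y": "i",
    "k": "c", "t": "t", "e": "e", "d": "d", "q": "qu",
}


def tokenize_eva(word: str) -> List[str]:
    """Split an EVA word into glyphs, preferring two-letter glyphs."""
    glyphs: List[str] = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if len(pair) == 2 and pair in GLYPH_TO_LATIN:
            glyphs.append(pair)  # digraph such as "ch" or "sh"
            i += 2
        else:
            glyphs.append(word[i])
            i += 1
    return glyphs


def transliterate(word: str) -> str:
    """Map each glyph to its Latin letters using the table subset."""
    return "".join(GLYPH_TO_LATIN.get(g, "?") for g in tokenize_eva(word))


print(tokenize_eva("chol"), transliterate("chol"))
```
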
Prefixes:
- qo-: Intensifier (valde), confidence 0.9
- ot-: Source (ex), confidence 0.8
- sh-: Location (hic), confidence 0.8
- ch-: Botanical, confidence 0.7

Suffixes:
- -aiin: State marker (est/erat), confidence 0.9
- -edy: Action verb (movet), confidence 0.8
- -ar: Conjunction (et), confidence 0.7
- -ol: Location (locus), confidence 0.6
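
The affix rules can be applied to a known root roughly as sketched below. This is an illustration under assumptions: the `decompose` function, the way the affix gloss is combined with the root gloss, and the choice of `est` for `-aiin` are hypothetical, and `ch-` is omitted because no Latin gloss is given for it.

```python
from typing import Dict, Optional, Tuple

# (Latin gloss, confidence) per affix, taken from the lists above.
PREFIXES: Dict[str, Tuple[str, float]] = {
    "qo": ("valde", 0.9), "ot": ("ex", 0.8), "sh": ("hic", 0.8),
}
SUFFIXES: Dict[str, Tuple[str, float]] = {
    "aiin": ("est", 0.9), "edy": ("movet", 0.8),
    "ar": ("et", 0.7), "ol": ("locus", 0.6),
}
# Two roots from the sample dictionary entries.
ROOTS: Dict[str, str] = {"chol": "caulis", "fachys": "folium"}


def decompose(word: str) -> Optional[Tuple[str, float]]:
    """Return (Latin gloss, confidence) for a root plus one affix, else None."""
    for prefix, (gloss, conf) in PREFIXES.items():
        root = word[len(prefix):]
        if word.startswith(prefix) and root in ROOTS:
            return f"{gloss} {ROOTS[root]}", conf
    for suffix, (gloss, conf) in SUFFIXES.items():
        root = word[:-len(suffix)]
        if word.endswith(suffix) and root in ROOTS:
            return f"{ROOTS[root]} {gloss}", conf
    return None


print(decompose("qochol"))
print(decompose("cholaiin"))
```

The confidence returned is the affix confidence alone; a real engine might also factor in the confidence of the root entry.
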
- Dictionary Validation
  - YAML syntax checking
  - Duplicate detection
  - Required field verification
  - Format validation
- Translation Validation
  - Coverage metrics
  - Confidence scoring
  - Unknown word tracking
  - Statistical analysis
- Morphological Validation
  - Pattern consistency checks
  - Prefix/suffix validation
  - Root existence verification
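
Two of the dictionary checks (required fields and duplicate detection) can be sketched as follows. The `validate_vocab` function is hypothetical, and treating `word`, `latin`, and `description` as the required fields is an assumption based on the sample vocabulary entries; the rules enforced by `validation_checker.py` may differ.

```python
from collections import Counter
from typing import Dict, List

# Assumption: the fields seen in the sample vocab entries are all required.
REQUIRED_FIELDS = ("word", "latin", "description")


def validate_vocab(entries: List[Dict[str, str]]) -> List[str]:
    """Return a list of human-readable problems; empty means valid."""
    problems: List[str] = []
    for i, entry in enumerate(entries):
        for field in REQUIRED_FIELDS:
            if not entry.get(field):  # missing key or empty value
                problems.append(f"entry {i}: missing '{field}'")
    counts = Counter(e["word"] for e in entries if e.get("word"))
    for word, n in sorted(counts.items()):
        if n > 1:
            problems.append(f"duplicate word '{word}' ({n} entries)")
    return problems


problems = validate_vocab([
    {"word": "fachys", "latin": "folium", "description": "leaf"},
    {"word": "chol", "latin": "caulis", "description": "stem/stalk"},
    {"word": "chol", "latin": "", "description": "duplicate with empty gloss"},
])
print(problems)
```
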
```python
# Graceful degradation
if not dictionary_loaded:
    return "Error: Dictionary not loaded"
if word_not_found:
    track_unknown(word)
    return "[unknown]"
if confidence < threshold:
    flag_for_review(word)
```

`agent_config.yaml` defines AI agent behavior:
- Workflow parameters (frequency thresholds, batch sizes)
- Confidence thresholds
- Validation gates
- Tool configurations
- Context awareness rules
- Decision framework
`research_workflow.yaml` defines research phases:
- Analyze (validate, coverage, identify unknowns)
- Propose (morphology, compounds, ranking)
- Validate (consistency, visual, polysemy)
- Implement (backup, update, validate)
- Test (re-translate, calculate, quality check)
- Report (generate, metrics, next steps)
`vocabulary_rules.yaml` defines linguistic rules:
- Morphological decomposition rules
- Compound formation patterns
- Polysemy detection criteria
- Word family generation rules
- Validation rules for entries
- Same input → same output
- No randomness or ML uncertainty
- Reproducible results
- Each component has single responsibility
- Easy to test and maintain
- Can be used independently
- Dictionary can grow incrementally
- New rules can be added
- Supports future enhancements
- Configuration via YAML files
- Human-readable formats
- Easy to modify and version
- Every decision documented
- Confidence scores tracked
- Unknown words logged
V1.0 (Initial)
- Basic glyph mappings
- ~50 word dictionary
- Single context translation
V2.0 (Polysemy)
- Context-aware translation
- Polysemy system
- ~300 word dictionary
V3.0 (Morphology)
- Prefix/suffix handling
- Morphological analysis
- ~450 word dictionary
V4.0 (Systematic)
- Word family generation
- Duplicate cleanup
- English translation
- ~708 word dictionary
V5.0 (AI Agent)
- Complete helper scripts
- Workflow automation
- Configuration system
- Documentation suite
- Lines of Code: ~3,500 (Python)
- Configuration: ~2,000 (YAML)
- Documentation: ~15,000 words
- Dictionary Entries: 708 words
- Polysemy Entries: 10 words
- Supported Contexts: 5 (herbal, astronomical, biological, pharmaceutical, cosmological)
- Helper Scripts: 7 tools
- Folios Processed: 22 pages
- Total Words Analyzed: 6,655
- Unique Words Identified: 1,060
- Machine Learning Layer
  - Auto-suggest morphological decompositions
  - Predict word meanings from context
  - Pattern recognition for compounds
- Visual Integration
  - OCR for direct image processing
  - Image-to-text correlation
  - Botanical species identification
- Database Backend
  - SQL database for dictionary
  - Query optimization
  - Version control for entries
- Web Interface
  - Browser-based translation tool
  - Interactive vocabulary editor
  - Visualization dashboard
- API Layer
  - RESTful API for translations
  - Integration with other tools
  - Batch processing endpoints
python >= 3.8
httpx >= 0.24.0
pyyaml >= 6.0
pathlib (standard library)
json (standard library)
argparse (standard library)
- voynich.nu: Source of EVA transcriptions
- YAML: Configuration format
- EVA Alphabet: Standard transcription system
- EVA (European Voynich Alphabet): Standard transcription system
- YAML 1.2: Configuration file format
- JSON: Data interchange format
- UTF-8: Character encoding
- Currier's A/B Dialects: Glyph variation patterns
- Stolfi's Entropy Analysis: Information content studies
- Tiltman's Word Length: Statistical observations
- Takahashi's EVA Transcription: Complete digital transcription in EVA
System Status: OPERATIONAL
Architecture Version: 5.0
Last Updated: November 27, 2025
For usage instructions, see DEVELOPMENT_GUIDE.md.
For research results, see RESEARCH_RESULTS.md.
For AI agent instructions, see AI_RESEARCH_GUIDE.md.