Translate Earlier Egyptian transliterations to English using state-of-the-art AI and Retrieval-Augmented Generation (RAG).
This tool translates Ancient Egyptian transliterations (like ḥtp dj njswt) into English through a multi-stage AI pipeline:
- Normalizes the Egyptian text
- Searches a database of 9,000 expert translations for similar examples
- Translates to German using a large language model, with the retrieved examples as context
- Converts the German to English
Example:
Input: ḥtp dj njswt
Output: A sacrifice given by the King.
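The normalization step can be illustrated with a minimal sketch. It assumes that normalization means stripping combining diacritics, which matches the sample (ḥtp becomes htp) but may not be everything the project's normalizer does. The function name `strip_diacritics` is hypothetical.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove combining marks, e.g. 'ḥ' (h + dot below) becomes 'h'."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only characters that are not combining marks.
    kept = [ch for ch in decomposed if not unicodedata.combining(ch)]
    return unicodedata.normalize("NFC", "".join(kept))


print(strip_diacritics("ḥtp dj njswt"))  # htp dj njswt
```

Letters without a Unicode decomposition, such as ꜣ and ꜥ, pass through this sketch unchanged, so a real normalizer would likely need an explicit mapping for them.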
Requirements:
- Python 3.8 or higher
- An Ollama API key
- 5 GB free disk space
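The Python version and disk space requirements can be checked before installing. The sketch below is not part of the project; the function name `check_requirements` is hypothetical, and it assumes 5 GB means 5 × 1024³ bytes.

```python
import shutil
import sys

MIN_PYTHON = (3, 8)
MIN_FREE_GB = 5  # assumption: interpreted as 5 * 1024**3 bytes


def check_requirements(path=".", version=None, free_bytes=None):
    """Return a list of human-readable problems; an empty list means OK."""
    if version is None:
        version = tuple(sys.version_info[:2])
    if free_bytes is None:
        free_bytes = shutil.disk_usage(path).free
    problems = []
    if version < MIN_PYTHON:
        problems.append(
            f"Python {version[0]}.{version[1]} found, 3.8 or higher required"
        )
    if free_bytes < MIN_FREE_GB * 1024**3:
        problems.append(
            f"only {free_bytes / 1024**3:.1f} GB free, {MIN_FREE_GB} GB required"
        )
    return problems


for problem in check_requirements():
    print("Problem:", problem)
```

The `version` and `free_bytes` parameters exist only so the check can be exercised with fixed values; called without arguments it inspects the running interpreter and the current directory's disk.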
This project uses Ollama Cloud for LLM-based translation. Before running the system, you must download and enable the required model.
- Step 1: Install Ollama

  Download and install Ollama from: https://ollama.com

  Verify installation:

      ollama --version

- Step 2: Pull the Required Model

  Run the following command to download the model:

      ollama pull qwen3-vl:235b-instruct-cloud

- Clone the repository:

      git clone https://github.com/yourusername/egyptian-rag-translator.git
      cd egyptian-rag-translator

- Create virtual environment:

      # Using uv (recommended - faster)
      uv init
      uv venv
      # OR using standard Python
      python -m venv .venv

- Activate environment:

  Windows:

      .venv\Scripts\activate

  Linux/Mac:

      source .venv/bin/activate

- Install dependencies:

      # Using uv (faster)
      uv pip install -r requirements.txt
      # OR using pip
      pip install -r requirements.txt

- Configure API key:

  Create a .env file in the project root:

      OLLAMA_API_KEY=your_api_key_here

- Set up the system (one command):

      python setup.py

This will automatically:
- Download the Egyptian dataset (~9,000 texts)
- Process and clean the data
- Generate AI embeddings (~30 minutes)
- Build the search database
Note: The setup script skips any download or processing step whose output files already exist, so it is safe to re-run.
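The skip-if-present behaviour can be sketched as follows. This illustrates the general pattern rather than the actual contents of setup.py; `run_step`, the file name, and the stand-in download function are all hypothetical.

```python
import tempfile
from pathlib import Path


def run_step(output: Path, build) -> bool:
    """Run build(output) only if output does not exist yet.

    Returns True when the step ran and False when it was skipped.
    """
    if output.exists():
        return False
    output.parent.mkdir(parents=True, exist_ok=True)
    build(output)
    return True


def fake_download(path: Path) -> None:
    """Stand-in for a real download; just writes a placeholder file."""
    path.write_text("placeholder", encoding="utf-8")


# Demo in a throwaway directory: the second call finds the file and skips.
with tempfile.TemporaryDirectory() as tmp:
    dataset = Path(tmp) / "data" / "dataset.txt"  # hypothetical file name
    first = run_step(dataset, fake_download)
    second = run_step(dataset, fake_download)

print(first, second)  # True False
```

Checking only for file existence is cheap, but it cannot detect a partially written file; deleting the output forces that step to run again.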
Run a translation from the command line:

    # Basic translation
    python main.py "ḥtp dj njswt"
    # Quick mode (hide processing details)
    python main.py "ḥtp dj njswt" --no-details

Example output:

    ======================================================================
    TRANSLATION COMPLETE
    ======================================================================
    Egyptian: ḥtp dj njswt
    Normalized: htp dj njswt
    German: Ein Opfer, das der König gibt.
    English: A sacrifice given by the King.
    ======================================================================
To call the pipeline from Python:

    from src.pipeline.rag_pipeline import RAGPipeline

    # Initialize the translator
    pipeline = RAGPipeline()

    # Translate
    result = pipeline.translate("ḥtp dj njswt", show_details=False)
    if result['success']:
        print(f"English: {result['english']}")
        print(f"German: {result['german']}")

For a more user-friendly experience, launch the Gradio web UI:

    python ui/app_gradio.py

Access at: http://localhost:7860
Features:
- Egyptian keyboard - Click to type special characters
- Real-time translation - Instant results
- Retrieved examples - See which similar texts were used
- Integrated setup - Run setup from the UI
- Example phrases - Try common Egyptian texts
Quick workflow:
- Open the UI in your browser
- Enter text: ḥtp dj njswt (type or use the keyboard)
- Click "Translate"
- View the German and English translations
- Expand "Retrieved Examples" to see the RAG context
See UI Guide for detailed instructions.
Our RAG system significantly outperforms direct LLM translation:
| Metric | RAG System | LLM-Only | Difference (points) | Relative Improvement |
|---|---|---|---|---|
| BLEU | 23.70% | 3.22% | +20.48% | +636% |
| ROUGE-1 | 53.93% | 22.08% | +31.85% | +144% |
| ROUGE-2 | 36.53% | 5.51% | +31.02% | +563% |
| ROUGE-L | 52.31% | 19.77% | +32.54% | +165% |
| METEOR | 39.32% | 12.83% | +26.49% | +206% |
| chrF | 45.35% | 17.34% | +28.01% | +162% |
| Exact Match | 9.89% | 0.00% | +9.89% | n/a |
| Word Overlap | 43.36% | 18.43% | +24.93% | +135% |
Tested on 91 samples from the TLA dataset
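The last two columns of the table follow directly from the two scores: the difference is a subtraction, and the improvement is that difference relative to the LLM-only score. The sketch below reproduces them; the function name `compare` is hypothetical. The relative improvement is undefined when the baseline is zero, as in the Exact Match row.

```python
def compare(rag: float, baseline: float):
    """Return (difference in points, relative improvement in % or None)."""
    diff = rag - baseline
    if baseline == 0:
        # Division by zero: no meaningful relative improvement.
        return round(diff, 2), None
    return round(diff, 2), round(diff / baseline * 100)


print(compare(23.70, 3.22))  # BLEU row: (20.48, 636)
print(compare(9.89, 0.00))   # Exact Match row: (9.89, None)
```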
- Scores 20 to 33 percentage points higher than the LLM-only baseline on every metric except Exact Match (+9.89 points)
- Contextual understanding from 9,000 reference translations
- Grammatical consistency through example matching
- Fewer hallucinations: output is grounded in real expert translations
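The retrieval that supplies this context can be sketched as a top-k nearest-neighbour search over embedding vectors. This is an illustration under assumptions: the project's actual vector store and similarity measure are not described here, cosine similarity is a common choice rather than a confirmed one, and the names `cosine` and `top_k` as well as the toy vectors are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, corpus, k):
    """Return the k (text, score) pairs most similar to query_vec."""
    scored = [(text, cosine(query_vec, vec)) for text, vec in corpus]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy 3-dimensional vectors; real embeddings would come from the embedding model.
corpus = [
    ("example A", [1.0, 0.0, 0.0]),
    ("example B", [0.9, 0.1, 0.0]),
    ("example C", [0.0, 1.0, 0.0]),
]
hits = top_k([1.0, 0.0, 0.0], corpus, k=2)
print([text for text, _ in hits])  # ['example A', 'example B']
```

In the real system, `k` would correspond to the TOP_K_RESULTS setting, and the retrieved source and translation pairs would be passed to the LLM as examples.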
Edit .env to customize:

    # Required
    OLLAMA_API_KEY=your_key
    # Optional (defaults shown)
    LLM_MODEL=qwen3-vl:235b-instruct-cloud
    EMBEDDING_MODEL=BAAI/bge-m3
    TOP_K_RESULTS=30

The system uses the Thesaurus Linguae Aegyptiae (TLA) dataset:
- 9,000+ Earlier Egyptian texts
- Old Egyptian & Early Middle Egyptian periods
- Expert-curated translations
- Linguistic annotations (lemmas, POS tags, glossing)
Source: thesaurus-linguae-aegyptiae
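If the API key is not found, make sure you created a .env file in the project root with your API key.
If the dataset download fails, check your internet connection. The dataset is ~50MB.
If embedding generation seems slow, this is normal: generating 9,000 embeddings takes ~30 minutes, and it only runs once.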
- Make sure setup.py completed successfully
- Try increasing TOP_K_RESULTS in .env (default: 30)
- Check that your Ollama API key is valid
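Some of these checks can be automated. The sketch below parses the text of a .env file and reports likely configuration problems. It is not part of the project: `parse_env` and `diagnose` are hypothetical names, the parser handles only plain KEY=VALUE lines, and it cannot tell whether a key is actually accepted by Ollama.

```python
def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def diagnose(env: dict) -> list:
    """Return a list of likely configuration problems."""
    problems = []
    key = env.get("OLLAMA_API_KEY", "")
    # The placeholder strings are the ones used in this README's examples.
    if not key or key in ("your_key", "your_api_key_here"):
        problems.append("OLLAMA_API_KEY is missing or still a placeholder")
    top_k = env.get("TOP_K_RESULTS", "30")
    if not top_k.isdigit() or int(top_k) < 1:
        problems.append("TOP_K_RESULTS must be a positive integer")
    return problems


sample = "# Required\nOLLAMA_API_KEY=your_key\nTOP_K_RESULTS=30\n"
print(diagnose(parse_env(sample)))
```

To check a real file, pass its contents instead of `sample`, for example `Path(".env").read_text(encoding="utf-8")` after importing `Path` from `pathlib`.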
- Email: yomnawaleed2023@gmail.com
- Documentation: Developer Guide
MIT License - see LICENSE file for details.
- TLA Dataset: Thesaurus Linguae Aegyptiae
- Embedding Model: BAAI/bge-m3
- Translation Model: Helsinki-NLP/opus-mt-de-en
- LLM: Ollama Cloud - Qwen 3 VL
Note: This is a research tool. For critical academic work, always verify translations with Egyptology experts.