🏛️ Egyptian RAG Translator

Translate Earlier Egyptian transliterations into English using a large language model combined with Retrieval-Augmented Generation (RAG).

📖 What is This?

This tool translates Ancient Egyptian transliterations (like ḥtp dj njswt) into English through a four-step pipeline:

1. Normalizes the Egyptian text
2. Searches a database of about 9,000 expert translations for similar examples
3. Translates to German using a large language model, with the retrieved examples as context
4. Converts the German to English

Example:

Input:  ḥtp dj njswt
Output: A sacrifice given by the King.
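
The normalization step can be illustrated with a minimal sketch. It assumes that normalizing means stripping combining diacritics, which is consistent with the sample output in this README (ḥtp becomes htp). The function name `normalize_transliteration` is hypothetical, and the project's actual normalizer may handle more cases.

```python
import unicodedata


def normalize_transliteration(text: str) -> str:
    """Strip combining diacritics (assumed behaviour, not the project's code)."""
    # NFD splits characters such as "ḥ" into a base letter plus a combining mark.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks and keep every other character unchanged.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Collapse runs of whitespace so equivalent inputs compare equal.
    return " ".join(stripped.split())


print(normalize_transliteration("ḥtp dj njswt"))
```

Signs without a Unicode decomposition (for example ꜣ and ꜥ) pass through this sketch unchanged, so a real normalizer would likely need an explicit mapping for them.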

⚡ Quick Start

Prerequisites

Python 3.8 or higher
An Ollama API key (create one in your Ollama account)
5GB free disk space

Ollama Model Setup (Required)

This project uses Ollama Cloud for LLM-based translation. Before running the system, you must download and enable the required model.

Step 1: Install Ollama

Download and install Ollama from: https://ollama.com

Verify installation:

ollama --version

Step 2: Pull the Required Model

Run the following command to download the model:

ollama pull qwen3-vl:235b-instruct-cloud

Installation

Clone the repository:

git clone https://github.com/yourusername/egyptian-rag-translator.git
cd egyptian-rag-translator

Create virtual environment:

# Using uv (recommended, faster)
uv venv

# OR using standard Python
python -m venv .venv

Activate environment:

Windows:

.venv\Scripts\activate

Linux/Mac:

source .venv/bin/activate

Install dependencies:

# Using uv (faster)
uv pip install -r requirements.txt

# OR using pip
pip install -r requirements.txt

Configure API key:

Create a .env file in the project root:

OLLAMA_API_KEY=your_api_key_here

Set up the system (one command):

python setup.py

This will automatically:

Download the Egyptian dataset (~9,000 texts)
Process and clean the data
Generate AI embeddings (~30 minutes)
Build the search database
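
The embedding and search steps above can be sketched as follows. This is a toy illustration only: it swaps the neural embedding model for character-trigram counts so that it runs with the standard library, and the names `embed`, `build_index`, and `search` are hypothetical. The corpus entries are made-up placeholders, not TLA data.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: character-trigram counts (stand-in for a neural model)."""
    padded = f" {text} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(corpus):
    """Embed every (transliteration, translation) pair once, up front."""
    return [(embed(source), source, target) for source, target in corpus]


def search(index, query, top_k=30):
    """Return the top_k pairs most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(index, key=lambda row: cosine(query_vec, row[0]), reverse=True)
    return [(source, target) for _, source, target in ranked[:top_k]]


# Placeholder entries for illustration only.
corpus = [
    ("htp dj njswt", "reference translation A"),
    ("dd mdw", "reference translation B"),
    ("htp dj", "reference translation C"),
]
index = build_index(corpus)
print(search(index, "htp dj njswt", top_k=2))
```

Embedding the corpus once and reusing the index is what makes the one-time setup cost worthwhile: each later query only needs a single embedding plus a similarity ranking.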

Note: The setup script is safe to re-run. It skips any download or processing step whose output files already exist.
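
A skip-if-present check of this kind might look like the sketch below. The helper `run_step` and the file name are hypothetical; the real setup script may track completed steps differently.

```python
import tempfile
from pathlib import Path


def run_step(output: Path, build) -> bool:
    """Run `build` only if `output` is missing; return True if work was done."""
    if output.exists():
        return False  # output already present, skip the expensive step
    output.parent.mkdir(parents=True, exist_ok=True)
    build(output)
    return True


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "data" / "embeddings.bin"  # hypothetical file name
    first = run_step(target, lambda path: path.write_bytes(b"placeholder"))
    second = run_step(target, lambda path: path.write_bytes(b"placeholder"))
    print(first, second)
```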

🚀 Usage

Command Line

# Basic translation
python main.py "ḥtp dj njswt"

# Quick mode (hide processing details)
python main.py "ḥtp dj njswt" --no-details

Example output:

======================================================================
✅ TRANSLATION COMPLETE
======================================================================
🏛️ Egyptian:  ḥtp dj njswt
🔤 Normalized: htp dj njswt
🇩🇪 German:    Ein Opfer, das der König gibt.
🇬🇧 English:   A sacrifice given by the King.
======================================================================
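
For readers curious how a flag such as `--no-details` can be wired up, here is a minimal sketch using `argparse`. It is an assumption about the shape of the command line, not a copy of main.py; the `show_details` destination mirrors the parameter name used in the Python API.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Translate an Egyptian transliteration.")
    parser.add_argument("text", help="transliteration to translate")
    # store_false: details are shown unless --no-details is passed.
    parser.add_argument(
        "--no-details",
        dest="show_details",
        action="store_false",
        help="hide intermediate processing details",
    )
    return parser


args = build_parser().parse_args(["ḥtp dj njswt", "--no-details"])
print(args.text, args.show_details)
```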

Python API

from src.pipeline.rag_pipeline import RAGPipeline

# Initialize the translator
pipeline = RAGPipeline()

# Translate
result = pipeline.translate("ḥtp dj njswt", show_details=False)

if result['success']:
    print(f"English: {result['english']}")
    print(f"German:  {result['german']}")
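
To translate several phrases in one go, a small wrapper around `translate` may be convenient. The sketch below is hypothetical: `translate_many` is not part of the project, and it assumes that a failed result either lacks the 'success' key or sets it to a falsy value. A fake pipeline stands in for `RAGPipeline` so the example runs without the model or database.

```python
def translate_many(pipeline, texts):
    """Map each input text to its English translation, or None on failure."""
    translations = {}
    for text in texts:
        result = pipeline.translate(text, show_details=False)
        translations[text] = result["english"] if result.get("success") else None
    return translations


class FakePipeline:
    """Stand-in with the same translate() shape as shown above."""

    def translate(self, text, show_details=False):
        if not text.strip():
            return {"success": False}
        return {
            "success": True,
            "english": f"<english for {text}>",
            "german": f"<german for {text}>",
        }


print(translate_many(FakePipeline(), ["ḥtp dj njswt", " "]))
```

With the real project installed, passing `RAGPipeline()` instead of `FakePipeline()` should work as long as the result dictionary has the keys shown in the example above.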

Web User Interface

For a more user-friendly experience, launch the Gradio web UI:

python ui/app_gradio.py

Access at: http://localhost:7860

Features:

🎹 Egyptian keyboard - Click to type special characters
🔄 Real-time translation - Instant results
🔍 Retrieved examples - See which similar texts were used
⚙️ Integrated setup - Run setup from the UI
📖 Example phrases - Try common Egyptian texts

Quick workflow:

1. Open the UI in your browser
2. Enter text: ḥtp dj njswt (type or use the keyboard)
3. Click "🔄 Translate"
4. View the German and English translations
5. Expand "Retrieved Examples" to see the RAG context

See UI Guide for detailed instructions.

📊 Performance

Our RAG system outperforms direct LLM translation on every metric we measured:

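| Metric | RAG System | LLM-Only | Difference (percentage points) | Relative Improvement |
|---|---|---|---|---|
| BLEU | 23.70% | 3.22% | +20.48 | +636% |
| ROUGE-1 | 53.93% | 22.08% | +31.85 | +144% |
| ROUGE-2 | 36.53% | 5.51% | +31.02 | +563% |
| ROUGE-L | 52.31% | 19.77% | +32.54 | +165% |
| METEOR | 39.32% | 12.83% | +26.49 | +206% |
| chrF | 45.35% | 17.34% | +28.01 | +162% |
| Exact Match | 9.89% | 0.00% | +9.89 | n/a (baseline is zero) |
| Word Overlap | 43.36% | 18.43% | +24.93 | +135% |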

Tested on 91 samples from the TLA dataset
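
The two comparison columns follow directly from the raw scores. As a quick worked check on three rows of the table (the helper name `compare` is just for illustration):

```python
def compare(rag: float, baseline: float):
    """Return (difference in percentage points, relative improvement in %)."""
    difference = round(rag - baseline, 2)
    # Relative improvement is undefined when the baseline score is zero.
    relative = round((rag - baseline) / baseline * 100) if baseline else None
    return difference, relative


scores = {
    "BLEU": (23.70, 3.22),
    "ROUGE-L": (52.31, 19.77),
    "Exact Match": (9.89, 0.00),
}
for metric, (rag, baseline) in scores.items():
    print(metric, compare(rag, baseline))
```

The difference is an absolute gap in percentage points, while the relative improvement divides that gap by the LLM-only score, which is why a small baseline such as BLEU 3.22% produces a very large relative figure.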

Why RAG is Better

✅ Scores about 20 to 33 percentage points higher on seven of the eight metrics (Exact Match rises by 9.89 points)
✅ Contextual understanding from about 9,000 reference translations
✅ Grammatical consistency through example matching
✅ Fewer hallucinations - output is grounded in real expert translations
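
Grounding works by placing retrieved reference translations in front of the model alongside the new input. The sketch below shows one plausible way to assemble such a prompt; the wording of the instructions and the function name `build_prompt` are assumptions, not the project's actual prompt.

```python
def build_prompt(query: str, examples, top_k: int = 30) -> str:
    """Assemble a few-shot prompt from (egyptian, german) reference pairs."""
    lines = [
        "Translate the Egyptian transliteration into German.",
        "Use these reference translations as guidance:",
        "",
    ]
    # Only the top_k most similar references are included.
    for source, target in examples[:top_k]:
        lines.append(f"Egyptian: {source}")
        lines.append(f"German: {target}")
        lines.append("")
    # The query comes last, leaving the German line for the model to complete.
    lines.append(f"Egyptian: {query}")
    lines.append("German:")
    return "\n".join(lines)


# The single reference pair below is taken from the example output in this README.
references = [("ḥtp dj njswt", "Ein Opfer, das der König gibt.")]
print(build_prompt("ḥtp dj", references, top_k=30))
```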

🔧 Configuration

Edit .env to customize:

# Required
OLLAMA_API_KEY=your_key

# Optional (defaults shown)
LLM_MODEL=qwen3-vl:235b-instruct-cloud
EMBEDDING_MODEL=BAAI/bge-m3
TOP_K_RESULTS=30
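
As a rough sketch of how these settings could be read in Python, the snippet below applies the defaults listed above and fails early when the required key is missing. The `Settings` class and `load_settings` function are hypothetical, and the sketch assumes the .env values have already been loaded into the environment (for example by a library such as python-dotenv).

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    api_key: str
    llm_model: str
    embedding_model: str
    top_k: int


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping, defaulting to the process environment."""
    env = os.environ if env is None else env
    api_key = env.get("OLLAMA_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("OLLAMA_API_KEY not found")
    return Settings(
        api_key=api_key,
        llm_model=env.get("LLM_MODEL", "qwen3-vl:235b-instruct-cloud"),
        embedding_model=env.get("EMBEDDING_MODEL", "BAAI/bge-m3"),
        top_k=int(env.get("TOP_K_RESULTS", "30")),
    )


settings = load_settings({"OLLAMA_API_KEY": "your_key", "TOP_K_RESULTS": "50"})
print(settings.llm_model, settings.top_k)
```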

📚 Dataset

Uses the Thesaurus Linguae Aegyptiae (TLA) dataset:

9,000+ Earlier Egyptian texts
Old Egyptian & Early Middle Egyptian periods
Expert-curated translations
Linguistic annotations (lemmas, POS tags, glossing)

Source: thesaurus-linguae-aegyptiae

❓ Troubleshooting

"OLLAMA_API_KEY not found"

Make sure you created a .env file with your API key.

"Dataset download failed"

Check your internet connection. The dataset is ~50MB.

"Embedding generation is slow"

This is normal - generating 9,000 embeddings takes ~30 minutes. It only runs once.

"Translation quality is poor"

Make sure setup.py completed successfully
Try increasing TOP_K_RESULTS in .env (default: 30)
Check that your Ollama API key is valid

🆘 Support

Email: yomnawaleed2023@gmail.com
Documentation: Developer Guide (DEVELOPER.md)

📄 License

MIT License - see LICENSE file for details.

🙏 Acknowledgments

TLA Dataset: Thesaurus Linguae Aegyptiae
Embedding Model: BAAI/bge-m3
Translation Model: Helsinki-NLP/opus-mt-de-en
LLM: Ollama Cloud - Qwen 3 VL

Note: This is a research tool. For critical academic work, always verify translations with Egyptology experts.
