TARAZ/
├── data/ # Datasets
│ └── annotations/
│   └── Iran_data.json # Iran cultural knowledge dataset (500 Q&A pairs)
├── evaluation/ # Evaluation system
│ ├── __init__.py # Module initialization
│ ├── evaluation_utils.py # Core evaluation utilities
│ ├── persian_evaluation.py # Persian-specific evaluation logic
│ └── evaluation_results/ # Evaluation reports
├── models/ # Model scripts and configs
│ ├── HooshvareLab-bert-fa-base-uncased/ # Persian BERT model scripts
│ ├── MaralGPT-Maral-7B-alpha-1/ # MaralGPT model scripts
│ └── ViraIntelligentDataMining-PersianLLaMA-13B/ # PersianLLaMA model scripts
├── model_cache/ # Large model files (gitignored)
│ ├── HooshvareLab-bert-fa-base-uncased/ # BERT model weights
│ ├── MaralGPT-Maral-7B-alpha-1/ # MaralGPT model weights
│ └── ViraIntelligentDataMining-PersianLLaMA-13B/ # PersianLLaMA model weights
├── results/ # Processing results
│ └── annotations/
│   ├── HooshvareLab-bert-fa-base-uncased/ # BERT embeddings output (500 files)
│   ├── MaralGPT-Maral-7B-alpha-1/ # MaralGPT text responses output (500 files)
│   └── ViraIntelligentDataMining-PersianLLaMA-13B/ # PersianLLaMA text responses output (500 files)
├── requirements.txt
├── setup_external_deps.sh # External dependencies setup
├── utils.py # General utilities
└── README.md

- Create and activate a virtual environment:
      python3 -m venv .venv
      source .venv/bin/activate
- Install dependencies:
      pip install -r requirements.txt
- Set up environment variables:
      cp .env.example .env
  Edit .env and fill in the appropriate values.
- Download models:
      # Download BERT model
      cd models/HooshvareLab-bert-fa-base-uncased
      python download_model.py
      # Download MaralGPT model
      cd ../MaralGPT-Maral-7B-alpha-1
      python download_model.py
      # Download PersianLLaMA model
      cd ../ViraIntelligentDataMining-PersianLLaMA-13B
      python download_model.py
- Run examples:
      # BERT examples
      cd models/HooshvareLab-bert-fa-base-uncased
      python example_usage.py
      # MaralGPT examples
      cd ../MaralGPT-Maral-7B-alpha-1
      python example_usage.py
      # PersianLLaMA examples
      cd ../ViraIntelligentDataMining-PersianLLaMA-13B
      python example_usage.py

Process the Iran cultural knowledge dataset (500 Q&A pairs) with different models:

Generate embeddings for both Farsi and English questions:
    cd models/HooshvareLab-bert-fa-base-uncased
    python process_iran_data.py --limit 5 # Test with 5 entries
    python process_iran_data.py # Process all 500 entries

Generate actual text responses to questions:
    cd models/MaralGPT-Maral-7B-alpha-1
    python process_iran_data.py --limit 5 # Test with 5 entries
    python process_iran_data.py # Process all 500 entries

Generate text responses using the 13B parameter PersianLLaMA model:
    cd models/ViraIntelligentDataMining-PersianLLaMA-13B
    python process_iran_data.py --limit 5 # Test with 5 entries
    python process_iran_data.py # Process all 500 entries

- BERT results: results/annotations/HooshvareLab-bert-fa-base-uncased/
  - Individual JSON files with 768-dimensional embeddings
  - Both CLS token and mean pooled embeddings
- MaralGPT results: results/annotations/MaralGPT-Maral-7B-alpha-1/
  - Individual JSON files with generated text responses (7B model)
  - Performance metrics and token counts
- PersianLLaMA results: results/annotations/ViraIntelligentDataMining-PersianLLaMA-13B/
  - Individual JSON files with generated text responses (13B model)
  - Performance metrics and token counts
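
The difference between the CLS token embedding and the mean pooled embedding can be illustrated without loading the model. The following is a minimal sketch, assuming the BERT script takes the final hidden states (one vector per token), reads the first position for the CLS embedding, and averages the non-padding positions for the mean pooled embedding. The function names and the toy 4-dimensional vectors are hypothetical; the actual process_iran_data.py may pool differently.

```python
from typing import List, Sequence

Vector = List[float]


def cls_embedding(hidden_states: Sequence[Vector]) -> Vector:
    """Return the vector at position 0, where BERT places the [CLS] token."""
    return list(hidden_states[0])


def mean_pooled_embedding(
    hidden_states: Sequence[Vector], attention_mask: Sequence[int]
) -> Vector:
    """Average the token vectors, skipping padding positions (mask == 0)."""
    kept = [vec for vec, keep in zip(hidden_states, attention_mask) if keep]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy input: 3 real tokens plus 1 padding token, 4 dimensions instead of 768.
hidden = [
    [1.0, 0.0, 2.0, 4.0],  # [CLS]
    [3.0, 2.0, 0.0, 0.0],
    [2.0, 4.0, 1.0, 2.0],
    [9.0, 9.0, 9.0, 9.0],  # padding, excluded by the mask
]
mask = [1, 1, 1, 0]

cls_vec = cls_embedding(hidden)
mean_vec = mean_pooled_embedding(hidden, mask)
```

With a real model, both vectors would have 768 entries, matching the dimensionality listed for the BERT results.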
BLend-compatible soft exact match evaluation for Persian cultural knowledge:
- Soft exact match: Lemmatization-based semantic matching
- Persian lemmatization: Hazm library for root word extraction
- English fallback: spaCy lemmatization when Persian fails
- Weighted scoring: Human annotator confidence weighting
- Binary + weighted metrics: Pass/fail and confidence-adjusted scores
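
The scoring idea in the list above can be sketched as follows. This is an illustration under stated assumptions, not the code in evaluation/: the function names are hypothetical, the lemmatizer is passed in as a plain callable (Hazm's `Lemmatizer().lemmatize` could be supplied for Persian), and the weighted score is assumed to be the vote share of the most popular human annotation that matched.

```python
from typing import Callable, Dict, List, Tuple


def lemmas(text: str, lemmatize: Callable[[str], str]) -> List[str]:
    """Lowercase, split on whitespace, and lemmatize each token."""
    return [lemmatize(tok) for tok in text.lower().split()]


def contains_sequence(haystack: List[str], needle: List[str]) -> bool:
    """True if `needle` occurs as a contiguous run inside `haystack`."""
    n = len(needle)
    return n > 0 and any(
        haystack[i:i + n] == needle for i in range(len(haystack) - n + 1)
    )


def soft_exact_match(
    response: str,
    annotations: Dict[str, int],
    lemmatize: Callable[[str], str] = lambda tok: tok,
) -> Tuple[int, float]:
    """Return (binary, weighted) scores for one model response.

    `annotations` maps each human answer to the number of annotators who
    gave it (an assumed stand-in for annotator confidence).
    """
    resp = lemmas(response, lemmatize)
    matched_votes = [
        votes
        for answer, votes in annotations.items()
        if contains_sequence(resp, lemmas(answer, lemmatize))
    ]
    total = sum(annotations.values())
    binary = 1 if matched_votes else 0
    weighted = max(matched_votes) / total if matched_votes and total else 0.0
    return binary, weighted


def toy_lemmatize(token: str) -> str:
    """Toy English lemmatizer for the demo: drop punctuation and a final 's'."""
    token = token.strip(".,!?")
    return token[:-1] if token.endswith("s") and len(token) > 3 else token


hit = soft_exact_match(
    "They eat kebabs at weddings.", {"kebab": 3, "rice": 1}, toy_lemmatize
)
miss = soft_exact_match("They drink tea.", {"kebab": 3, "rice": 1}, toy_lemmatize)
```

Here `hit` matches because "kebabs" and "kebab" reduce to the same lemma, which is the case a strict string comparison would miss.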
Run the evaluation:
    cd models/MaralGPT-Maral-7B-alpha-1
    python analyze_annotation_matches.py

- Location: evaluation/evaluation_results/
- Format: JSON with detailed per-entry results
- Metrics: Binary accuracy, weighted accuracy, match statistics
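
To show how the summary metrics relate to the per-entry results, here is a minimal sketch of the aggregation. The field names (`binary`, `weighted`, `summary`, `entries`) and the sample values are hypothetical; the actual report layout written to evaluation/evaluation_results/ may differ.

```python
import json
from typing import Dict, List


def summarize(entries: List[Dict]) -> Dict:
    """Aggregate per-entry scores into binary and weighted accuracy."""
    n = len(entries)
    matched = sum(e["binary"] for e in entries)
    return {
        "total_entries": n,
        "matched_entries": matched,
        "binary_accuracy": matched / n if n else 0.0,
        "weighted_accuracy": sum(e["weighted"] for e in entries) / n if n else 0.0,
    }


# Made-up per-entry scores, for illustration only.
entries = [
    {"id": "q1", "binary": 1, "weighted": 0.75},
    {"id": "q2", "binary": 0, "weighted": 0.0},
    {"id": "q3", "binary": 1, "weighted": 0.25},
    {"id": "q4", "binary": 1, "weighted": 1.0},
]

report = {"summary": summarize(entries), "entries": entries}
# ensure_ascii=False keeps Persian text readable in the serialized report.
report_json = json.dumps(report, ensure_ascii=False, indent=2)
```

Binary accuracy counts an entry as correct if anything matched, while weighted accuracy averages the confidence-adjusted scores, so the second is never higher than the first when each weighted score is at most 1.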
- Model weights are stored in model_cache/ and excluded from version control
- Scripts and configurations are kept in models/ for version tracking
- Results are saved in structured JSON format with comprehensive metadata
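
The three per-model directories share one naming pattern, which can be sketched as a small path helper. This assumes the directory names are derived from a Hugging Face style model ID by replacing "/" with "-"; the helper names and the `ROOT` constant are hypothetical and not part of utils.py.

```python
from pathlib import Path
from typing import Dict

ROOT = Path("TARAZ")  # assumed repository root


def dir_name(model_id: str) -> str:
    """Turn an ID such as 'org/name' into the directory name 'org-name'."""
    return model_id.replace("/", "-")


def model_paths(model_id: str, root: Path = ROOT) -> Dict[str, Path]:
    """Return the scripts, weights, and results directories for one model."""
    name = dir_name(model_id)
    return {
        "scripts": root / "models" / name,        # tracked in version control
        "weights": root / "model_cache" / name,   # gitignored
        "results": root / "results" / "annotations" / name,
    }


paths = model_paths("HooshvareLab/bert-fa-base-uncased")
```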