TARAZ

Project Structure

TARAZ/
├── data/                                    # Datasets
│   └── annotations/
│       └── Iran_data.json                   # Iran cultural knowledge dataset (500 Q&A pairs)
├── evaluation/                              # Evaluation system
│   ├── __init__.py                          # Module initialization
│   ├── evaluation_utils.py                 # Core evaluation utilities
│   ├── persian_evaluation.py               # Persian-specific evaluation logic
│   └── evaluation_results/                 # Evaluation reports
├── models/                                  # Model scripts and configs
│   ├── HooshvareLab-bert-fa-base-uncased/   # Persian BERT model scripts
│   ├── MaralGPT-Maral-7B-alpha-1/          # MaralGPT model scripts
│   └── ViraIntelligentDataMining-PersianLLaMA-13B/  # PersianLLaMA model scripts
├── model_cache/                             # Large model files (gitignored)
│   ├── HooshvareLab-bert-fa-base-uncased/   # BERT model weights
│   ├── MaralGPT-Maral-7B-alpha-1/          # MaralGPT model weights
│   └── ViraIntelligentDataMining-PersianLLaMA-13B/  # PersianLLaMA model weights
├── results/                                 # Processing results
│   └── annotations/
│       ├── HooshvareLab-bert-fa-base-uncased/  # BERT embeddings output (500 files)
│       ├── MaralGPT-Maral-7B-alpha-1/         # MaralGPT text responses output (500 files)
│       └── ViraIntelligentDataMining-PersianLLaMA-13B/  # PersianLLaMA text responses output (500 files)
├── requirements.txt
├── setup_external_deps.sh                  # External dependencies setup
├── utils.py                                # General utilities
└── README.md

Setup

  1. Create and activate virtual environment:

    python3 -m venv .venv
    source .venv/bin/activate
  2. Install dependencies:

    pip install -r requirements.txt
  3. Set up environment variables:

    cp .env.example .env

    Edit .env and fill in the appropriate values.

  4. Download models:

    # Download BERT model
    cd models/HooshvareLab-bert-fa-base-uncased
    python download_model.py
    
    # Download MaralGPT model
    cd ../MaralGPT-Maral-7B-alpha-1
    python download_model.py
    
    # Download PersianLLaMA model
    cd ../ViraIntelligentDataMining-PersianLLaMA-13B
    python download_model.py
  5. Run examples:

    # BERT examples
    cd models/HooshvareLab-bert-fa-base-uncased
    python example_usage.py
    
    # MaralGPT examples
    cd ../MaralGPT-Maral-7B-alpha-1
    python example_usage.py
    
    # PersianLLaMA examples
    cd ../ViraIntelligentDataMining-PersianLLaMA-13B
    python example_usage.py

Iran Dataset Processing

Process the Iran cultural knowledge dataset (500 Q&A pairs) with different models:

BERT Embeddings

Generate embeddings for both Farsi and English questions:

cd models/HooshvareLab-bert-fa-base-uncased
python process_iran_data.py --limit 5    # Test with 5 entries
python process_iran_data.py              # Process all 500 entries
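As the Output Structure section notes, each entry yields both a CLS-token and a mean-pooled embedding. The two pooling strategies can be sketched in NumPy; the random array below stands in for real BERT hidden states, and the attention-mask handling is an assumption about how the script skips padding:

```python
# Sketch of the two pooling strategies over BERT hidden states.
# Shapes are real (768-dim base BERT); values here are dummies.
import numpy as np

def cls_embedding(hidden_states: np.ndarray) -> np.ndarray:
    """First token ([CLS]) of the last layer: (seq_len, 768) -> (768,)."""
    return hidden_states[0]

def mean_pooled_embedding(hidden_states: np.ndarray,
                          attention_mask: np.ndarray) -> np.ndarray:
    """Average the token vectors, ignoring padding positions."""
    mask = attention_mask[:, None].astype(hidden_states.dtype)  # (seq_len, 1)
    return (hidden_states * mask).sum(axis=0) / mask.sum()

# Dummy stand-in for model output: 6 tokens, last 2 are padding.
hidden = np.random.rand(6, 768)
mask = np.array([1, 1, 1, 1, 0, 0])
print(cls_embedding(hidden).shape)                 # (768,)
print(mean_pooled_embedding(hidden, mask).shape)   # (768,)
```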

MaralGPT Text Responses

Generate actual text responses to questions:

cd models/MaralGPT-Maral-7B-alpha-1
python process_iran_data.py --limit 5    # Test with 5 entries
python process_iran_data.py              # Process all 500 entries

PersianLLaMA Text Responses

Generate text responses using the 13B parameter PersianLLaMA model:

cd models/ViraIntelligentDataMining-PersianLLaMA-13B
python process_iran_data.py --limit 5    # Test with 5 entries
python process_iran_data.py              # Process all 500 entries
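All three `process_iran_data.py` scripts share the `--limit` flag shown above. A minimal argparse sketch of that pattern (the flag name matches the commands above; everything else, including the default-to-all behavior, is an assumption):

```python
# Sketch of the shared --limit flag; argument handling only,
# model loading and inference omitted.
import argparse

def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Process the Iran Q&A dataset")
    parser.add_argument("--limit", type=int, default=None,
                        help="process only the first N entries (default: all 500)")
    return parser.parse_args(argv)

entries = list(range(500))      # stand-in for the 500 Q&A pairs
args = parse_args(["--limit", "5"])
subset = entries[:args.limit] if args.limit else entries
print(len(subset))  # 5
```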

Output Structure

  • BERT results: results/annotations/HooshvareLab-bert-fa-base-uncased/
    • Individual JSON files with 768-dimensional embeddings
    • Both CLS token and mean pooled embeddings
  • MaralGPT results: results/annotations/MaralGPT-Maral-7B-alpha-1/
    • Individual JSON files with generated text responses (7B model)
    • Performance metrics and token counts
  • PersianLLaMA results: results/annotations/ViraIntelligentDataMining-PersianLLaMA-13B/
    • Individual JSON files with generated text responses (13B model)
    • Performance metrics and token counts

Evaluation System

BLEnD-compatible soft exact match evaluation for Persian cultural knowledge:

Method

  • Soft exact match: Lemmatization-based semantic matching
  • Persian lemmatization: Hazm library for root word extraction
  • English fallback: spaCy lemmatization when Persian fails
  • Weighted scoring: Human annotator confidence weighting
  • Binary + weighted metrics: Pass/fail and confidence-adjusted scores
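The soft-exact-match idea can be sketched as follows. The `lemmatize` stub stands in for Hazm's Persian lemmatizer (with the spaCy English fallback in the real pipeline), and treating a match as "all gold-answer lemmas appear among the response lemmas" is an assumption about the matching rule:

```python
# Minimal sketch of lemmatization-based soft exact match.
def lemmatize(word: str) -> str:
    # Stub: the real code would call hazm.Lemmatizer().lemmatize(word),
    # falling back to spaCy for English when Persian lemmatization fails.
    return word.lower().rstrip("s")

def lemma_set(text: str) -> set:
    return {lemmatize(w) for w in text.split()}

def soft_exact_match(response: str, gold_answer: str) -> bool:
    # Assumed rule: every lemma of the gold answer occurs in the response.
    return lemma_set(gold_answer) <= lemma_set(response)

print(soft_exact_match("They serve kebabs with rice", "kebab"))  # True
print(soft_exact_match("rice only", "kebab"))                    # False
```

Comparing lemma sets rather than raw strings is what makes the match "soft": surface variations (plurals, inflections) still count as the same answer.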

Usage Example

cd models/MaralGPT-Maral-7B-alpha-1
python analyze_annotation_matches.py

Output

  • Location: evaluation/evaluation_results/
  • Format: JSON with detailed per-entry results
  • Metrics: Binary accuracy, weighted accuracy, match statistics
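The two headline metrics could be computed from the per-entry results roughly as below; the `matched` and `confidence` field names are assumptions for illustration:

```python
# Sketch of binary vs. confidence-weighted accuracy over per-entry
# results; the "matched"/"confidence" field names are assumptions.
def binary_accuracy(results: list) -> float:
    """Plain pass/fail rate."""
    return sum(r["matched"] for r in results) / len(results)

def weighted_accuracy(results: list) -> float:
    """Each pass/fail weighted by human annotator confidence."""
    total = sum(r["confidence"] for r in results)
    return sum(r["confidence"] for r in results if r["matched"]) / total

results = [
    {"matched": True,  "confidence": 1.0},
    {"matched": False, "confidence": 0.5},
    {"matched": True,  "confidence": 0.5},
]
print(round(binary_accuracy(results), 3))    # 0.667
print(weighted_accuracy(results))            # 0.75
```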

Notes

  • Model weights are stored in model_cache/ and excluded from version control
  • Scripts and configurations are kept in models/ for version tracking
  • Results are saved in structured JSON format with comprehensive metadata

About

BLEnD SemEval 7 for Persian Language
