
CELESTA


CELESTA is a hybrid Entity Disambiguation (ED) framework designed for low-resource languages. In a case study on Indonesian, CELESTA performs parallel mention expansion using both multilingual and monolingual Large Language Models (LLMs). It then applies a similarity-based selection mechanism to choose the expansion that is most semantically aligned with the original context. Finally, the selected expansion is linked to a knowledge base entity using an off-the-shelf ED model, without requiring any fine-tuning. The architecture of CELESTA is illustrated in images/celesta_architecture.jpg.
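
For orientation, the following is a minimal sketch of that pipeline in Python-style pseudocode. The helper names (expand_with, similarity, ed_model.link) are illustrative assumptions and do not correspond to functions in src/; the fallback to the original mention mirrors the threshold-based selection described in the Usage section.

# Minimal, illustrative sketch of the CELESTA pipeline (not the actual src/ API).
# expand_with(), similarity(), and ed_model.link() are hypothetical helpers.
def celesta(mention, sentence, multilingual_llm, monolingual_llm, ed_model, threshold=0.80):
    # 1) Parallel mention expansion with a multilingual and a monolingual LLM
    candidates = [
        expand_with(multilingual_llm, mention, sentence),
        expand_with(monolingual_llm, mention, sentence),
    ]

    # 2) Similarity-based selection: keep the candidate most aligned with the original context,
    #    falling back to the original mention if no candidate reaches the threshold
    scored = [(candidate, similarity(sentence, candidate)) for candidate in candidates]
    best, score = max(scored, key=lambda pair: pair[1])
    expansion = best if score >= threshold else mention

    # 3) Off-the-shelf entity disambiguation (ReFinED or mGENRE), no fine-tuning required
    return ed_model.link(sentence, mention, expansion)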

📂 Repository Structure

├── datasets/                             # Input datasets (IndGEL, IndQEL, IndEL-WIKI)
├── ReFinED_format_datasets/              # Input datasets formatted for ReFinED
├── images/
│   └── celesta_architecture.jpg          # Architecture visualization
│
├── src/                                  # Source code for CELESTA modules
│   ├── mention_expansion/                # Scripts for mention expansion
│   ├── mention_expansion_selection/      # Scripts for mention expansion selection
│   ├── mention_expansion_implementation/ # Scripts that apply mention expansions to sentences
│   └── entity_disambiguation/            # Scripts for the disambiguation process
│
├── mention_expansion_results/            # Mention expansion outputs from individual LLMs
│   └── IndGEL/                           # Results for the IndGEL dataset
│       └── few-shot/                     # Few-shot prompt results
│           ├── mention_expansion_IndGEL_Llama-3.tsv
│           │                             # Example: raw expansion results from Llama-3
│           ├── mention_expansion_IndGEL_Llama-3_final.tsv
│           │                             # Example: finalized expansion results from Llama-3
│           └── mention_expansion_allLLMs_IndGEL.tsv
│                                         # Example: combined mention expansion results from all LLMs
├── with_mention_expansion/               # Test set sentences with mention expansions (3 datasets)
├── similarity_based_expansion_selection/ # Selected mention expansions based on similarity measurement
│   └── IndGEL/                           # Results for the IndGEL dataset
│       └── few-shot/                     # Few-shot prompt results
│           ├── selected_expansion_with_scores_Llama-3_Komodo_few-shot_IndGEL.tsv
│           │                             # Example: similarity-based selection results (with scores) from
│           │                             # Llama-3 and Komodo mention expansions
│           └── selected_expansion_Llama-3_Komodo_few-shot_IndGEL.tsv
│                                         # Example: final similarity-based selection results from
│                                         # Llama-3 and Komodo mention expansions
├── requirements.txt                      # Python dependencies for CELESTA
├── refined/                              # Subdirectory for ReFinED setup
│   └── requirements.txt                  # Python dependencies for ReFinED
├── README.md                             # Project overview
└── LICENSE                               # License file

βš™οΈ Installation

  1. Clone the repository
   
   git clone https://github.com/dice-group/CELESTA.git
   cd CELESTA 
  2. Create the environment

conda create -n celesta python=3.10
conda activate celesta
pip install -r requirements.txt

  3. Install CELESTA-mGENRE

# Change to the entity_disambiguation directory
cd src/entity_disambiguation

# Run the script to install CELESTA-mGENRE
bash INSTALL-CELESTA-mGENRE.sh

Evaluation

📊 Datasets

CELESTA is evaluated on three Indonesian Entity Disambiguation (ED) datasets: IndGEL, IndQEL, and IndEL-WIKI.

  • IndGEL (general domain) and IndQEL (specific domain) are from the IndEL dataset.
  • IndEL-WIKI is a new dataset we created to provide additional evaluation data for CELESTA.
| Dataset property | IndGEL | IndQEL | IndEL-WIKI |
|---|---|---|---|
| Sentences | 2,114 | 2,621 | 24,678 |
| Total entities | 4,765 | 2,453 | 24,678 |
| Unique entities | 55 | 16 | 24,678 |
| Entities / sentence | 2.4 | 1.6 | 1.0 |
| Train set sentences | 1,674 | 2,076 | 17,172 |
| Validation set sentences | 230 | 284 | 4,958 |
| Test set sentences | 230 | 284 | 4,958 |

🤖 Large Language Models (LLMs)

CELESTA pairs two types of LLMs: a multilingual LLM (LLaMA-3 or Mistral) and an Indonesian monolingual LLM (Komodo or Merak).

🚀 Usage

Mention Expansion

  1. Run mention expansion
# Change directory to the src folder
cd src

# Run the mention expansion script
mention_expansion.py [-h] [--model_name MODEL_NAME] [--prompt_type PROMPT_TYPE] [--dataset DATASET] [--split SPLIT] [--llm_name LLM_NAME] [--input_dir INPUT_DIR]
                            [--output_dir OUTPUT_DIR] [--batch_size BATCH_SIZE] [--save_every SAVE_EVERY] [--save_interval SAVE_INTERVAL]

Example: python mention_expansion.py --model_name meta-llama/Meta-Llama-3-70B-Instruct --prompt_type few-shot --dataset IndGEL --llm_name llama-3

  2. Combine all LLM results into a single file.
# Change directory to the mention_expansion_results/{dataset}/{prompt_type} folder
cd ../mention_expansion_results/{dataset}/{prompt_type}

# Store the combined files in this folder
# Example: mention_expansion_allLLMs_IndGEL.tsv
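
The repository does not ship a dedicated script for this step, so the snippet below is only a hedged sketch using pandas. It assumes each *_final.tsv file shares sent_id and mention columns and has one expansion column; check the real headers (e.g., in mention_expansion_IndGEL_Llama-3_final.tsv) and adjust before use.

import glob

import pandas as pd

# Hypothetical sketch: merge the finalized expansion files of all LLMs into one table.
# Column names (sent_id, mention, expansion) are assumptions; verify them against the real files.
combined = None
for path in sorted(glob.glob("mention_expansion_IndGEL_*_final.tsv")):
    llm_name = path.split("IndGEL_")[1].replace("_final.tsv", "")
    df = pd.read_csv(path, sep="\t")
    df = df[["sent_id", "mention", "expansion"]].rename(columns={"expansion": f"expansion_{llm_name}"})
    combined = df if combined is None else combined.merge(df, on=["sent_id", "mention"], how="outer")

combined.to_csv("mention_expansion_allLLMs_IndGEL.tsv", sep="\t", index=False)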

Similarity-Based Mention Expansion Selection

  1. Run mention expansion selection
# Change directory to the src folder
cd src

# Run the mention expansion selection script
mention_expansion_selection.py [-h] [--input_dir INPUT_DIR] [--output_dir OUTPUT_DIR] 
                    [--dataset DATASET] [--prompt_type PROMPT_TYPE] 
                    [--threshold THRESHOLD]

Example: python mention_expansion_selection.py --input_dir ../mention_expansion_results/ --output_dir ../similarity_based_expansion_selection/ --dataset IndGEL --prompt_type few-shot --threshold 0.80
Example results: selected_expansion_with_scores_Llama-3_Komodo_few-shot_IndGEL.tsv (a minimal sketch of the selection step is shown at the end of this subsection)
  2. Prepare the selected mention expansion results for the disambiguation process
# Keep the necessary columns (sent_id, mention, sentence, best_expansion) from the results and remove the remaining ones
# Rename the column header best_expansion to mention expansion

Example results: selected_expansion_Llama-3_Komodo_few-shot_IndGEL.tsv
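
The actual selection logic lives in src/mention_expansion_selection; the snippet below is only a sketch of the idea using cosine similarity from sentence-transformers. The embedding model and column layout are assumptions, not necessarily what the script uses.

from sentence_transformers import SentenceTransformer, util

# Illustrative sketch of similarity-based expansion selection; the embedding model is an assumption.
model = SentenceTransformer("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")

def select_expansion(sentence, mention, candidate_expansions, threshold=0.80):
    """Return the candidate most similar to the original context, or the mention itself."""
    context_embedding = model.encode(sentence, convert_to_tensor=True)
    candidate_embeddings = model.encode(candidate_expansions, convert_to_tensor=True)
    scores = util.cos_sim(context_embedding, candidate_embeddings)[0]
    best_index = int(scores.argmax())
    best_score = float(scores[best_index])
    if best_score >= threshold:
        return candidate_expansions[best_index], best_score
    return mention, best_score

best_expansion, score = select_expansion(
    "Jokowi meresmikan jalan tol baru di Jawa Tengah.",
    "Jokowi",
    ["Joko Widodo, Presiden Republik Indonesia", "Jokowi, seorang tokoh"],
)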

Selected Mention Expansion Implementation

  1. Run mention expansion implementation
# Change directory to the src folder
cd src

# Run the mention expansion implementation script
mention_expansion_implementation.py [-h] [--prompt_type PROMPT_TYPE] [--dataset DATASET] [--llm1 LLM1_NAME] [--llm2 LLM2_NAME] [--expansion_base EXPANSION_BASE]
                            [--original_json_base ORIGINAL_JSON_BASE] [--output_base OUTPUT_BASE]

Example: python mention_expansion_implementation.py --prompt_type few-shot --dataset IndGEL --llm1 Llama-3 --llm2 Komodo
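
Conceptually, this step rewrites each test sentence so that the mention is replaced by its selected expansion before disambiguation. The sketch below illustrates that substitution on the prepared selection file; the real script additionally handles the datasets' JSON format, mention offsets, and the LLM-pair output layout, and the output file name shown here is an assumption.

import pandas as pd

# Hypothetical sketch: substitute each selected expansion into its sentence.
# Column names follow the prepared selection file (sent_id, mention, sentence, mention expansion).
selected = pd.read_csv("selected_expansion_Llama-3_Komodo_few-shot_IndGEL.tsv", sep="\t")

def apply_expansion(row):
    # Replace only the first occurrence of the mention to avoid touching unrelated matches.
    return row["sentence"].replace(row["mention"], row["mention expansion"], 1)

selected["expanded_sentence"] = selected.apply(apply_expansion, axis=1)
selected.to_csv("IndGEL_few-shot_Llama-3_Komodo_with_expansion.tsv", sep="\t", index=False)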

Entity Candidates and Final Entity Selection

  1. Using ReFinED
# Create and activate a conda environment, e.g., "refined"
conda create -n refined python=3.10 -y
conda activate refined

# Install dependencies for ReFinED
pip install -r ../refined/requirements.txt

# Change directory to the CELESTA folder
cd ..

# Clone the repository
git clone https://github.com/amazon-science/ReFinED

# Change the directory to ReFinED/src/
cd ReFinED/src/

# Copy refined_zero_shot_evaluation.py into the current directory. The file is located in the CELESTA/src/entity_disambiguation/CELESTA-ReFinED folder (a sketch of the underlying ReFinED API call follows these steps).

# Run the script
python refined_zero_shot_evaluation.py [-h] [--input_dir INPUT_DIR] [--dataset DATASET]
		     [--prompt_type PROMPT_TYPE] [--llm1 LLM1_NAME] [--llm2 LLM2_NAME]
		     [--ed_threshold ED_THRESHOLD]

Example: python refined_zero_shot_evaluation.py --input_dir ../../CELESTA/with_mention_expansion --dataset IndGEL --prompt_type few-shot --llm1 Llama-3 --llm2 Komodo --ed_threshold 0.15
  2. Using mGENRE
# Run the CELESTA-mGENRE script
bash run-CELESTA-mGENRE.sh
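
For reference, the snippet below shows how the public ReFinED API is typically invoked on an expanded sentence; it is only a sketch of the underlying call, while the actual candidate generation, Wikidata URI retrieval, and --ed_threshold filtering are handled by refined_zero_shot_evaluation.py.

from refined.inference.processor import Refined

# Load a pretrained ReFinED model; CELESTA uses it zero-shot, without fine-tuning.
refined = Refined.from_pretrained(model_name="wikipedia_model", entity_set="wikipedia")

# Disambiguate an expanded sentence; each returned span carries its linked entity prediction.
spans = refined.process_text("Joko Widodo, Presiden Republik Indonesia, meresmikan jalan tol baru di Jawa Tengah.")
for span in spans:
    print(span.text, span.predicted_entity)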

📈 Results

  1. General Performance

The table below compares CELESTA with two baseline ED models (ReFinED and mGENRE) across the three evaluation datasets. Bold values indicate the highest score for each metric within a dataset.

| Dataset | Model | Precision | Recall | F1 |
|---|---|---|---|---|
| IndGEL | ReFinED | **0.749** | 0.547 | 0.633 |
| | mGENRE | 0.742 | 0.718 | 0.730 |
| | CELESTA (ours) | 0.748 | **0.722** | **0.735** |
| IndQEL | ReFinED | 0.208 | 0.160 | 0.181 |
| | mGENRE | **0.298** | **0.298** | **0.298** |
| | CELESTA (ours) | **0.298** | **0.298** | **0.298** |
| IndEL-WIKI | ReFinED | **0.627** | 0.327 | 0.430 |
| | mGENRE | 0.601 | 0.489 | 0.539 |
| | CELESTA (ours) | 0.595 | **0.495** | **0.540** |

The table below reports Precision (P), Recall (R), and F1 for CELESTA and individual LLM configurations across the three datasets, under both zero-shot and few-shot prompting. Bold values mark the highest F1 score within each dataset–prompting combination. Results are shown for CELESTA using ReFinED to generate candidate entities and retrieve the corresponding Wikidata URIs.

| Dataset | Model | P (zero-shot) | R (zero-shot) | F1 (zero-shot) | P (few-shot) | R (few-shot) | F1 (few-shot) |
|---|---|---|---|---|---|---|---|
| IndGEL | LLaMA-3 | 0.727 | 0.499 | **0.592** | 0.777 | 0.531 | 0.631 |
| | Mistral | 0.699 | 0.411 | 0.517 | 0.806 | 0.310 | 0.448 |
| | Komodo | 0.709 | 0.447 | 0.548 | 0.704 | 0.527 | 0.603 |
| | Merak | 0.654 | 0.441 | 0.526 | 0.749 | 0.547 | 0.633 |
| | *CELESTA with ReFinED* | | | | | | |
| | LLaMA-3 & Komodo | 0.731 | 0.437 | 0.547 | 0.757 | 0.513 | 0.612 |
| | LLaMA-3 & Merak | 0.688 | 0.431 | 0.530 | 0.802 | 0.586 | **0.677** |
| | Mistral & Komodo | 0.719 | 0.390 | 0.506 | 0.781 | 0.344 | 0.478 |
| | Mistral & Merak | 0.678 | 0.402 | 0.505 | 0.779 | 0.503 | 0.611 |
| IndQEL | LLaMA-3 | 0.154 | 0.051 | 0.077 | 0.327 | 0.058 | 0.099 |
| | Mistral | 0.179 | 0.131 | 0.151 | 0.072 | 0.029 | 0.042 |
| | Komodo | 0.158 | 0.116 | 0.134 | 0.208 | 0.160 | **0.181** |
| | Merak | 0.203 | 0.149 | **0.172** | 0.142 | 0.106 | 0.121 |
| | *CELESTA with ReFinED* | | | | | | |
| | LLaMA-3 & Komodo | 0.138 | 0.047 | 0.071 | 0.282 | 0.073 | 0.116 |
| | LLaMA-3 & Merak | 0.160 | 0.113 | 0.132 | 0.130 | 0.098 | 0.112 |
| | Mistral & Komodo | 0.138 | 0.095 | 0.112 | 0.107 | 0.047 | 0.066 |
| | Mistral & Merak | 0.196 | 0.146 | 0.167 | 0.128 | 0.095 | 0.109 |
| IndEL-WIKI | LLaMA-3 | 0.581 | 0.234 | 0.332 | 0.639 | 0.322 | 0.428 |
| | Mistral | 0.565 | 0.232 | 0.329 | 0.552 | 0.201 | 0.294 |
| | Komodo | 0.592 | 0.256 | 0.357 | 0.591 | 0.270 | 0.370 |
| | Merak | 0.591 | 0.285 | **0.385** | 0.548 | 0.293 | 0.382 |
| | *CELESTA with ReFinED* | | | | | | |
| | LLaMA-3 & Komodo | 0.577 | 0.234 | 0.332 | 0.639 | 0.322 | 0.428 |
| | LLaMA-3 & Merak | 0.596 | 0.273 | 0.374 | 0.641 | 0.355 | **0.457** |
| | Mistral & Komodo | 0.576 | 0.231 | 0.330 | 0.575 | 0.219 | 0.317 |
| | Mistral & Merak | 0.564 | 0.248 | 0.345 | 0.581 | 0.270 | 0.369 |

These results show CELESTA’s performance when using mGENRE for candidate generation and Wikidata URI retrieval.

| Dataset | Model | P (zero-shot) | R (zero-shot) | F1 (zero-shot) | P (few-shot) | R (few-shot) | F1 (few-shot) |
|---|---|---|---|---|---|---|---|
| IndGEL | LLaMA-3 | 0.720 | 0.694 | **0.707** | 0.742 | 0.718 | 0.730 |
| | Mistral | 0.667 | 0.640 | 0.653 | 0.607 | 0.584 | 0.595 |
| | Komodo | 0.702 | 0.668 | 0.685 | 0.740 | 0.698 | 0.718 |
| | Merak | 0.611 | 0.576 | 0.594 | 0.696 | 0.672 | 0.684 |
| | *CELESTA with mGENRE* | | | | | | |
| | LLaMA-3 & Komodo | 0.695 | 0.660 | 0.677 | 0.741 | 0.708 | 0.724 |
| | LLaMA-3 & Merak | 0.631 | 0.596 | 0.613 | 0.748 | 0.722 | **0.735** |
| | Mistral & Komodo | 0.657 | 0.632 | 0.644 | 0.623 | 0.602 | 0.612 |
| | Mistral & Merak | 0.620 | 0.588 | 0.603 | 0.702 | 0.676 | 0.686 |
| IndQEL | LLaMA-3 | 0.298 | 0.298 | **0.298** | 0.274 | 0.273 | **0.273** |
| | Mistral | 0.258 | 0.258 | 0.258 | 0.185 | 0.182 | 0.183 |
| | Komodo | 0.252 | 0.251 | 0.251 | 0.269 | 0.269 | 0.269 |
| | Merak | 0.233 | 0.233 | 0.233 | 0.255 | 0.255 | 0.255 |
| | *CELESTA with mGENRE* | | | | | | |
| | LLaMA-3 & Komodo | 0.298 | 0.298 | **0.298** | 0.266 | 0.266 | 0.266 |
| | LLaMA-3 & Merak | 0.276 | 0.276 | 0.276 | 0.256 | 0.255 | 0.255 |
| | Mistral & Komodo | 0.262 | 0.262 | 0.262 | 0.185 | 0.182 | 0.183 |
| | Mistral & Merak | 0.236 | 0.236 | 0.236 | 0.202 | 0.200 | 0.201 |
| IndEL-WIKI | LLaMA-3 | 0.516 | 0.415 | 0.460 | 0.601 | 0.489 | 0.539 |
| | Mistral | 0.457 | 0.360 | 0.403 | 0.447 | 0.363 | 0.401 |
| | Komodo | 0.542 | 0.401 | 0.461 | 0.547 | 0.422 | 0.476 |
| | Merak | 0.474 | 0.371 | 0.417 | 0.428 | 0.353 | 0.387 |
| | *CELESTA with mGENRE* | | | | | | |
| | LLaMA-3 & Komodo | 0.548 | 0.411 | **0.470** | 0.618 | 0.481 | 0.537 |
| | LLaMA-3 & Merak | 0.521 | 0.412 | 0.460 | 0.595 | 0.495 | **0.540** |
| | Mistral & Komodo | 0.500 | 0.368 | 0.424 | 0.484 | 0.382 | 0.427 |
| | Mistral & Merak | 0.447 | 0.349 | 0.392 | 0.507 | 0.413 | 0.455 |
  2. Contribution of LLMs to CELESTA’s Correct Predictions

In addition to overall performance, we measure the contribution of each multilingual and monolingual LLM, as well as the original mention, to CELESTA’s correct predictions in the dual multilingual–monolingual mention expansion setup, using IndGEL with few-shot prompting. A contribution is counted when a mention expansion (or the original mention) is selected by CELESTA through its similarity-based selection mechanism and leads to a correct entity prediction. The table below reports contributions when CELESTA uses either ReFinED or mGENRE for candidate generation and Wikidata URI retrieval. Values indicate the percentage of correct predictions attributed to LLM1 (multilingual), LLM2 (monolingual), or the original mention for each LLM pair. These results highlight the complementary strengths of multilingual and monolingual LLMs and the benefit of pairing them with high-recall ED backends.

| LLM Pair | LLM1 (%) | LLM2 (%) | Original (%) |
|---|---|---|---|
| *CELESTA with ReFinED* | | | |
| LLaMA-3 & Komodo | 64.49 | 12.32 | 23.19 |
| LLaMA-3 & Merak | 41.46 | 56.71 | 1.83 |
| Mistral & Komodo | 79.28 | 16.22 | 4.50 |
| Mistral & Merak | 43.24 | 56.76 | 0.00 |
| *CELESTA with mGENRE* | | | |
| LLaMA-3 & Komodo | 59.83 | 14.53 | 25.64 |
| LLaMA-3 & Merak | 41.62 | 57.54 | 1.12 |
| Mistral & Komodo | 78.86 | 14.43 | 6.38 |
| Mistral & Merak | 46.87 | 53.43 | 0.00 |

📫 Contact

If you have any questions or feedback, feel free to contact us at [email protected] or [email protected]
