CELESTA is a hybrid Entity Disambiguation (ED) framework designed for low-resource languages. In a case study on Indonesian, CELESTA performs parallel mention expansion using both multilingual and monolingual Large Language Models (LLMs). It then applies a similarity-based selection mechanism to choose the expansion that is most semantically aligned with the original context. Finally, the selected expansion is linked to a knowledge base entity using an off-the-shelf ED model, without requiring any fine-tuning.
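The end-to-end flow can be summarized with the minimal sketch below. It is illustrative only: the function names, the cosine-similarity scoring, and the 0.80 fallback threshold are assumptions for readability, not the exact interfaces of the modules in `src/`.

```python
from typing import Callable, Sequence
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def celesta(
    mention: str,
    sentence: str,
    expanders: Sequence[Callable[[str, str], str]],  # one per LLM (multilingual, monolingual)
    embed: Callable[[str], np.ndarray],              # sentence encoder used for selection
    disambiguate: Callable[[str], str],              # off-the-shelf ED backend (ReFinED or mGENRE)
    threshold: float = 0.80,
) -> str:
    # 1. Parallel mention expansion with the multilingual and monolingual LLMs.
    candidates = [expand(mention, sentence) for expand in expanders]

    # 2. Similarity-based selection: keep the expansion most semantically
    #    aligned with the original context; below the threshold, fall back
    #    to the original mention.
    context_vec = embed(sentence)
    best_score, best_expansion = max(
        (cosine(embed(c), context_vec), c) for c in candidates
    )
    chosen = best_expansion if best_score >= threshold else mention

    # 3. Link the sentence, with the chosen expansion substituted in, using
    #    the ED backend (no fine-tuning involved).
    return disambiguate(sentence.replace(mention, chosen))
```

The repository is organized as follows: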
├── datasets/                               # Input datasets (IndGEL, IndQEL, IndEL-WIKI)
├── ReFinED_format_datasets/                # Input datasets formatted for ReFinED
├── images/
│   └── celesta_architecture.jpg            # Architecture visualization
│
├── src/                                    # Source code for CELESTA modules
│   ├── mention_expansion                   # Scripts for mention expansion
│   ├── mention_expansion_selection         # Scripts for mention expansion selection
│   ├── mention_expansion_implementation    # Apply mention expansions to sentences
│   └── entity_disambiguation/              # Scripts for the disambiguation process
│
├── mention_expansion_results/              # Mention expansion outputs from individual LLMs
│   └── IndGEL/                             # Results for the IndGEL dataset
│       └── few-shot/                       # Few-shot prompt results
│           ├── mention_expansion_IndGEL_Llama-3.tsv        # Example: raw expansion results from Llama-3
│           ├── mention_expansion_IndGEL_Llama-3_final.tsv  # Example: finalized expansion results from Llama-3
│           └── mention_expansion_allLLMs_IndGEL.tsv        # Example: combined mention expansion results from all LLMs
├── with_mention_expansion/                 # Test set sentences with mention expansions (3 datasets)
├── similarity_based_expansion_selection/   # Selected mention expansions using similarity measurement
│   └── IndGEL/                             # Results for the IndGEL dataset
│       └── few-shot/                       # Few-shot prompt results
│           ├── selected_expansion_with_scores_Llama-3_Komodo_few-shot_IndGEL.tsv
│           │       # Example: similarity-based selection results from Llama-3 and Komodo mention expansions
│           └── selected_expansion_Llama-3_Komodo_few-shot_IndGEL.tsv
│                   # Example: final version of the similarity-based selection results from Llama-3 and Komodo mention expansions
├── requirements.txt                        # Python dependencies for CELESTA
├── refined/                                # Subdirectory for ReFinED setup
│   └── requirements.txt                    # Python dependencies for ReFinED
├── README.md                               # Project overview
└── LICENSE                                 # License file
- Clone the repository
git clone https://github.com/dice-group/CELESTA.git
cd CELESTA
- Create the environment
conda create -n celesta python=3.10
conda activate celesta
pip install -r requirements.txt
- Install CELESTA-mGENRE
# Change to the entity_disambiguation directory
cd src/entity_disambiguation
# Run the script that installs CELESTA-mGENRE
bash INSTALL-CELESTA-mGENRE.sh
CELESTA is evaluated on three Indonesian Entity Disambiguation (ED) datasets: IndGEL, IndQEL, and IndEL-WIKI.
- IndGEL (general domain) and IndQEL (specific domain) are from the IndEL dataset.
- IndEL-WIKI is a new dataset we created to provide additional evaluation data for CELESTA.
Dataset Property | IndGEL | IndQEL | IndEL-WIKI |
---|---|---|---|
Sentences | 2,114 | 2,621 | 24,678 |
Total entities | 4,765 | 2,453 | 24,678 |
Unique entities | 55 | 16 | 24,678 |
Entities / sentence | 2.4 | 1.6 | 1.0 |
Train set sentences | 1,674 | 2,076 | 17,172 |
Validation set sentences | 230 | 284 | 4,958 |
Test set sentences | 230 | 284 | 4,958 |
CELESTA pairs two LLMs in its hybrid setup: a multilingual model (LLaMA-3 or Mistral) and a monolingual Indonesian model (Komodo or Merak). Run the pipeline as follows:
- Run mention expansion
# Change directory to the src folder
cd src
# Run the mention expansion script
mention_expansion.py [-h] [--model_name MODEL_NAME] [--prompt_type PROMPT_TYPE] [--dataset DATASET] [--split SPLIT] [--llm_name LLM_NAME] [--input_dir INPUT_DIR]
[--output_dir OUTPUT_DIR] [--batch_size BATCH_SIZE] [--save_every SAVE_EVERY] [--save_interval SAVE_INTERVAL]
Example: python mention_expansion.py --model_name meta-llama/Meta-Llama-3-70B-Instruct --prompt_type few-shot --dataset IndGEL --llm_name llama-3
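As a rough illustration of what one expansion call does, the sketch below uses a Hugging Face text-generation pipeline. The prompt wording, example sentence, and decoding settings are assumptions; the actual prompt template lives in `src/mention_expansion` and may differ.

```python
# Hedged sketch of a single mention-expansion call (illustrative only).
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3-70B-Instruct",  # matches --model_name above
    device_map="auto",
)

sentence = "Mahasiswa UI mengikuti seminar di Depok."  # "UI students attended a seminar in Depok."
mention = "UI"
prompt = (
    "Expand the ambiguous mention into its full, unambiguous name, "
    "using the sentence as context.\n"
    f"Sentence: {sentence}\n"
    f"Mention: {mention}\n"
    "Expansion:"
)

output = generator(prompt, max_new_tokens=20, do_sample=False)
print(output[0]["generated_text"])  # expected to end with something like "Universitas Indonesia"
```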
- Combine all LLM results into a single file.
# Change directory to the mention_expansion_results/{dataset}/{prompt_type} folder
cd ../mention_expansion_results/{dataset}/{prompt_type}
# Store the combined files in this folder
# Example: mention_expansion_allLLMs_IndGEL.tsv
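One possible way to produce the combined file with pandas is sketched below; the second input filename and the join keys are assumptions based on the naming pattern above, so adjust them to the actual column layout of your result files.

```python
# Sketch: merge per-LLM "final" expansion TSVs into one combined file.
import pandas as pd

llama = pd.read_csv("mention_expansion_IndGEL_Llama-3_final.tsv", sep="\t")
komodo = pd.read_csv("mention_expansion_IndGEL_Komodo_final.tsv", sep="\t")  # assumed name

combined = llama.merge(
    komodo,
    on=["sent_id", "mention", "sentence"],   # assumed shared key columns
    suffixes=("_Llama-3", "_Komodo"),
)
combined.to_csv("mention_expansion_allLLMs_IndGEL.tsv", sep="\t", index=False)
```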
- Run mention expansion selection
# Change directory to the src folder
cd src
# Run the mention expansion selection script
mention_expansion_selection.py [-h] [--input_dir INPUT_DIR] [--output_dir OUTPUT_DIR]
[--dataset DATASET] [--prompt_type PROMPT_TYPE]
[--threshold THRESHOLD]
Example: python mention_expansion_selection.py --input_dir ../mention_expansion_results/ --output_dir ../similarity_based_expansion_selection/ --dataset IndGEL --prompt_type few-shot --threshold 0.80
Example results: selected_expansion_with_scores_Llama-3_Komodo_few-shot_IndGEL
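Conceptually, the selection step scores each candidate expansion against the original sentence and keeps the best-scoring one, falling back to the original mention when nothing clears the threshold. A minimal sketch, assuming a sentence-transformers encoder rather than whatever model `mention_expansion_selection.py` actually loads:

```python
# Sketch of similarity-based expansion selection; the encoder choice and the
# fallback-to-original-mention behavior are assumptions for illustration.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

def select_expansion(mention, sentence, expansions, threshold=0.80):
    """Return (best_expansion, score), or (mention, None) if no candidate
    reaches the similarity threshold."""
    context_emb = encoder.encode(sentence, convert_to_tensor=True)
    candidate_embs = encoder.encode(expansions, convert_to_tensor=True)
    scores = util.cos_sim(context_emb, candidate_embs)[0]
    best = int(scores.argmax())
    if float(scores[best]) >= threshold:
        return expansions[best], float(scores[best])
    return mention, None

print(select_expansion("UI", "Mahasiswa UI mengikuti seminar di Depok.",
                       ["Universitas Indonesia", "User Interface"]))
```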
- Prepare selected mention expansion results for disambiguation process
# Keep necessary columns (sent_id, mention, sentence, best_expansion) from the results and remove the remaining ones
# Change column header best_expansion to mention expansion
Example results: selected_expansion_Llama-3_Komodo_few-shot_IndGEL
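A pandas sketch of this preparation step is shown below; the underscore spelling `mention_expansion` for the new header is an assumption, so match whatever spelling the downstream scripts expect.

```python
# Sketch: keep the required columns and rename best_expansion.
import pandas as pd

df = pd.read_csv(
    "selected_expansion_with_scores_Llama-3_Komodo_few-shot_IndGEL.tsv", sep="\t"
)
df = df[["sent_id", "mention", "sentence", "best_expansion"]].rename(
    columns={"best_expansion": "mention_expansion"}  # assumed target header
)
df.to_csv("selected_expansion_Llama-3_Komodo_few-shot_IndGEL.tsv",
          sep="\t", index=False)
```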
- Run mention expansion implementation
# Change directory to the src folder
cd src
# Run the mention expansion implementation script
mention_expansion_implementation.py [-h] [--prompt_type PROMPT_TYPE] [--dataset DATASET] [--llm1 LLM1_NAME] [--llm2 LLM2_NAME] [--expansion_base EXPANSION_BASE]
[--original_json_base ORIGINAL_JSON_BASE] [--output_base OUTPUT_BASE]
Example: python mention_expansion_implementation.py --prompt_type few-shot --dataset IndGEL --llm1 Llama-3 --llm2 Komodo
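In essence, this step substitutes each selected expansion back into its sentence before disambiguation. A simplified sketch follows; the real script also rewrites the original JSON files (`--original_json_base`) and preserves the dataset structure, which is omitted here, and the column name and output filename are assumptions.

```python
# Sketch: apply selected expansions to their sentences (simplified).
import pandas as pd

selected = pd.read_csv(
    "selected_expansion_Llama-3_Komodo_few-shot_IndGEL.tsv", sep="\t"
)

# Replace the first occurrence of the mention with its selected expansion.
selected["expanded_sentence"] = selected.apply(
    lambda row: row["sentence"].replace(row["mention"], row["mention_expansion"], 1),
    axis=1,
)
selected.to_csv("IndGEL_few-shot_with_mention_expansion.tsv", sep="\t", index=False)
```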
- Using ReFinED
# Create and activate a conda environment, e.g., "refined"
conda create -n refined python=3.10 -y
conda activate refined
# Install dependencies for ReFinED
pip install -r ../refined/requirements.txt
# Change directory to the CELESTA folder
cd ..
# Clone the repository
git clone https://github.com/amazon-science/ReFinED
# Change the directory to ReFinED/src/
cd ReFinED/src/
# Copy refined_zero_shot_evaluation.py into the current directory. The file is located in the CELESTA/src/entity_disambiguation/CELESTA-ReFinED folder.
# Run the script
python refined_zero_shot_evaluation.py [-h] [--input_dir INPUT_DIR] [--dataset DATASET]
[--prompt_type PROMPT_TYPE] [--llm1 LLM1_NAME] [--llm2 LLM2_NAME]
[--ed_threshold ED_THRESHOLD]
Example: python refined_zero_shot_evaluation.py --input_dir ../../CELESTA/with_mention_expansion --dataset IndGEL --prompt_type few-shot --llm1 Llama-3 --llm2 Komodo --ed_threshold 0.15
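For orientation, a single ReFinED call on one expanded sentence looks roughly like the snippet below, following the usage shown in the ReFinED README; `refined_zero_shot_evaluation.py` additionally iterates over the dataset, applies `--ed_threshold`, and computes Precision/Recall/F1.

```python
# Sketch of one zero-shot ReFinED call on an expanded sentence; the model and
# entity-set names are the ReFinED defaults and may differ from what the
# evaluation script configures.
from refined.inference.processor import Refined

refined = Refined.from_pretrained(
    model_name="wikipedia_model_with_numbers",
    entity_set="wikipedia",
)

spans = refined.process_text(
    "Mahasiswa Universitas Indonesia mengikuti seminar di Depok."
)
for span in spans:
    # Each span holds the detected mention and its predicted entity (with a Wikidata ID).
    print(span.text, span.predicted_entity)
```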
- Using mGENRE
# Run the CELESTA-mGENRE script
bash run-CELESTA-mGENRE.sh
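As an illustration of what mGENRE does with a marked mention, here is a sketch using the Hugging Face port (`facebook/mgenre-wiki`); the bundled CELESTA-mGENRE scripts handle installation, dataset iteration, and scoring, and may rely on a different mGENRE setup.

```python
# Sketch: mGENRE generates entity names for a mention wrapped in [START]/[END].
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/mgenre-wiki")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/mgenre-wiki").eval()

sentence = "Mahasiswa [START] Universitas Indonesia [END] mengikuti seminar di Depok."
outputs = model.generate(
    **tokenizer([sentence], return_tensors="pt"),
    num_beams=5,
    num_return_sequences=5,
)
# Candidates look like "Universitas Indonesia >> id" (page title >> language code).
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```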
- General Performance
The table below compares CELESTA with two baseline ED models (ReFinED and mGENRE) across the three evaluation datasets. Bold values indicate the highest score for each metric within a dataset.
| Dataset | Model | Precision | Recall | F1 |
|---|---|---|---|---|
| IndGEL | ReFinED | **0.749** | 0.547 | 0.633 |
| | mGENRE | 0.742 | 0.718 | 0.730 |
| | CELESTA (ours) | 0.748 | **0.722** | **0.735** |
| IndQEL | ReFinED | 0.208 | 0.160 | 0.181 |
| | mGENRE | **0.298** | **0.298** | **0.298** |
| | CELESTA (ours) | **0.298** | **0.298** | **0.298** |
| IndEL-WIKI | ReFinED | **0.627** | 0.327 | 0.430 |
| | mGENRE | 0.601 | 0.489 | 0.539 |
| | CELESTA (ours) | 0.595 | **0.495** | **0.540** |
The table below reports Precision (P), Recall (R), and F1 for CELESTA and individual LLM configurations across the three datasets, under both zero-shot and few-shot prompting. Bold values mark the highest F1 score within each dataset and prompting setting. Results are shown for CELESTA using ReFinED to generate candidate entities and retrieve the corresponding Wikidata URIs.
| Dataset | Model | Zero-shot P | Zero-shot R | Zero-shot F1 | Few-shot P | Few-shot R | Few-shot F1 |
|---|---|---|---|---|---|---|---|
| IndGEL | LLaMA-3 | 0.727 | 0.499 | **0.592** | 0.777 | 0.531 | 0.631 |
| | Mistral | 0.699 | 0.411 | 0.517 | 0.806 | 0.310 | 0.448 |
| | Komodo | 0.709 | 0.447 | 0.548 | 0.704 | 0.527 | 0.603 |
| | Merak | 0.654 | 0.441 | 0.526 | 0.749 | 0.547 | 0.633 |
| | CELESTA with ReFinED | | | | | | |
| | LLaMA-3 & Komodo | 0.731 | 0.437 | 0.547 | 0.757 | 0.513 | 0.612 |
| | LLaMA-3 & Merak | 0.688 | 0.431 | 0.530 | 0.802 | 0.586 | **0.677** |
| | Mistral & Komodo | 0.719 | 0.390 | 0.506 | 0.781 | 0.344 | 0.478 |
| | Mistral & Merak | 0.678 | 0.402 | 0.505 | 0.779 | 0.503 | 0.611 |
| IndQEL | LLaMA-3 | 0.154 | 0.051 | 0.077 | 0.327 | 0.058 | 0.099 |
| | Mistral | 0.179 | 0.131 | 0.151 | 0.072 | 0.029 | 0.042 |
| | Komodo | 0.158 | 0.116 | 0.134 | 0.208 | 0.160 | **0.181** |
| | Merak | 0.203 | 0.149 | **0.172** | 0.142 | 0.106 | 0.121 |
| | CELESTA with ReFinED | | | | | | |
| | LLaMA-3 & Komodo | 0.138 | 0.047 | 0.071 | 0.282 | 0.073 | 0.116 |
| | LLaMA-3 & Merak | 0.160 | 0.113 | 0.132 | 0.130 | 0.098 | 0.112 |
| | Mistral & Komodo | 0.138 | 0.095 | 0.112 | 0.107 | 0.047 | 0.066 |
| | Mistral & Merak | 0.196 | 0.146 | 0.167 | 0.128 | 0.095 | 0.109 |
| IndEL-WIKI | LLaMA-3 | 0.581 | 0.234 | 0.332 | 0.639 | 0.322 | 0.428 |
| | Mistral | 0.565 | 0.232 | 0.329 | 0.552 | 0.201 | 0.294 |
| | Komodo | 0.592 | 0.256 | 0.357 | 0.591 | 0.270 | 0.370 |
| | Merak | 0.591 | 0.285 | **0.385** | 0.548 | 0.293 | 0.382 |
| | CELESTA with ReFinED | | | | | | |
| | LLaMA-3 & Komodo | 0.577 | 0.234 | 0.332 | 0.639 | 0.322 | 0.428 |
| | LLaMA-3 & Merak | 0.596 | 0.273 | 0.374 | 0.641 | 0.355 | **0.457** |
| | Mistral & Komodo | 0.576 | 0.231 | 0.330 | 0.575 | 0.219 | 0.317 |
| | Mistral & Merak | 0.564 | 0.248 | 0.345 | 0.581 | 0.270 | 0.369 |
These results show CELESTAβs performance when using mGENRE for candidate generation and Wikidata URI retrieval.
| Dataset | Model | Zero-shot P | Zero-shot R | Zero-shot F1 | Few-shot P | Few-shot R | Few-shot F1 |
|---|---|---|---|---|---|---|---|
| IndGEL | LLaMA-3 | 0.720 | 0.694 | **0.707** | 0.742 | 0.718 | 0.730 |
| | Mistral | 0.667 | 0.640 | 0.653 | 0.607 | 0.584 | 0.595 |
| | Komodo | 0.702 | 0.668 | 0.685 | 0.740 | 0.698 | 0.718 |
| | Merak | 0.611 | 0.576 | 0.594 | 0.696 | 0.672 | 0.684 |
| | CELESTA with mGENRE | | | | | | |
| | LLaMA-3 & Komodo | 0.695 | 0.660 | 0.677 | 0.741 | 0.708 | 0.724 |
| | LLaMA-3 & Merak | 0.631 | 0.596 | 0.613 | 0.748 | 0.722 | **0.735** |
| | Mistral & Komodo | 0.657 | 0.632 | 0.644 | 0.623 | 0.602 | 0.612 |
| | Mistral & Merak | 0.620 | 0.588 | 0.603 | 0.702 | 0.676 | 0.686 |
| IndQEL | LLaMA-3 | 0.298 | 0.298 | **0.298** | 0.274 | 0.273 | **0.273** |
| | Mistral | 0.258 | 0.258 | 0.258 | 0.185 | 0.182 | 0.183 |
| | Komodo | 0.252 | 0.251 | 0.251 | 0.269 | 0.269 | 0.269 |
| | Merak | 0.233 | 0.233 | 0.233 | 0.255 | 0.255 | 0.255 |
| | CELESTA with mGENRE | | | | | | |
| | LLaMA-3 & Komodo | 0.298 | 0.298 | **0.298** | 0.266 | 0.266 | 0.266 |
| | LLaMA-3 & Merak | 0.276 | 0.276 | 0.276 | 0.256 | 0.255 | 0.255 |
| | Mistral & Komodo | 0.262 | 0.262 | 0.262 | 0.185 | 0.182 | 0.183 |
| | Mistral & Merak | 0.236 | 0.236 | 0.236 | 0.202 | 0.200 | 0.201 |
| IndEL-WIKI | LLaMA-3 | 0.516 | 0.415 | 0.460 | 0.601 | 0.489 | 0.539 |
| | Mistral | 0.457 | 0.360 | 0.403 | 0.447 | 0.363 | 0.401 |
| | Komodo | 0.542 | 0.401 | 0.461 | 0.547 | 0.422 | 0.476 |
| | Merak | 0.474 | 0.371 | 0.417 | 0.428 | 0.353 | 0.387 |
| | CELESTA with mGENRE | | | | | | |
| | LLaMA-3 & Komodo | 0.548 | 0.411 | **0.470** | 0.618 | 0.481 | 0.537 |
| | LLaMA-3 & Merak | 0.521 | 0.412 | 0.460 | 0.595 | 0.495 | **0.540** |
| | Mistral & Komodo | 0.500 | 0.368 | 0.424 | 0.484 | 0.382 | 0.427 |
| | Mistral & Merak | 0.447 | 0.349 | 0.392 | 0.507 | 0.413 | 0.455 |
- Contribution of LLMs to CELESTA's Correct Predictions
In addition to overall performance, we measure the contribution of each multilingual and monolingual LLM, as well as the original mention, to CELESTA's correct predictions in the dual multilingual-monolingual mention expansion setup, using IndGEL with few-shot prompting. A contribution is counted when a mention expansion (or the original mention) is selected by CELESTA through its similarity-based selection mechanism and leads to a correct entity prediction. The table below reports contributions when CELESTA uses either ReFinED or mGENRE for candidate generation and Wikidata URI retrieval. Values indicate the percentage of correct predictions attributed to LLM1 (multilingual), LLM2 (monolingual), or the original mention for each LLM pair. These results highlight the complementary strengths of multilingual and monolingual LLMs and the benefit of pairing them with high-recall ED backends. A minimal counting sketch for this measurement follows the table.
LLM Pair | LLM1 (%) | LLM2 (%) | Original (%) |
---|---|---|---|
CELESTA with ReFinED | |||
LLaMA-3 & Komodo | 64.49 | 12.32 | 23.19 |
LLaMA-3 & Merak | 41.46 | 56.71 | 1.83 |
Mistral & Komodo | 79.28 | 16.22 | 4.5 |
Mistral & Merak | 43.24 | 56.76 | 0.0 |
CELESTA with mGENRE | |||
LLaMA-3 & Komodo | 59.83 | 14.53 | 25.64 |
LLaMA-3 & Merak | 41.62 | 57.54 | 1.12 |
Mistral & Komodo | 78.86 | 14.43 | 6.38 |
Mistral & Merak | 46.87 | 53.43 | 0.0 |
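As referenced above, here is a minimal counting sketch for this analysis, assuming a per-mention record of which source ("llm1", "llm2", or "original") CELESTA selected and whether the resulting prediction was correct; the actual bookkeeping is done inside the evaluation scripts.

```python
# Sketch: contribution of each expansion source to correct predictions.
from collections import Counter

def contribution_percentages(records):
    """records: iterable of (source, is_correct) pairs,
    where source is 'llm1', 'llm2', or 'original'."""
    correct = Counter(source for source, is_correct in records if is_correct)
    total = sum(correct.values())
    return {source: 100.0 * count / total for source, count in correct.items()}

example = [("llm1", True), ("llm2", True), ("original", False), ("llm1", True)]
print(contribution_percentages(example))  # approximately {'llm1': 66.7, 'llm2': 33.3}
```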
If you have any questions or feedback, feel free to contact us at [email protected] or [email protected]