CELESTA is a hybrid Entity Disambiguation (ED) framework designed for low-resource languages. In a case study on Indonesian, CELESTA performs parallel mention expansion using both multilingual and monolingual Large Language Models (LLMs). It then applies a similarity-based selection mechanism to choose the expansion that is most semantically aligned with the original context. Finally, the selected expansion is linked to a knowledge base entity using an off-the-shelf ED model, without requiring any fine-tuning.
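The end-to-end flow can be summarized with the minimal sketch below. It is illustrative only: the function names, the cosine-similarity scoring, and the 0.80 fallback threshold are assumptions for readability, not the exact interfaces of the modules in `src/`.

```python
from typing import Callable, Sequence
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def celesta(
    mention: str,
    sentence: str,
    expanders: Sequence[Callable[[str, str], str]],  # one per LLM (multilingual, monolingual)
    embed: Callable[[str], np.ndarray],              # sentence encoder used for selection
    disambiguate: Callable[[str], str],              # off-the-shelf ED backend (ReFinED or mGENRE)
    threshold: float = 0.80,
) -> str:
    # 1. Parallel mention expansion with the multilingual and monolingual LLMs.
    candidates = [expand(mention, sentence) for expand in expanders]

    # 2. Similarity-based selection: keep the expansion most semantically
    #    aligned with the original context; below the threshold, fall back
    #    to the original mention.
    context_vec = embed(sentence)
    best_score, best_expansion = max(
        (cosine(embed(c), context_vec), c) for c in candidates
    )
    chosen = best_expansion if best_score >= threshold else mention

    # 3. Link the sentence, with the chosen expansion substituted in, using
    #    the ED backend (no fine-tuning involved).
    return disambiguate(sentence.replace(mention, chosen))
```

The repository is organized as follows: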
├── datasets/                               # Input datasets (IndGEL, IndQEL, IndEL-WIKI)
├── ReFinED_format_datasets/                # Input datasets formatted for ReFinED
├── images/
│   └── celesta_architecture.jpg            # Architecture visualization
│
├── src/                                    # Source code for CELESTA modules
│   ├── mention_expansion                   # Scripts for mention expansion
│   ├── mention_expansion_selection         # Scripts for mention expansion selection
│   ├── mention_expansion_implementation    # Apply mention expansions to sentences
│   └── entity_disambiguation/              # Scripts for the disambiguation process
│
├── mention_expansion_results/              # Mention expansion outputs from individual LLMs
│   └── IndGEL/                             # Results for the IndGEL dataset
│       └── few-shot/                       # Few-shot prompt results
│           ├── mention_expansion_IndGEL_Llama-3.tsv        # Example: raw expansion results from Llama-3
│           ├── mention_expansion_IndGEL_Llama-3_final.tsv  # Example: finalized expansion results from Llama-3
│           └── mention_expansion_allLLMs_IndGEL.tsv        # Example: combined mention expansion results from all LLMs
├── with_mention_expansion/                 # Test set sentences with mention expansions (3 datasets)
├── similarity_based_expansion_selection/   # Selected mention expansions using similarity measurement
│   └── IndGEL/                             # Results for the IndGEL dataset
│       └── few-shot/                       # Few-shot prompt results
│           ├── selected_expansion_with_scores_Llama-3_Komodo_few-shot_IndGEL.tsv
│           │       # Example: similarity-based selection results from Llama-3 and Komodo mention expansions
│           └── selected_expansion_Llama-3_Komodo_few-shot_IndGEL.tsv
│                   # Example: final version of the similarity-based selection results from Llama-3 and Komodo mention expansions
├── requirements.txt                        # Python dependencies for CELESTA
├── refined/                                # Subdirectory for ReFinED setup
│   └── requirements.txt                    # Python dependencies for ReFinED
├── README.md                               # Project overview
└── LICENSE                                 # License file
- Clone the repository
git clone https://github.com/dice-group/CELESTA.git
cd CELESTA
- Create the environment
conda create -n celesta python=3.10
conda activate celesta
pip install -r requirements.txt
- Install CELESTA-mGENRE
# Change to the entity_disambiguation directory
cd src/entity_disambiguation
# Run the script that installs CELESTA-mGENRE
bash INSTALL-CELESTA-mGENRE.sh
CELESTA is evaluated on three Indonesian Entity Disambiguation (ED) datasets: IndGEL, IndQEL, and IndEL-WIKI.
- IndGEL (general domain) and IndQEL (specific domain) are from the IndEL dataset.
- IndEL-WIKI is a new dataset we created to provide additional evaluation data for CELESTA.
Dataset Property | IndGEL | IndQEL | IndEL-WIKI |
---|---|---|---|
Sentences | 2,114 | 2,621 | 24,678 |
Total entities | 4,765 | 2,453 | 24,678 |
Unique entities | 55 | 16 | 24,678 |
Entities / sentence | 2.4 | 1.6 | 1.0 |
Train set sentences | 1,674 | 2,076 | 17,172 |
Validation set sentences | 230 | 284 | 4,958 |
Test set sentences | 230 | 284 | 4,958 |
CELESTA pairs two LLMs in its hybrid setup: a multilingual model (LLaMA-3 or Mistral) and a monolingual Indonesian model (Komodo or Merak). Run the pipeline as follows:
- Run mention expansion
# Change directory to the src folder
cd src
# Run the mention expansion script
mention_expansion.py [-h] [--model_name MODEL_NAME] [--prompt_type PROMPT_TYPE] [--dataset DATASET] [--split SPLIT] [--llm_name LLM_NAME] [--input_dir INPUT_DIR]
[--output_dir OUTPUT_DIR] [--batch_size BATCH_SIZE] [--save_every SAVE_EVERY] [--save_interval SAVE_INTERVAL]
Example: python mention_expansion.py --model_name meta-llama/Meta-Llama-3-70B-Instruct --prompt_type few-shot --dataset IndGEL --llm_name llama-3
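As a rough illustration of what one expansion call does, the sketch below uses a Hugging Face text-generation pipeline. The prompt wording, example sentence, and decoding settings are assumptions; the actual prompt template lives in `src/mention_expansion` and may differ.

```python
# Hedged sketch of a single mention-expansion call (illustrative only).
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3-70B-Instruct",  # matches --model_name above
    device_map="auto",
)

sentence = "Mahasiswa UI mengikuti seminar di Depok."  # "UI students attended a seminar in Depok."
mention = "UI"
prompt = (
    "Expand the ambiguous mention into its full, unambiguous name, "
    "using the sentence as context.\n"
    f"Sentence: {sentence}\n"
    f"Mention: {mention}\n"
    "Expansion:"
)

output = generator(prompt, max_new_tokens=20, do_sample=False)
print(output[0]["generated_text"])  # expected to end with something like "Universitas Indonesia"
```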
- Combine all LLM results into a single file.
# Change directory to the mention_expansion_results/{dataset}/{prompt_type} folder
cd ../mention_expansion_results/{dataset}/{prompt_type}
# Store the combined files in this folder
# Example: mention_expansion_allLLMs_IndGEL.tsv
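One possible way to produce the combined file with pandas is sketched below; the second input filename and the join keys are assumptions based on the naming pattern above, so adjust them to the actual column layout of your result files.

```python
# Sketch: merge per-LLM "final" expansion TSVs into one combined file.
import pandas as pd

llama = pd.read_csv("mention_expansion_IndGEL_Llama-3_final.tsv", sep="\t")
komodo = pd.read_csv("mention_expansion_IndGEL_Komodo_final.tsv", sep="\t")  # assumed name

combined = llama.merge(
    komodo,
    on=["sent_id", "mention", "sentence"],   # assumed shared key columns
    suffixes=("_Llama-3", "_Komodo"),
)
combined.to_csv("mention_expansion_allLLMs_IndGEL.tsv", sep="\t", index=False)
```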
- Run mention expansion selection
# Change directory to the src folder
cd src
# Run the mention expansion selection script
mention_expansion_selection.py [-h] [--input_dir INPUT_DIR] [--output_dir OUTPUT_DIR]
[--dataset DATASET] [--prompt_type PROMPT_TYPE]
[--threshold THRESHOLD]
Example: python mention_expansion_selection.py --input_dir ../mention_expansion_results/ --output_dir ../similarity_based_expansion_selection/ --dataset IndGEL --prompt_type few-shot --threshold 0.80
Example results: selected_expansion_with_scores_Llama-3_Komodo_few-shot_IndGEL
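Conceptually, the selection step scores each candidate expansion against the original sentence and keeps the best-scoring one, falling back to the original mention when nothing clears the threshold. A minimal sketch, assuming a sentence-transformers encoder rather than whatever model `mention_expansion_selection.py` actually loads:

```python
# Sketch of similarity-based expansion selection; the encoder choice and the
# fallback-to-original-mention behavior are assumptions for illustration.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

def select_expansion(mention, sentence, expansions, threshold=0.80):
    """Return (best_expansion, score), or (mention, None) if no candidate
    reaches the similarity threshold."""
    context_emb = encoder.encode(sentence, convert_to_tensor=True)
    candidate_embs = encoder.encode(expansions, convert_to_tensor=True)
    scores = util.cos_sim(context_emb, candidate_embs)[0]
    best = int(scores.argmax())
    if float(scores[best]) >= threshold:
        return expansions[best], float(scores[best])
    return mention, None

print(select_expansion("UI", "Mahasiswa UI mengikuti seminar di Depok.",
                       ["Universitas Indonesia", "User Interface"]))
```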
- Prepare selected mention expansion results for disambiguation process
# Keep necessary columns (sent_id, mention, sentence, best_expansion) from the results and remove the remaining ones
# Change column header best_expansion to mention expansion
Example results: selected_expansion_Llama-3_Komodo_few-shot_IndGEL
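A pandas sketch of this preparation step is shown below; the underscore spelling `mention_expansion` for the new header is an assumption, so match whatever spelling the downstream scripts expect.

```python
# Sketch: keep the required columns and rename best_expansion.
import pandas as pd

df = pd.read_csv(
    "selected_expansion_with_scores_Llama-3_Komodo_few-shot_IndGEL.tsv", sep="\t"
)
df = df[["sent_id", "mention", "sentence", "best_expansion"]].rename(
    columns={"best_expansion": "mention_expansion"}  # assumed target header
)
df.to_csv("selected_expansion_Llama-3_Komodo_few-shot_IndGEL.tsv",
          sep="\t", index=False)
```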
- Run mention expansion implementation
# Change directory to the src folder
cd src
# Run the mention expansion implementation script
mention_expansion_implementation.py [-h] [--prompt_type PROMPT_TYPE] [--dataset DATASET] [--llm1 LLM1_NAME] [--llm2 LLM2_NAME] [--expansion_base EXPANSION_BASE]
[--original_json_base ORIGINAL_JSON_BASE] [--output_base OUTPUT_BASE]
Example: python mention_expansion_implementation.py --prompt_type few-shot --dataset IndGEL --llm1 Llama-3 --llm2 Komodo
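In essence, this step substitutes each selected expansion back into its sentence before disambiguation. A simplified sketch follows; the real script also rewrites the original JSON files (`--original_json_base`) and preserves the dataset structure, which is omitted here, and the column name and output filename are assumptions.

```python
# Sketch: apply selected expansions to their sentences (simplified).
import pandas as pd

selected = pd.read_csv(
    "selected_expansion_Llama-3_Komodo_few-shot_IndGEL.tsv", sep="\t"
)

# Replace the first occurrence of the mention with its selected expansion.
selected["expanded_sentence"] = selected.apply(
    lambda row: row["sentence"].replace(row["mention"], row["mention_expansion"], 1),
    axis=1,
)
selected.to_csv("IndGEL_few-shot_with_mention_expansion.tsv", sep="\t", index=False)
```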
- Using ReFinED
# Create and activate a conda environment, e.g., "refined"
conda create -n refined python=3.10 -y
conda activate refined
# Install dependencies for ReFinED
pip install -r ../refined/requirements.txt
# Change directory to the CELESTA folder
cd ..
# Clone the repository
git clone https://github.com/amazon-science/ReFinED
# Change the directory to ReFinED/src/
cd ReFinED/src/
# Copy refined_zero_shot_evaluation.py into the current directory. The file is located in the CELESTA/src/entity_disambiguation/CELESTA-ReFinED folder.
# Run the script
python refined_zero_shot_evaluation.py [-h] [--input_dir INPUT_DIR] [--dataset DATASET]
[--prompt_type PROMPT_TYPE] [--llm1 LLM1_NAME] [--llm2 LLM2_NAME]
[--ed_threshold ED_THRESHOLD]
Example: python refined_zero_shot_evaluation.py --input_dir ../../CELESTA/with_mention_expansion --dataset IndGEL --prompt_type few-shot --llm1 Llama-3 --llm2 Komodo --ed_threshold 0.15
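For orientation, a single ReFinED call on one expanded sentence looks roughly like the snippet below, following the usage shown in the ReFinED README; `refined_zero_shot_evaluation.py` additionally iterates over the dataset, applies `--ed_threshold`, and computes Precision/Recall/F1.

```python
# Sketch of one zero-shot ReFinED call on an expanded sentence; the model and
# entity-set names are the ReFinED defaults and may differ from what the
# evaluation script configures.
from refined.inference.processor import Refined

refined = Refined.from_pretrained(
    model_name="wikipedia_model_with_numbers",
    entity_set="wikipedia",
)

spans = refined.process_text(
    "Mahasiswa Universitas Indonesia mengikuti seminar di Depok."
)
for span in spans:
    # Each span holds the detected mention and its predicted entity (with a Wikidata ID).
    print(span.text, span.predicted_entity)
```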
- Using mGENRE
# Run the CELESTA-mGENRE script
bash run-CELESTA-mGENRE.sh
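As an illustration of what mGENRE does with a marked mention, here is a sketch using the Hugging Face port (`facebook/mgenre-wiki`); the bundled CELESTA-mGENRE scripts handle installation, dataset iteration, and scoring, and may rely on a different mGENRE setup.

```python
# Sketch: mGENRE generates entity names for a mention wrapped in [START]/[END].
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/mgenre-wiki")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/mgenre-wiki").eval()

sentence = "Mahasiswa [START] Universitas Indonesia [END] mengikuti seminar di Depok."
outputs = model.generate(
    **tokenizer([sentence], return_tensors="pt"),
    num_beams=5,
    num_return_sequences=5,
)
# Candidates look like "Universitas Indonesia >> id" (page title >> language code).
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```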
- General Performance
The table below compares CELESTA with two baseline ED models (ReFinED and mGENRE) across the three evaluation datasets. Bold values indicate the highest score for each metric within a dataset.
| Dataset | Model | Precision | Recall | F1 |
|---|---|---|---|---|
| IndGEL | ReFinED | **0.749** | 0.547 | 0.633 |
| | mGENRE | 0.742 | 0.718 | 0.730 |
| | CELESTA (ours) | 0.748 | **0.722** | **0.735** |
| IndQEL | ReFinED | 0.208 | 0.160 | 0.181 |
| | mGENRE | **0.298** | **0.298** | **0.298** |
| | CELESTA (ours) | **0.298** | **0.298** | **0.298** |
| IndEL-WIKI | ReFinED | **0.627** | 0.327 | 0.430 |
| | mGENRE | 0.601 | 0.489 | 0.539 |
| | CELESTA (ours) | 0.595 | **0.495** | **0.540** |
The table below reports Precision (P), Recall (R), and F1 for CELESTA and individual LLM configurations across the three datasets, under both zero-shot and few-shot prompting. Bold values mark the highest F1 score within each dataset and prompting setting. Results are shown for CELESTA using ReFinED to generate candidate entities and retrieve the corresponding Wikidata URIs.
| Dataset | Model | Zero-shot P | Zero-shot R | Zero-shot F1 | Few-shot P | Few-shot R | Few-shot F1 |
|---|---|---|---|---|---|---|---|
| IndGEL | LLaMA-3 | 0.727 | 0.499 | **0.592** | 0.777 | 0.531 | 0.631 |
| | Mistral | 0.699 | 0.411 | 0.517 | 0.806 | 0.310 | 0.448 |
| | Komodo | 0.709 | 0.447 | 0.548 | 0.704 | 0.527 | 0.603 |
| | Merak | 0.654 | 0.441 | 0.526 | 0.749 | 0.547 | 0.633 |
| | CELESTA with ReFinED | | | | | | |
| | LLaMA-3 & Komodo | 0.731 | 0.437 | 0.547 | 0.757 | 0.513 | 0.612 |
| | LLaMA-3 & Merak | 0.688 | 0.431 | 0.530 | 0.802 | 0.586 | **0.677** |
| | Mistral & Komodo | 0.719 | 0.390 | 0.506 | 0.781 | 0.344 | 0.478 |
| | Mistral & Merak | 0.678 | 0.402 | 0.505 | 0.779 | 0.503 | 0.611 |
| IndQEL | LLaMA-3 | 0.154 | 0.051 | 0.077 | 0.327 | 0.058 | 0.099 |
| | Mistral | 0.179 | 0.131 | 0.151 | 0.072 | 0.029 | 0.042 |
| | Komodo | 0.158 | 0.116 | 0.134 | 0.208 | 0.160 | **0.181** |
| | Merak | 0.203 | 0.149 | **0.172** | 0.142 | 0.106 | 0.121 |
| | CELESTA with ReFinED | | | | | | |
| | LLaMA-3 & Komodo | 0.138 | 0.047 | 0.071 | 0.282 | 0.073 | 0.116 |
| | LLaMA-3 & Merak | 0.160 | 0.113 | 0.132 | 0.130 | 0.098 | 0.112 |
| | Mistral & Komodo | 0.138 | 0.095 | 0.112 | 0.107 | 0.047 | 0.066 |
| | Mistral & Merak | 0.196 | 0.146 | 0.167 | 0.128 | 0.095 | 0.109 |
| IndEL-WIKI | LLaMA-3 | 0.581 | 0.234 | 0.332 | 0.639 | 0.322 | 0.428 |
| | Mistral | 0.565 | 0.232 | 0.329 | 0.552 | 0.201 | 0.294 |
| | Komodo | 0.592 | 0.256 | 0.357 | 0.591 | 0.270 | 0.370 |
| | Merak | 0.591 | 0.285 | **0.385** | 0.548 | 0.293 | 0.382 |
| | CELESTA with ReFinED | | | | | | |
| | LLaMA-3 & Komodo | 0.577 | 0.234 | 0.332 | 0.639 | 0.322 | 0.428 |
| | LLaMA-3 & Merak | 0.596 | 0.273 | 0.374 | 0.641 | 0.355 | **0.457** |
| | Mistral & Komodo | 0.576 | 0.231 | 0.330 | 0.575 | 0.219 | 0.317 |
| | Mistral & Merak | 0.564 | 0.248 | 0.345 | 0.581 | 0.270 | 0.369 |
These results show CELESTAβs performance when using mGENRE for candidate generation and Wikidata URI retrieval.
| Dataset | Model | Zero-shot P | Zero-shot R | Zero-shot F1 | Few-shot P | Few-shot R | Few-shot F1 |
|---|---|---|---|---|---|---|---|
| IndGEL | LLaMA-3 | 0.720 | 0.694 | **0.707** | 0.742 | 0.718 | 0.730 |
| | Mistral | 0.667 | 0.640 | 0.653 | 0.607 | 0.584 | 0.595 |
| | Komodo | 0.702 | 0.668 | 0.685 | 0.740 | 0.698 | 0.718 |
| | Merak | 0.611 | 0.576 | 0.594 | 0.696 | 0.672 | 0.684 |
| | CELESTA with mGENRE | | | | | | |
| | LLaMA-3 & Komodo | 0.695 | 0.660 | 0.677 | 0.741 | 0.708 | 0.724 |
| | LLaMA-3 & Merak | 0.631 | 0.596 | 0.613 | 0.748 | 0.722 | **0.735** |
| | Mistral & Komodo | 0.657 | 0.632 | 0.644 | 0.623 | 0.602 | 0.612 |
| | Mistral & Merak | 0.620 | 0.588 | 0.603 | 0.702 | 0.676 | 0.686 |
| IndQEL | LLaMA-3 | 0.298 | 0.298 | **0.298** | 0.274 | 0.273 | **0.273** |
| | Mistral | 0.258 | 0.258 | 0.258 | 0.185 | 0.182 | 0.183 |
| | Komodo | 0.252 | 0.251 | 0.251 | 0.269 | 0.269 | 0.269 |
| | Merak | 0.233 | 0.233 | 0.233 | 0.255 | 0.255 | 0.255 |
| | CELESTA with mGENRE | | | | | | |
| | LLaMA-3 & Komodo | 0.298 | 0.298 | **0.298** | 0.266 | 0.266 | 0.266 |
| | LLaMA-3 & Merak | 0.276 | 0.276 | 0.276 | 0.256 | 0.255 | 0.255 |
| | Mistral & Komodo | 0.262 | 0.262 | 0.262 | 0.185 | 0.182 | 0.183 |
| | Mistral & Merak | 0.236 | 0.236 | 0.236 | 0.202 | 0.200 | 0.201 |
| IndEL-WIKI | LLaMA-3 | 0.516 | 0.415 | 0.460 | 0.601 | 0.489 | 0.539 |
| | Mistral | 0.457 | 0.360 | 0.403 | 0.447 | 0.363 | 0.401 |
| | Komodo | 0.542 | 0.401 | 0.461 | 0.547 | 0.422 | 0.476 |
| | Merak | 0.474 | 0.371 | 0.417 | 0.428 | 0.353 | 0.387 |
| | CELESTA with mGENRE | | | | | | |
| | LLaMA-3 & Komodo | 0.548 | 0.411 | **0.470** | 0.618 | 0.481 | 0.537 |
| | LLaMA-3 & Merak | 0.521 | 0.412 | 0.460 | 0.595 | 0.495 | **0.540** |
| | Mistral & Komodo | 0.500 | 0.368 | 0.424 | 0.484 | 0.382 | 0.427 |
| | Mistral & Merak | 0.447 | 0.349 | 0.392 | 0.507 | 0.413 | 0.455 |
- Contribution of LLMs to CELESTA's Correct Predictions
In addition to overall performance, we measure the contribution of each multilingual and monolingual LLM, as well as the original mention, to CELESTA's correct predictions in the dual multilingual-monolingual mention expansion setup, using IndGEL with few-shot prompting. A contribution is counted when a mention expansion (or the original mention) is selected by CELESTA through its similarity-based selection mechanism and leads to a correct entity prediction. The table below reports contributions when CELESTA uses either ReFinED or mGENRE for candidate generation and Wikidata URI retrieval. Values indicate the percentage of correct predictions attributed to LLM1 (multilingual), LLM2 (monolingual), or the original mention for each LLM pair. These results highlight the complementary strengths of multilingual and monolingual LLMs and the benefit of pairing them with high-recall ED backends. A minimal counting sketch for this measurement follows the table.
LLM Pair | LLM1 (%) | LLM2 (%) | Original (%) |
---|---|---|---|
CELESTA with ReFinED | |||
LLaMA-3 & Komodo | 64.49 | 12.32 | 23.19 |
LLaMA-3 & Merak | 41.46 | 56.71 | 1.83 |
Mistral & Komodo | 79.28 | 16.22 | 4.5 |
Mistral & Merak | 43.24 | 56.76 | 0.0 |
CELESTA with mGENRE | |||
LLaMA-3 & Komodo | 59.83 | 14.53 | 25.64 |
LLaMA-3 & Merak | 41.62 | 57.54 | 1.12 |
Mistral & Komodo | 78.86 | 14.43 | 6.38 |
Mistral & Merak | 46.87 | 53.43 | 0.0 |
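As referenced above, here is a minimal counting sketch for this analysis, assuming a per-mention record of which source ("llm1", "llm2", or "original") CELESTA selected and whether the resulting prediction was correct; the actual bookkeeping is done inside the evaluation scripts.

```python
# Sketch: contribution of each expansion source to correct predictions.
from collections import Counter

def contribution_percentages(records):
    """records: iterable of (source, is_correct) pairs,
    where source is 'llm1', 'llm2', or 'original'."""
    correct = Counter(source for source, is_correct in records if is_correct)
    total = sum(correct.values())
    return {source: 100.0 * count / total for source, count in correct.items()}

example = [("llm1", True), ("llm2", True), ("original", False), ("llm1", True)]
print(contribution_percentages(example))  # approximately {'llm1': 66.7, 'llm2': 33.3}
```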
If you have any questions or feedback, feel free to contact us at [email protected] or [email protected]