
Bench4KE Logo

Bench4KE

A Benchmarking System for Evaluating Knowledge Engineering Automation Tasks


Bench4KE is a benchmarking framework for evaluating Knowledge Engineering (KE) automation with Large Language Models (LLMs). It currently focuses on the quality of Competency Questions (CQs) automatically generated by LLMs.

CQs are natural language questions used by ontology engineers to define and validate the functional requirements of an ontology. With the increasing use of LLMs to automate tasks in Knowledge Engineering, the automatic generation of CQs is gaining attention. However, current evaluation approaches lack standardization and reproducibility.

Bench4KE addresses this gap with the features listed below.

Key Features

  • A gold standard dataset derived from real-world ontology engineering projects
  • Multiple evaluation metrics (see the sketch after this list):
    • Cosine Similarity
    • BERTScore-F1
    • Jaccard Similarity
    • BLEU
    • ROUGE-L
    • Hit Rate
    • LLM-based semantic analysis (via OpenAI's GPT-4 model)
  • Visual heatmaps for comparing generated and manually crafted CQs
  • A modular and extensible architecture that supports custom datasets, additional KE tasks, and further evaluation metrics in the future
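
As a minimal illustration of two of these metrics, the snippet below computes cosine similarity and Jaccard similarity between a generated CQ and a gold-standard CQ. This is a sketch only; Bench4KE's own implementation may differ in tokenisation, embedding models, and scoring details.

    # Sketch: cosine similarity (over TF-IDF vectors) and Jaccard similarity
    # (over token sets) for one pair of competency questions.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    generated_cq = "What datasets does a research project produce?"
    reference_cq = "Which datasets are produced by a given research project?"

    # Cosine similarity between the TF-IDF vectors of the two questions.
    tfidf = TfidfVectorizer().fit_transform([generated_cq, reference_cq])
    cosine = cosine_similarity(tfidf)[0, 1]

    # Jaccard similarity over lower-cased token sets.
    tokens_a = set(generated_cq.lower().split())
    tokens_b = set(reference_cq.lower().split())
    jaccard = len(tokens_a & tokens_b) / len(tokens_a | tokens_b)

    print(f"cosine={cosine:.3f}, jaccard={jaccard:.3f}")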

Directory Contents

File / Folder             Description
restapi/                  Core directory containing the service and its logic.
app/                      FastAPI application modules and related components.
  ↳ benchmarkdataset.csv  The gold standard dataset of manually crafted CQs used for evaluation.
tests/                    Test cases and testing utilities.
tutorial/                 Tutorial materials for using the API.
bench4ke-validate-ui.py   Web interface built on top of the API.
cq_generator_app.py       Example of a CQ generation application compatible with the API.
experimental-results/     Collection of output data and evaluation results.

Usage

To evaluate a CQ Generation tool using Bench4KE, follow the steps below:

1. Setup

Ensure you have Python 3.8 or higher installed.

2. Install Dependencies

Install the required dependencies:

pip install -r requirements.txt

3. Run the Benchmark App

Launch the benchmarking application by running:

uvicorn app.main:app --reload --host 127.0.0.1 --port 8000

This will start a FastAPI-based service that guides you through the evaluation process.
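
Because the service is a FastAPI application, it automatically exposes interactive documentation at http://127.0.0.1:8000/docs and an OpenAPI schema at /openapi.json. A quick sanity check that the service is up (a sketch, assuming the default host and port used above):

    # Check that the benchmarking service is reachable, using the OpenAPI
    # schema that FastAPI serves automatically for every application.
    import requests

    BASE_URL = "http://127.0.0.1:8000"

    resp = requests.get(f"{BASE_URL}/openapi.json")
    resp.raise_for_status()
    print("Available paths:", list(resp.json()["paths"].keys()))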

4. Provide a CQ Generator Link

Once the app is running, you will be prompted to provide the URL of the CQ generation tool you want to evaluate. This tool should expose an endpoint that returns generated CQs for a given ontology or prompt.
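
The repository's cq_generator_app.py is the authoritative example of a compatible generator. Purely as a rough sketch, such a service could look like the following; the endpoint path /generate, the field names, and the response shape here are assumptions, not the exact schema Bench4KE requires.

    # Hypothetical sketch of a CQ generator service. Check cq_generator_app.py
    # in this repository for a generator that is actually API-compatible.
    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI()

    class GenerationRequest(BaseModel):
        prompt: str  # ontology description or user story to generate CQs from

    @app.post("/generate")
    def generate(request: GenerationRequest):
        # A real generator would call an LLM here; canned questions for brevity.
        return {
            "competency_questions": [
                f"What concepts are described by: {request.prompt}?",
                f"Which relations hold between the entities in: {request.prompt}?",
            ]
        }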

5. Select Evaluation Data

You can evaluate your generator against:

  1. The default gold standard dataset: benchmarkdataset.csv (provided in the repository)
  2. A custom CSV file of manually crafted Competency Questions (structure should match benchmarkdataset.csv)
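
When supplying a custom CSV, its header must match that of the gold standard file. A small sketch for checking this before an evaluation run; the file paths below are illustrative and should be adjusted to your checkout:

    # Verify that a custom CSV uses the same columns as the gold standard
    # dataset before submitting it for evaluation.
    import pandas as pd

    gold = pd.read_csv("app/benchmarkdataset.csv")        # path in this repo
    custom = pd.read_csv("my_competency_questions.csv")   # your own file

    if list(custom.columns) != list(gold.columns):
        raise ValueError(
            f"Column mismatch: expected {list(gold.columns)}, "
            f"got {list(custom.columns)}"
        )
    print(f"OK: {len(custom)} rows with columns {list(custom.columns)}")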

6. Review Results

Evaluation results are printed to your terminal and/or served via the app's UI. Outputs include:

  1. Similarity scores per CQ
  2. Heatmaps for a visual comparison
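
The heatmaps plot pairwise similarity between generated CQs (rows) and gold-standard CQs (columns). A minimal sketch of this kind of plot, using placeholder scores rather than real Bench4KE output:

    # Sketch of a CQ similarity heatmap. The score matrix below is a
    # made-up placeholder, not an actual evaluation result.
    import numpy as np
    import matplotlib.pyplot as plt

    scores = np.array([
        [0.91, 0.34, 0.12],
        [0.28, 0.87, 0.22],
        [0.15, 0.30, 0.79],
    ])

    fig, ax = plt.subplots()
    im = ax.imshow(scores, cmap="viridis", vmin=0.0, vmax=1.0)
    ax.set_xlabel("Gold-standard CQs")
    ax.set_ylabel("Generated CQs")
    fig.colorbar(im, ax=ax, label="similarity")
    plt.savefig("cq_similarity_heatmap.png")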

Citation

@misc{bench4ke_2025,
  title        = {{Bench4KE}: A Benchmarking System for Evaluating LLM-based Competency Question Generation},
  howpublished = {\url{https://github.com/fossr-project/ontogenia-cini}},
  note         = {Commit accessed 29~Apr~2025},
  year         = {2025}
}

License

Licensed under the Apache License.
