This repo contains the code for "Evaluating the Evaluators: Are readability metrics good measures of readability?"
If you use this repo, please cite the following paper:
<INSERT BIBTEX>
```bash
$ conda create python==3.9.16 --name eval-readability
$ conda activate eval-readability
$ python setup.py clean install
```
We use the following summarization datasets:
Our human-annotated readability data is from August et al. (2024).
Data format: the code expects the data in a plain-text file, with one summary per line.
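For reference, here is a minimal sketch of producing a file in this format (the file name and summaries are just placeholders):

```python
# Write summaries to a text file, one per line, which is the input
# format the rest of the pipeline expects.
summaries = [
    "Aspirin reduces the risk of heart attack in some adults.",
    "The study finds that sleep quality affects memory consolidation.",
]

with open("summaries.txt", "w", encoding="utf-8") as f:
    for summary in summaries:
        # Collapse internal newlines so each summary stays on a single line.
        f.write(summary.replace("\n", " ").strip() + "\n")
```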
For the HuggingFace datasets, use the following command to load and format the data:
```bash
$ python scripts/format_data.py \
    --dataset_name <HF_DATASET_NAME> \
    --subset <DATA_SUBSET> \
    --split <DATA_SPLIT> \
    --summary_col <COL_NAME_WITH_SUMMARIES> \
    --outfile </PATH/TO/SAVE/FORMATTED/DATA>
```
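If you want to see what this step amounts to, the sketch below does roughly the same thing by hand; the dataset name, subset, and column are illustrative examples, and `scripts/format_data.py` may differ in its details.

```python
from datasets import load_dataset

# Illustrative values -- substitute your own dataset, subset, split, and summary column.
dataset_name = "cnn_dailymail"
subset = "3.0.0"
split = "test"
summary_col = "highlights"

ds = load_dataset(dataset_name, subset, split=split)

# Write one summary per line, matching the expected data format.
with open("formatted_data.txt", "w", encoding="utf-8") as f:
    for example in ds:
        f.write(example[summary_col].replace("\n", " ").strip() + "\n")
```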
** This is the official, released version of this dataset. We found multiple grammatical errors and re-collected the dataset for this paper. We are currently working with the original authors of the SJK paper to re-release the cleaned data.
We use the following language models:
- Mistral 7B Instruct
- Mixtral-8x7B Instruct
- Gemma 1.1 7B Instruct
- Llama 3.1 8B Instruct
- Llama 3.3 70B Instruct
The following command prompts the model to rate the readability of the summaries in the input file:
```bash
$ python main.py \
    <MODEL> \
    </PATH/TO/INPUT_FILE> \
    </PATH/TO/OUTPUT_FILE> \
    -bsz <BATCH_SIZE>
```
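Under the hood, this step amounts to prompting an instruction-tuned model with each summary and asking for a rating. Below is a minimal sketch using the Hugging Face `transformers` chat template API; the prompt wording and model ID are illustrative, not necessarily what `main.py` uses.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Any of the instruct models listed above would work similarly.
model_id = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

summary = "Aspirin reduces the risk of heart attack in some adults."
messages = [{
    "role": "user",
    "content": (
        "Rate the readability of the following summary on a scale from 1 "
        "(hard to read) to 5 (easy to read). Respond with only the number.\n\n"
        f"Summary: {summary}"
    ),
}]

# Build the chat-formatted prompt and generate a short answer.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=10, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```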
The following command runs a script to extract the model ratings:
```bash
$ python scripts/get_rating.py </PATH/TO/OUTPUT_FILE>
```
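The extraction boils down to parsing the numeric rating out of the generated text. A minimal sketch of that logic (the actual parsing in `scripts/get_rating.py` may be more robust):

```python
import re
from typing import Optional

def extract_rating(generation: str) -> Optional[int]:
    """Return the first 1-5 rating found in the model output, or None if absent."""
    match = re.search(r"\b([1-5])\b", generation)
    return int(match.group(1)) if match else None

print(extract_rating("I would rate this summary a 4 out of 5."))  # -> 4
```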
We release the results of our literature survey (Sec 4.1) here.
Jupyter notebooks with the analysis code can be found in `analysis/`.
- `analysis/model_analysis.ipynb` contains the code for comparing the human judgements to traditional metrics and LM readability judgements (Sec 4.2-4.3).
- `analysis/dataset_analysis.ipynb` contains the code for the LM-based evaluation of readability datasets (Sec 4.4).
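As a rough illustration of what the metric comparison involves, the sketch below correlates made-up human readability ratings with Flesch-Kincaid grade level using `textstat` and `scipy`; the notebooks use the actual annotations and metrics from the paper.

```python
import textstat
from scipy.stats import spearmanr

# Toy example: summaries paired with made-up mean human readability ratings (1-5).
summaries = [
    "Aspirin reduces the risk of heart attack in some adults.",
    "Regular exercise improves mood and helps people sleep better.",
    "Pharmacological inhibition of cyclooxygenase attenuates thrombotic cardiovascular events.",
    "Longitudinal multimodal neuroimaging reveals heterogeneous trajectories of cortical atrophy.",
]
human_ratings = [4.5, 4.8, 2.0, 1.5]  # illustrative values only

# Lower Flesch-Kincaid grade level roughly means easier to read, so we
# expect a negative correlation with human readability ratings.
fk_grades = [textstat.flesch_kincaid_grade(s) for s in summaries]
rho, p = spearmanr(human_ratings, fk_grades)
print(f"Spearman rho = {rho:.2f} (p = {p:.3f})")
```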