
Evaluating the Evaluators

This repo contains the code for Evaluating the Evaluators: Are readability metrics good measures of readability?

If you use this repo, please cite the following paper:

<INSERT BIBTEX>

Setup

$ conda create python==3.9.16 --name eval-readability
$ conda activate eval-readability
$ python setup.py clean install

Datasets

We use the following summarization datasets:

Our human-annotated readability data is from August et al. (2024).

Data format: The code expects the data in a text file, with each new line containing a summary.
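
For example, a valid input file with three summaries (contents hypothetical) would look like:

The study finds that regular exercise improves sleep quality in older adults.
Researchers trained a language model to simplify medical abstracts for lay readers.
The survey shows that most participants preferred the shorter, plainer summaries.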

For the HuggingFace datasets, use the following command to load and format the data:

$ python scripts/format_data.py \
--dataset_name <HF_DATASET_NAME> \
--subset <DATA_SUBSET> \
--split <DATA_SPLIT> \
--summary_col <COL_NAME_WITH_SUMMARIES> \
--outfile </PATH/TO/SAVE/FORMATTED/DATA>
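
For example, to format the test split of the cnn_dailymail dataset on HuggingFace, whose summaries live in the highlights column (an illustrative dataset, not necessarily one used in the paper):

$ python scripts/format_data.py \
--dataset_name cnn_dailymail \
--subset 3.0.0 \
--split test \
--summary_col highlights \
--outfile data/cnn_dm_test.txt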

** This is the official, released version of this dataset. We found multiple grammatical errors in it and re-collected the data for this paper. We are currently working with the original authors of the SJK paper to re-release the cleaned data.

Models

We use the following language models:

Running Model Inference Code

The following command prompts the model to rate the readability of the summaries in the input file:

$ python main.py \
<MODEL> \
</PATH/TO/INPUT_FILE> \
</PATH/TO/OUTPUT_FILE> \
-bsz <BATCH_SIZE>
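
For example (the model name, paths, and batch size below are illustrative; see main.py for the supported model identifiers):

$ python main.py \
meta-llama/Llama-2-7b-chat-hf \
data/cnn_dm_test.txt \
outputs/cnn_dm_test_ratings.txt \
-bsz 8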

The following command runs a script to extract the model ratings:

$ python scripts/get_rating.py </PATH/TO/OUTPUT_FILE>
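
If you need to adapt the extraction step to a new model's output format, a minimal sketch of pulling an integer rating out of free-form model text might look like the following (the regex and the 1-5 scale are assumptions, not necessarily the repo's exact logic):

import re

def extract_rating(text: str, low: int = 1, high: int = 5):
    """Return the first integer in [low, high] found in the model output, else None."""
    for match in re.finditer(r"\d+", text):
        value = int(match.group())
        if low <= value <= high:
            return value
    return None  # no parseable rating in this output

# Example: a typical free-form model response.
print(extract_rating("I would rate this summary a 4 out of 5."))  # -> 4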

Analysis

We release the results of our literature survey (Sec 4.1) here.

Jupyter notebooks with the analysis code can be found in analysis/.

  • analysis/model_analysis.ipynb contains the code for comparing the human judgements to traditional metrics and LM readability judgements (Sec 4.2-4.3); a rough sketch of this kind of comparison is shown after this list.
  • analysis/dataset_analysis.ipynb contains the code for the LM-based evaluation of readability datasets. (Sec 4.4)
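
As a rough, self-contained sketch of the kind of comparison done in analysis/model_analysis.ipynb (the toy data and the choice of Flesch-Kincaid as the traditional metric are assumptions, not the notebook's exact setup):

import textstat                 # pip install textstat
from scipy.stats import spearmanr

# Hypothetical summaries with hypothetical human readability ratings (higher = more readable).
summaries = [
    "The cat sat on the mat.",
    "The committee approved the proposal after a short discussion.",
    "Quantum entanglement underpins nonlocal correlations between particles.",
    "Epistemological commitments constrain methodological operationalization.",
]
human_scores = [4.9, 4.1, 2.3, 1.2]

# Flesch-Kincaid grade level: higher = harder to read, so we expect a negative correlation.
fk_grades = [textstat.flesch_kincaid_grade(s) for s in summaries]

rho, p = spearmanr(human_scores, fk_grades)
print(f"Spearman rho = {rho:.2f} (p = {p:.3f})")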
