This repository contains the code for the paper *Analyzing and Evaluating Correlation Measures in NLG Meta-Evaluation*.
The `nlpstats` library is required and must be installed locally from this repository, since parts of it have been modified (e.g., item-level correlation was added). Other dependencies include `numpy` and `pandas`.
To set up the environment and install dependencies:
```
cd nlpstats
pip install --editable .
pip install numpy
pip install pandas
```

- Download the dataset from this link, and place the `data.csv` file under the `DP_RC` directory.
- For sensitivity to score granularity, download from this link, and place the `data_all_rescaled.json` file under the `score_granularity` directory.
To calculate ranking consistency, you can use the following command. This will save the individual group results:
```
cd DP_RC
python ranking_consistency.py --input-file data.csv --output-file results.json --world-size 32 --number-trials 1000 --save-group-results
```

To calculate the discriminative power using permutation tests, use the following command. It will also save the individual group results:
```
cd DP_RC
python discriminative_power_permutaion_test.py --input-file data.csv --output-file results.json --world-size 32 --number-trials 1000 --save-group-results
```

To measure the sensitivity to score granularity using the scores of GPT-3.5/4/4o, use this command. You can adjust the number of workers with the `--num_workers` argument (default is 4):
```
cd score_granularity
python re_sampling.py --data_type summarization --model GPT-3.5 --input_file data_all_rescaled.json --output_file gpt3.5_summ_result.csv --num_workers 4
```
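The exact grouping scheme for ranking consistency lives in `ranking_consistency.py`. As a rough illustration only (the function names and splitting scheme here are hypothetical, not the repository's implementation), ranking consistency can be estimated by repeatedly splitting the evaluation items into two random halves, ranking the systems on each half, and averaging the rank correlation between the two rankings:

```python
import random
from itertools import combinations

def kendall_tau(a, b):
    """Kendall's tau between two equal-length score lists (naive O(n^2))."""
    n = len(a)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (a[i] - a[j]) * (b[i] - b[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

def ranking_consistency(scores, n_trials=100, seed=0):
    """scores: dict mapping system name -> list of per-item metric scores.
    Repeatedly split the items into two random halves, rank the systems by
    their mean score on each half, and average the rank correlation."""
    rng = random.Random(seed)
    systems = sorted(scores)
    n_items = len(next(iter(scores.values())))
    taus = []
    for _ in range(n_trials):
        items = list(range(n_items))
        rng.shuffle(items)
        half = n_items // 2
        g1, g2 = items[:half], items[half:]
        r1 = [sum(scores[s][i] for i in g1) / len(g1) for s in systems]
        r2 = [sum(scores[s][i] for i in g2) / len(g2) for s in systems]
        taus.append(kendall_tau(r1, r2))
    return sum(taus) / len(taus)
```

A metric whose system ranking is stable across item subsets scores near 1; an unstable one drifts toward 0.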
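The script above implements the full discriminative-power analysis; its core building block, a paired permutation test that decides whether two systems' per-item scores differ significantly by randomly swapping paired scores, can be sketched as follows (the function name and details are illustrative, not the script's actual code):

```python
import random

def paired_permutation_test(x, y, n_trials=1000, seed=0):
    """Two-sided paired permutation test for the difference in mean
    per-item scores between two systems x and y (equal-length lists).
    Returns an estimated p-value."""
    rng = random.Random(seed)
    n = len(x)
    observed = abs(sum(x) - sum(y)) / n
    count = 0
    for _ in range(n_trials):
        sx = sy = 0.0
        for xi, yi in zip(x, y):
            if rng.random() < 0.5:  # randomly swap the paired scores
                xi, yi = yi, xi
            sx += xi
            sy += yi
        if abs(sx - sy) / n >= observed:
            count += 1
    return count / n_trials
```

Discriminative power is then, roughly, the fraction of system pairs the metric can separate at a chosen significance level.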
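Conceptually, sensitivity to score granularity asks how a metric's correlation with reference scores changes when its scores are quantized onto coarser scales. `re_sampling.py` runs this analysis on the GPT-3.5/4/4o scores; this toy sketch (all names hypothetical) only illustrates the quantization idea:

```python
import math

def rescale(scores, levels, lo=0.0, hi=1.0):
    """Quantize scores in [lo, hi] onto `levels` equally spaced values."""
    step = (hi - lo) / (levels - 1)
    return [lo + round((s - lo) / step) * step for s in scores]

def pearson(a, b):
    """Pearson correlation between two equal-length score lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def granularity_sensitivity(metric_scores, human_scores, levels=(3, 5, 10, 100)):
    """Correlation with human scores after quantizing the metric scores
    onto each granularity level."""
    return {L: pearson(rescale(metric_scores, L), human_scores) for L in levels}
```

Comparing the correlations across granularity levels shows how much a coarse scoring scale (e.g., 3 levels vs. 100) degrades the metric's agreement with human judgments.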