The original OpenPI dataset trains and evaluates models to predict entity states throughout a procedure.
The goal of OpenPI2.0 is to augment the original dataset in the following aspects:
- Canonicalization: cluster entities and attributes with the same meaning.
- Salience: add automatic and manual labels of entity salience.

With these features, OpenPI2.0 facilitates work on entity state tracking. It leads to fairer evaluation (by reducing false negatives during prediction) and better downstream performance (by allowing filtering by entity salience).
In this repo, we provide two resources:
- An API that takes in procedure texts and outputs entities, attributes, states, clusters, cluster expansions, local salience, and global salience.
- A dataset for development, evaluation, and tuning of models to predict the above.
To use the API:

- `cd api/`
- Format your input as in `trial.json`.
- Set your OpenAI API key to the path in the `openai.api_key =` line in `predict_all.py`, or change that line to your desired path.
- Run `openpi_api.py --input INPUT_PATH --output OUTPUT_PATH`
Input: a list of procedures, each consisting of a goal and a list of steps.

Output:
- The predicted schema (entities and corresponding attributes) of each step
- The predicted states based on the schema above of each step
- The predicted global entity salience of the entire procedure
- The predicted local entity salience of each step
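The authoritative input schema is whatever `api/trial.json` contains; as a rough sketch, an input file with one procedure (field names here are illustrative, not guaranteed to match the real schema) could be built like this:

```python
import json

# Hypothetical input: a list of procedures, each with a goal and a list of
# steps. Check api/trial.json for the actual field names and structure.
procedures = [
    {
        "goal": "make a cup of tea",
        "steps": [
            "boil water in a kettle",
            "pour the water over a tea bag in a cup",
            "let the tea steep for three minutes",
        ],
    }
]

with open("my_input.json", "w") as f:
    json.dump(procedures, f, indent=2)
```

You would then pass this file as `--input my_input.json` to `openpi_api.py`.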
The all-in-one OpenPI2.0 dev data file, with entity and attribute clusters, is `data/dev-data-reformatted-v4.json`. Files for the train and test sets are coming soon.
To create this data, we start with `data/dev-ranked.json`, the original OpenPI data, and perform canonicalization. See the README in `source/cluster` for more details.
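As a toy illustration of what canonicalization accomplishes (the actual pipeline in `source/cluster` is far more involved than this string normalization), mentions with the same meaning are grouped into one cluster:

```python
from collections import defaultdict

def normalize(entity: str) -> str:
    """Toy canonicalization key: lowercase and strip a plural 's'.
    The real pipeline in source/cluster uses much richer signals."""
    e = entity.lower().strip()
    return e[:-1] if e.endswith("s") and len(e) > 3 else e

def cluster(entities):
    """Group surface forms that share a canonical key."""
    clusters = defaultdict(list)
    for e in entities:
        clusters[normalize(e)].append(e)
    return list(clusters.values())

print(cluster(["Kettle", "kettles", "tea bag", "Tea Bag"]))
# → [['Kettle', 'kettles'], ['tea bag', 'Tea Bag']]
```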
The canonicalized OpenPI2.0 can evaluate entity state tracking more fairly. To get model predictions:
- Running `source/predict_schema.py --model MODEL --prompt 1|2` and `predict_states.py --model MODEL` produces predictions for the schemata subtask and the states subtask. The output is, for example, `data/dev_schema_chatgpt_1.json`. Prompt type 1 corresponds to predicting entities and attributes individually, while prompt type 2 corresponds to the combined prediction of an entire sentence ("attribute of entity was pre-state before and post-state after"), just like the original OpenPI evaluation.
- Running `source/evaluate_schema.py --model MODEL [--og]`, or similarly `evaluate_states.py` and `evaluate_combined.py`, performs evaluation of the above settings. `--og` specifies using the over-generated and expanded clusters for a fairer exact-match evaluation.
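The idea behind `--og` is that exact match becomes fairer when a prediction counts as correct if it hits *any* member of a gold entity's expanded cluster, not just one canonical string. A minimal sketch of that relaxation (function and data shapes are illustrative, not the repo's actual evaluation code):

```python
def cluster_f1(predicted, gold_clusters):
    """predicted: a set of predicted surface strings.
    gold_clusters: a list of sets, each the over-generated
    expansion of one canonical gold entity."""
    matched = set()  # gold clusters already credited
    tp = 0
    for p in predicted:
        for i, cluster in enumerate(gold_clusters):
            if i not in matched and p.lower() in cluster:
                matched.add(i)
                tp += 1
                break
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold_clusters) if gold_clusters else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = [{"kettle", "tea kettle", "pot"}, {"water", "hot water"}]
print(cluster_f1({"tea kettle", "cup"}, gold))  # → (0.5, 0.5, 0.5)
```

Here "tea kettle" would be a false negative under strict exact match against the canonical form "kettle", but is credited once the cluster expansion is used.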
We provide both human-annotated and LLM-predicted entity salience labels.
For human-annotated labels:
- `data/dev-data-reformatted-v4_votes_salience_1-20.json` contains human annotations by human A
- `data/dev-data-reformatted-v4_votes_salience_1-20_human2.json` contains human annotations by human B
For LLM-predicted labels:
- `data/dev-data-reformatted-v4_pred-salience.json` contains LLM-predicted salience scores
- This file is produced by running `source/predict_salience.py --model MODEL`
To evaluate the salience labels by correlation:
- Running `source/evaluate_salience.py` calculates correlation among the above scores.
- Running `source/plot_correlation.py` plots a bar chart of correlations for the first 20 procedures in the development set.
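Correlating two salience rankings is typically done with a rank correlation such as Spearman's rho (whether `source/evaluate_salience.py` uses Spearman or another coefficient, check the script itself). A self-contained sketch, with made-up scores for one procedure's entities:

```python
def rank(xs):
    """Assign average ranks (1-based), handling ties."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of tied rank positions
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(a, b):
    """Spearman correlation = Pearson correlation of the ranks."""
    ra, rb = rank(a), rank(b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra) ** 0.5
    vb = sum((y - mb) ** 2 for y in rb) ** 0.5
    return cov / (va * vb)

human = [5, 3, 4, 1, 2]  # hypothetical human salience scores
llm = [4, 3, 5, 2, 1]    # hypothetical LLM-predicted scores
print(spearman(human, llm))  # → 0.8
```

In practice `scipy.stats.spearmanr` does the same computation (plus a p-value); the pure-Python version above just makes the definition explicit.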
If you find our work helpful, please cite:
@inproceedings{zhang-etal-2024-openpi2,
title = "{O}pen{PI}2.0: An Improved Dataset for Entity Tracking in Texts",
author = "Zhang, Li and
Xu, Hainiu and
Kommula, Abhinav and
Callison-Burch, Chris and
Tandon, Niket",
editor = "Graham, Yvette and
Purver, Matthew",
booktitle = "Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = mar,
year = "2024",
address = "St. Julian{'}s, Malta",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.eacl-long.10",
pages = "166--178",
abstract = "Much texts describe a changing world (e.g., procedures, stories, newswires), and understanding them requires tracking how entities change. An earlier dataset, OpenPI, provided crowdsourced annotations of entity state changes in text. However, a major limitation was that those annotations were free-form and did not identify salient changes, hampering model evaluation. To overcome these limitations, we present an improved dataset, OpenPI2.0, where entities and attributes are fully canonicalized and additional entity salience annotations are added. On our fairer evaluation setting, we find that current state-of-the-art language models are far from competent. We also show that using state changes of salient entities as a chain-of-thought prompt, downstream performance is improved on tasks such as question answering and classical planning, outperforming the setting involving all related entities indiscriminately. We offer OpenPI2.0 for the continued development of models that can understand the dynamics of entities in text.",
}