11 changes: 6 additions & 5 deletions README.md
@@ -3,16 +3,16 @@
 The [Annual Meeting](http://www.trb.org/AnnualMeeting/AnnualMeeting.aspx) of the [Transportation Research Board (TRB)](http://www.trb.org/Main/Home.aspx) is attended by over 10,000 participants. The core feature of the meeting is the set of sessions devoted to the presentation of research. Research papers are submitted to TRB and assigned to TRB committees. The TRB committees are composed of volunteers from academia and industry. These committees must review research papers and curate worthy entries into TRB sessions during the narrow time window from the paper deadline on August 31st to the posting of the preliminary Annual Meeting agenda in early December. For committees that receive large numbers of papers, this is a difficult task. The purpose of Chandra Bot is to use data and analysis to make the review process more efficient, effective, and fair. The project name is an homage to [Professor Chandra Bhat](http://www.caee.utexas.edu/prof/bhat/home.html) of the University of Texas -- the idea being that if we could only clone Professor Bhat and have him review each paper, the review process would be perfect.
 
 ## Data Model
-In order to organize our thinking and structure our code, we started with a [data model](/chandra_bot/chandra_bot_data_model.proto). It includes:
+In order to organize our thinking and structure our code, we started with a [data model](/chandra_bot/data_model_pydantic.py). It includes:
 * Humans -- humans write and review papers;
 * Papers -- research articles submitted to the Annual Meeting;
 * Reviews -- reviews of submitted papers; and,
 * Numerous other supporting data types and relationships.
 
-The data model is realized as a [Protocol Buffer](https://developers.google.com/protocol-buffers), which provides an abstraction between the model and the underlying software implementation.
+The data model is built on top of [Pydantic](https://docs.pydantic.dev/latest/), which performs data validation and serialization.
 
 ## Prototype Software
-Prototype software is created to efficiently explore the underlying data. It allows for any number of easy examinations. For example, say Reviewer A only uses a portion of the one to five scale used to rate TRB papers, giving each paper a score of 3, 4, or 5. Reviewer B similarly uses a portion of the scale, giving each paper a score of 1, 2, or 3. When a TRB committee receives scores from Reviewer A and Reviewer B, would it not be more efficient, effective, and fair if the committee could easily normalize these scores to each reviewers internal scoring system? The proposition puct forward here is that a useful data model paired with software is the first step in implementing committees with such tooling. A relatively unique feature of the TRB Annual Meeting is that a relatively small number of reviewers review papers from a relatively small number of authors every year. This allows the opportunity to find patterns and extract information from a time series of data that can be made useful in a relatively short period of time.
+Prototype software is created to efficiently explore the underlying data. It allows for any number of easy examinations. For example, say Reviewer A only uses a portion of the one to five scale used to rate TRB papers, giving each paper a score of 3, 4, or 5. Reviewer B similarly uses a portion of the scale, giving each paper a score of 1, 2, or 3. When a TRB committee receives scores from Reviewer A and Reviewer B, would it not be more efficient, effective, and fair if the committee could easily normalize these scores to each reviewer's internal scoring system? The proposition put forward here is that a useful data model paired with software is the first step in equipping committees with such tooling. A distinctive feature of the TRB Annual Meeting is that a relatively small number of reviewers review papers from a relatively small number of authors every year. This offers the opportunity to find patterns and extract useful information from a time series of data in a relatively short period of time.
 
 ## Fake Data
 The reviews of academic papers contain sensitive information. To facilitate testing and exploration of the Project's tools, we have created a time series of fake data (see the `/examples` directory) based on open databases of names, affiliations, and sentences. Any resemblance to real people or reviews is unintentional.
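The Pydantic-based data model referenced in the Data Model section can be sketched roughly as below. The class and field names here are illustrative assumptions, not the project's actual `data_model_pydantic.py`.

```python
from enum import IntEnum
from typing import List

from pydantic import BaseModel


class PresentationRecEnum(IntEnum):
    # Hypothetical codes; the real enum lives in data_model_pydantic.py.
    PRESENTATION_REC_NONE = 0
    PRESENTATION_REC_ACCEPT = 1
    PRESENTATION_REC_REJECT = 2


class Affiliation(BaseModel):
    name: str = ""


class Human(BaseModel):
    name: str
    aliases: List[str] = []
    current_affiliation: Affiliation = Affiliation()


# Pydantic validates on construction: a wrong type raises ValidationError,
# and model_dump_json() serializes the validated instance.
human = Human(name="Jane Doe", aliases=["J. Doe"])
print(human.model_dump_json())
```

The point of the Pydantic layer is that validation and JSON serialization come for free on every model, which is what the PR swaps in for the old Protocol Buffer machinery.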
@@ -22,8 +22,9 @@ One challenge in assembling the data that powers potential analysis is that ther

 ## Contributing Authors
 The Project is being led by TRB's Standing Committee on Travel Demand Forecasting. [David Ory]([email protected]) is the current paper review chair of this committee and is responsible for the Project. Other team members contributing to the project are:
-* [Sijia Wang](https://github.com/i-am-sijia); and,
-* [Gayathri Shivaraman](https://github.com/gshivaraman).
+* [Sijia Wang](https://github.com/i-am-sijia);
+* [Gayathri Shivaraman](https://github.com/gshivaraman); and,
+* [David Hensle](https://github.com/dhensle).
 
 ## License
 [Apache 2.0](LICENSE.txt)
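The reviewer-score normalization idea described in the Prototype Software section can be sketched with pandas. The z-score approach below is one plausible normalization, not necessarily the one the project implements.

```python
import pandas as pd

# Reviewer A uses only the top of the 1-5 scale; Reviewer B only the bottom.
reviews = pd.DataFrame({
    "reviewer": ["A", "A", "A", "B", "B", "B"],
    "score":    [3, 4, 5, 1, 2, 3],
})

# Normalize each reviewer's scores to zero mean / unit variance so that
# "a 5 from Reviewer A" and "a 3 from Reviewer B" become comparable.
grouped = reviews.groupby("reviewer")["score"]
reviews["z_score"] = (reviews["score"] - grouped.transform("mean")) / grouped.transform("std")

print(reviews)
```

With this data, both reviewers' best, middle, and worst papers land on the same normalized values, which is exactly the fairness property the README argues for.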
73 changes: 50 additions & 23 deletions chandra_bot/chandra_bot.py
@@ -7,8 +7,10 @@
 
 import numpy as np
 import pandas as pd
+import json
 
-from . import data_model_pb2 as dm
+# from . import data_model_pb2 as dm
+from . import data_model_pydantic as dm
 
 
 class ChandraBot(object):
@@ -102,7 +104,7 @@ def __init__(
             self.review_df: pd.DataFrame = review_df
             self.human_df: pd.DataFrame = human_df
 
-            self.paper_book = dm.PaperBook()
+            self.paper_book = dm.PaperBook(paper=[])
         else:
             self.paper_book: dm.PaperBook = input_paper_book
@@ -112,54 +114,61 @@ def _attribute_paper(self, paper: dm.Paper, row: list) -> None:
         paper.year = int(row["year"])
 
         if row["committee_presentation_decision"].lower() == "reject":
-            paper.committee_presentation_decision = dm.PRESENTATION_REC_REJECT
+            paper.committee_presentation_decision = dm.PresentationRecEnum.PRESENTATION_REC_REJECT
         elif row["committee_presentation_decision"].lower() == "accept":
-            paper.committee_presentation_decision = dm.PRESENTATION_REC_ACCEPT
+            paper.committee_presentation_decision = dm.PresentationRecEnum.PRESENTATION_REC_ACCEPT
         else:
-            paper.committee_presentation_decision = dm.PRESENTATION_REC_NONE
+            paper.committee_presentation_decision = dm.PresentationRecEnum.PRESENTATION_REC_NONE
 
         if row["committee_publication_decision"].lower() == "reject":
-            paper.committee_publication_decision = dm.PUBLICATION_REC_REJECT
+            paper.committee_publication_decision = dm.PublicationRecEnum.PUBLICATION_REC_REJECT
         elif row["committee_publication_decision"].lower() == "accept":
-            paper.committee_publication_decision = dm.PUBLICATION_REC_ACCEPT
+            paper.committee_publication_decision = dm.PublicationRecEnum.PUBLICATION_REC_ACCEPT
         elif row["committee_publication_decision"].lower() == "accept_correct":
-            paper.committee_publication_decision = dm.PUBLICATION_REC_ACCEPT_CORRECT
+            paper.committee_publication_decision = dm.PublicationRecEnum.PUBLICATION_REC_ACCEPT_CORRECT
         else:
-            paper.committee_publication_decision = dm.PUBLICATION_REC_NONE
+            paper.committee_publication_decision = dm.PublicationRecEnum.PUBLICATION_REC_NONE
 
+        paper.abstract = dm.Content.model_construct()
         if "abstract" in row:
             paper.abstract.text = row["abstract"]
         else:
             paper.abstract.text = "Missing"
 
+        paper.body = dm.Content.model_construct()
         if "body" in row:
             paper.body.text = str(row["body"])
         else:
             paper.body.text = "Missing"
 
     def _attribute_author(self, author: dm.Author, row: list):
+        author.human = dm.Human.model_construct()
         author.human.name = row["name"].values[0]
 
         if not pd.isnull(row["aliases"].values[0]):
             for alias in row["aliases"].values[0].split(","):
                 author.human.aliases.append(alias)
 
         author.human.hash_id = row["hash_id"].values[0]
+        author.human.current_affiliation = dm.Affiliation.model_construct()
         if not pd.isnull(row["current_affiliation"].values[0]):
             author.human.current_affiliation.name = row["current_affiliation"].values[0]
         else:
             author.human.current_affiliation.name = ""
 
+        author.human.last_degree_affiliation = dm.Affiliation.model_construct()
         author.human.last_degree_affiliation.name = str(
             row["last_degree_affiliation"].values[0]
         )
 
+        author.human.previous_affiliation = []
         if not pd.isnull(row["previous_affiliation"].values[0]):
             affil_list = row["previous_affiliation"].values[0].split(",")
             if len(affil_list) > 0:
                 for affil in affil_list:
-                    affiliation = author.human.previous_affiliation.add()
+                    affiliation = dm.Affiliation.model_construct()
                     affiliation.name = affil
+                    author.human.previous_affiliation.append(affiliation)
 
         if not pd.isnull(row["orcid_url"].values[0]):
             author.human.orcid_url = str(row["orcid_url"].values[0])
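The repeated if/elif chains that map decision strings onto enum members could be collapsed into a small lookup helper. `PresentationRecEnum` below is a stand-in re-declared for illustration; the project's real enum is in `data_model_pydantic.py`.

```python
from enum import IntEnum


class PresentationRecEnum(IntEnum):
    # Stand-in for the project's dm.PresentationRecEnum.
    PRESENTATION_REC_NONE = 0
    PRESENTATION_REC_ACCEPT = 1
    PRESENTATION_REC_REJECT = 2


# One table replaces the if/elif chain; unknown strings fall through to NONE.
_PRESENTATION_DECISIONS = {
    "accept": PresentationRecEnum.PRESENTATION_REC_ACCEPT,
    "reject": PresentationRecEnum.PRESENTATION_REC_REJECT,
}


def presentation_decision(value: str) -> PresentationRecEnum:
    """Map a free-text committee decision onto an enum, defaulting to NONE."""
    return _PRESENTATION_DECISIONS.get(
        value.strip().lower(), PresentationRecEnum.PRESENTATION_REC_NONE
    )


print(presentation_decision("Accept").name)
```

A table-driven mapping also makes copy-paste slips, such as assigning a presentation value to a publication field, much harder to make.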
@@ -174,35 +183,39 @@ def _attribute_author(self, author: dm.Author, row: list):
     def _attribute_review(self, review: dm.Review, row: list):
         review.presentation_score = row["presentation_score"]
 
+        review.commentary_to_author = dm.Content.model_construct()
         if not pd.isnull(row["commentary_to_author"]):
             review.commentary_to_author.text = row["commentary_to_author"]
         else:
             review.commentary_to_author.text = ""
 
+        review.commentary_to_chair = dm.Content.model_construct()
         if not pd.isnull(row["commentary_to_chair"]):
             review.commentary_to_chair.text = row["commentary_to_chair"]
         else:
             review.commentary_to_chair.text = ""
 
         if row["presentation_recommendation"].lower() == "reject":
-            review.presentation_recommend = dm.PRESENTATION_REC_REJECT
+            review.presentation_recommend = dm.PresentationRecEnum.PRESENTATION_REC_REJECT
         elif row["presentation_recommendation"].lower() == "accept":
-            review.presentation_recommend = dm.PRESENTATION_REC_ACCEPT
+            review.presentation_recommend = dm.PresentationRecEnum.PRESENTATION_REC_ACCEPT
         else:
-            review.presentation_recommend = dm.PRESENTATION_REC_NONE
+            review.presentation_recommend = dm.PresentationRecEnum.PRESENTATION_REC_NONE
 
         if row["publication_recommendation"].lower() == "reject":
-            review.publication_recommend = dm.PUBLICATION_REC_REJECT
+            review.publication_recommend = dm.PublicationRecEnum.PUBLICATION_REC_REJECT
         elif row["publication_recommendation"].lower() == "accept":
-            review.publication_recommend = dm.PUBLICATION_REC_ACCEPT
+            review.publication_recommend = dm.PublicationRecEnum.PUBLICATION_REC_ACCEPT
         else:
-            review.publication_recommend = dm.PRESENTATION_REC_NONE
+            review.publication_recommend = dm.PublicationRecEnum.PUBLICATION_REC_NONE
 
     def _attribute_reviewer(self, review: dm.Review, row: list):
+        review.reviewer = dm.Reviewer.model_construct()
 
         if row.empty:
             return
 
+        review.reviewer.human = dm.Human.model_construct()
         if not pd.isnull(row["name"].values[0]):
             review.reviewer.human.name = row["name"].values[0]
         else:
@@ -218,24 +231,28 @@ def _attribute_reviewer(self, review: dm.Review, row: list):
         else:
             review.reviewer.human.hash_id = ""
 
+        review.reviewer.human.current_affiliation = dm.Affiliation.model_construct()
         if not pd.isnull(row["current_affiliation"].values[0]):
             review.reviewer.human.current_affiliation.name = row[
                 "current_affiliation"
             ].values[0]
         else:
             review.reviewer.human.current_affiliation.name = ""
 
+        review.reviewer.human.last_degree_affiliation = dm.Affiliation.model_construct()
         if not pd.isnull(row["last_degree_affiliation"].values[0]):
             review.reviewer.human.last_degree_affiliation.name = str(
                 row["last_degree_affiliation"].values[0]
             )
         else:
             review.reviewer.human.last_degree_affiliation.name = ""
 
+        review.reviewer.human.previous_affiliation = []
         if not pd.isnull(row["previous_affiliation"].values[0]):
             for affil_name in row["previous_affiliation"].values[0].split(","):
-                affiliation = review.reviewer.human.previous_affiliation.add()
+                affiliation = dm.Affiliation.model_construct()
                 affiliation.name = affil_name
+                review.reviewer.human.previous_affiliation.append(affiliation)
 
         if not pd.isnull(row["orcid_url"].values[0]):
             review.reviewer.human.orcid_url = str(row["orcid_url"].values[0])
@@ -265,30 +282,39 @@ def assemble_paper_book(self):
             None
         """
         for paper_id in self.paper_df.index:
-            paper = self.paper_book.paper.add()
+            paper = dm.Paper.model_construct()
             paper.number = paper_id
             paper_row = self.paper_df.loc[paper_id]
             self._attribute_paper(paper, paper_row)
 
+            paper.authors = []
             if "author_ids" in self.paper_df.columns:
                 if not pd.isnull(paper_row.author_ids):
                     for author_id in paper_row.author_ids.split(","):
                         if self.human_df["author_id"].eq(author_id).any():
                             human_row = self.human_df.loc[
                                 self.human_df["author_id"] == author_id
                             ]
-                            self._attribute_author(paper.authors.add(), human_row)
+                            author = dm.Author.model_construct()
+                            self._attribute_author(author, human_row)
+                            paper.authors.append(author)
 
             paper_review_df = self.review_df.loc[self.review_df["paper_id"] == paper_id]
             paper_review_df.set_index("reviewer_human_hash_id")
 
+            paper.reviews = []
             for hash_id in paper_review_df.index:
                 review_row = paper_review_df.loc[hash_id]
                 reviewer_hash = review_row["reviewer_human_hash_id"]
                 human_row = self.human_df.loc[self.human_df["hash_id"] == reviewer_hash]
-                review = paper.reviews.add()
+                review = dm.Review.model_construct()
                 self._attribute_review(review, review_row)
                 self._attribute_reviewer(review, human_row)
+                paper.reviews.append(review)
+
+            # validate and add paper to paper book
+            dm.Paper.model_validate(paper)
+            self.paper_book.paper.append(paper)
 
     @staticmethod
     def create_bot(paper_file: str, review_file: str, human_file: str):
@@ -322,10 +348,11 @@ def read_paper_book(input_file: str):
         """
         read_paper_book
         """
-        paper_book = dm.PaperBook()
        try:
+            # data = json.load(input_file)
+            # paper_book = pd.PaperBook.model_validate_json(data, strict=False)
             with open(input_file, "rb") as file_pointer:
-                paper_book.ParseFromString(file_pointer.read())
+                paper_book = dm.PaperBook.model_validate_json(file_pointer.read(), strict=False)
         except IOError:
             print(input_file + ": File not found.")
 
@@ -341,7 +368,7 @@ def write_paper_book(self, output_file: str):
         write_paper_book
         """
         with open(output_file, "wb") as file_pointer:
-            file_pointer.write(self.paper_book.SerializeToString())
+            file_pointer.write(self.paper_book.model_dump_json().encode())
 
     def _compute_normalized_scores(self, min_number_reviews: int):
         scores_df = pd.DataFrame()
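The new read/write pattern in this file (`model_dump_json()` on write, `model_validate_json()` on read) can be exercised in isolation. The `Paper` and `PaperBook` models below are minimal stand-ins for the project's real classes, with illustrative fields only.

```python
from typing import List

from pydantic import BaseModel


class Paper(BaseModel):
    number: int
    title: str = ""


class PaperBook(BaseModel):
    # Named `paper` (singular) to mirror the diff's `paper_book.paper` accesses.
    paper: List[Paper] = []


book = PaperBook(paper=[Paper(number=1, title="Example")])

# Write side: serialize to JSON bytes, as write_paper_book now does.
payload = book.model_dump_json().encode()

# Read side: validate the JSON back into a model, as read_paper_book now does.
restored = PaperBook.model_validate_json(payload, strict=False)
print(restored.paper[0].title)
```

Unlike the old `ParseFromString`/`SerializeToString` Protocol Buffer round trip, the payload here is human-readable JSON, and `model_validate_json` re-runs full validation on load.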
81 changes: 0 additions & 81 deletions chandra_bot/data_model.proto

This file was deleted.