This directory contains some code to check compliance of proposals submitted to NASA ROSES calls.
Code Authors: Nazifa Taha (@nazifataha1), Steven Crawford (@nasacrawford), Megan Ansdell (@mansdell)
PyMuPDF: a useful package for extracting text from PDFs (note that, somewhat confusingly, it is imported as "import fitz")
This code reads in an anonymized proposal for one of NASA's Dual-Anonymous Peer Review (DAPR) programs. It attempts to locate the references section (if not provided by the user) and then performs a variety of checks to make sure the proposal is DAPR compliant.
The code requires two inputs (in this order) and can take two additional optional inputs:
- REQUIRED: Path to the anonymized proposal PDF. This is not the full proposal generated by NSPIRES.
- REQUIRED: Path to a file with the team member information (Team_Info_Path). There are two options for this:
- CSV file with first names, last names, institutions, and cities of each team member (an example is provided in this repo) OR
- The non-anonymized NSPIRES-generated proposal
- OPTIONAL: Start page of the references section in the PDF (otherwise the code will attempt to guess this)
- OPTIONAL: End page of the references section in the PDF (otherwise the code will attempt to guess this)
Example command line inputs with only required inputs (where you would replace the paths with your own):
python check_dapr_single.py "./anonproposal.pdf" "./NSPIRES_Full_Proposal.pdf"
python check_dapr_single.py "./anonproposal.pdf" "./team_info.csv"
Example command line with optional inputs for start and end pages of references section:
python check_dapr_single.py "./anonproposal.pdf" "./NSPIRES_Full_Proposal.pdf" 17 21
python check_dapr_single.py "./anonproposal.pdf" "./team_info.csv" 17 21
The code outputs the following:
- Page ranges for the STM and References sections
- The code guesses these proposal sections by assuming the following order: STM, References, Other (e.g., budget).
- You can input the references start/end pages manually (see above) to avoid this issue
- The guesses are usually correct, but sometimes they're not. This mainly matters for the team member name search: if the code gets the references section wrong, it will probably flag team member names as appearing in the main proposal text when they are actually in the references, and/or miss DAPR violations in the budget section if the budget was improperly or incompletely redacted.
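The section-guessing step is internal to the script, but the general idea can be sketched as follows (a simplified, hypothetical version that just scans per-page text for a references-style heading; the real code's heuristics may differ):

```python
import re

def guess_references_start(pages):
    """pages: list of per-page text strings; return 0-based page index or None."""
    heading = re.compile(r"^\s*(references|bibliography)\b",
                         re.IGNORECASE | re.MULTILINE)
    for i, text in enumerate(pages):
        if heading.search(text):
            return i
    return None

pages = ["Intro ...", "STM section ...", "References\n[1] Some paper ..."]
print(guess_references_start(pages))  # -> 2
```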
- Reference format
- DAPR proposals are supposed to use bracketed number references, rather than "et al." references
- The code reports the number of brackets found in the proposal and number of "et al." usages in proposal (the former number should be high, the latter should be zero)
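A rough sketch of how such counts could be obtained with regular expressions (the patterns here are illustrative, not the script's actual ones):

```python
import re

def reference_style_counts(text):
    """Count bracketed-number references and "et al." usages in the text."""
    brackets = len(re.findall(r"\[\d+(?:\s*[,-]\s*\d+)*\]", text))
    et_al = len(re.findall(r"\bet al\.?", text, re.IGNORECASE))
    return brackets, et_al

sample = "Prior work [1, 2] showed X, but Smith et al. (2020) argued Y [3]."
print(reference_style_counts(sample))  # -> (2, 1)
```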
- Forbidden DAPR words
- DAPR proposals shouldn't include any identifying team member information (names, institutions, cities, genders)
- The code reports number of times such things are found and the page numbers on which they are found
- Note that if you use the NSPIRES option for inputting team member names, cities are not included as that info is not in the NSPIRES cover pages.
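The forbidden-word search can be sketched like this (a simplified stand-in for the real logic; the names and terms below are made up):

```python
import re

def find_forbidden(pages, terms):
    """Return {term: [1-based page numbers]} for case-insensitive whole-word hits."""
    hits = {}
    for term in terms:
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        found = [i + 1 for i, text in enumerate(pages) if pattern.search(text)]
        if found:
            hits[term] = found
    return hits

pages = ["We propose to measure X ...",
         "As shown previously by Dr. Smith at Caltech ..."]
print(find_forbidden(pages, ["Smith", "Caltech", "Pasadena"]))
# -> {'Smith': [2], 'Caltech': [2]}
```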
This is a version of check_dapr_single.py that can be used to check multiple proposals at a time.
The code requires two inputs (in this order):
- REQUIRED: Path to directory containing the anonymized proposal PDFs
- REQUIRED: Path to directory containing the full, non-anonymized NSPIRES-generated proposal PDFs
Note that the CSV option for the team member info is not available for this version.
Example command line inputs (where you would replace the paths with your own):
python check_dapr_multi.py ./proposals_anon ./proposals_full
The output is the same as check_dapr_single.py except that it first prints the name of the anonymized PDF being checked and the name of the non-anonymized PDF that is being used for the team info, so that you can make sure the correct files are being compared. If the number of anonymized and non-anonymized PDFs are not equal, the program will quit before doing anything further.
This code reads in a proposal (either the anonymized version or the full, non-anonymized NSPIRES-generated PDF) and attempts to find the "Scientific / Technical / Management" section and then checks ROSES formatting requirements (font size, lines-per-inch, characters-per-inch). Please make sure to read the ROSES solicitation and NASA Guidebook for Proposers carefully, as formatting requirements may be different than those flagged below.
The code requires one input: the path to the proposal PDF. Example command line input (where you would replace the path with your own):
python check_format_single.py ./NSPIRES_Full_Proposal.pdf
The code outputs the following:
- PI name and proposal number
- These are taken from the cover page of the NSPIRES-formatted PDF
- Font size
- The median font size used in the proposal is calculated and output to the terminal
- A histogram of the font sizes is saved to the current directory (the gray horizontal line indicates ~12-point font size)
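Assuming per-span font sizes have already been collected (e.g., from PyMuPDF's page.get_text("dict") output), the median calculation itself is straightforward; the sizes below are hypothetical:

```python
from statistics import median

# Hypothetical per-span font sizes collected from the PDF text spans.
span_sizes = [12.0, 12.0, 11.5, 12.0, 10.0, 12.0]
med = median(span_sizes)
print(med)  # -> 12.0
if med <= 11.8:
    print("Warning: median font size is at or below 11.8 pt")
```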
- Lines per inch (LPI) and characters per inch (CPI)
- LPI is calculated per page and for pages with LPI > 5.5, the page number of the violation and the LPI value is provided.
- CPI is calculated per line and the number of pages for which CPI > 16.0 is provided along with snippets of the line text
- Note that PDF formats are weird and not inherently machine readable, so these calculations are not exact and results should be checked carefully. The limits for LPI and CPI used in the code are purposefully lenient compared to the current ROSES requirements for this reason, thus the code will only report blatant violations (or weird PDF formats that could not be read properly).
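The LPI/CPI arithmetic reduces to dividing counts by the text block's size in inches (72 PDF points per inch). A sketch with hypothetical numbers, assuming the line and character counts have already been extracted:

```python
POINTS_PER_INCH = 72.0  # PDF user-space units per inch

def lines_per_inch(n_lines, text_height_pts):
    """Lines per vertical inch of the text block."""
    return n_lines / (text_height_pts / POINTS_PER_INCH)

def chars_per_inch(n_chars, line_width_pts):
    """Characters per horizontal inch of a single line."""
    return n_chars / (line_width_pts / POINTS_PER_INCH)

# Hypothetical numbers: a 9-inch-tall text block (648 pt) with 49 lines,
# and a 6.5-inch-wide line (468 pt) containing 104 characters.
print(round(lines_per_inch(49, 648), 2))   # -> 5.44
print(round(chars_per_inch(104, 468), 2))  # -> 16.0
```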
This code is the most up-to-date version of the ROSES Compliance Checker tool. It has not yet been released in i-NSPIRES. Changes include:
- Windows compatible
- Compatible with different formats of proposal master column names such as proposal number and team member information
- Compatible with different formats of institution names
- Removed dependencies on colored Word-document text and colored printed statements in the terminal output
This code reads in an anonymized proposal submitted to a ROSES program that follows Dual-Anonymous Peer Review (DAPR); the input can be either the redacted NSPIRES-generated PDF or just the anonymized proposal PDF. The code attempts to find the different sections of the proposal (STM, DMP, Relevance, Budget) and then checks for DAPR compliance and formatting compliance. The outputs are described below.
The code requires 3 inputs (in this order) with quotation marks around them:
- REQUIRED: Path to the anonymized proposal PDFs. This can also be the "redacted" PDFs with NSPIRES front-matter.
- REQUIRED: Suffix of proposal PDFs (what comes before ".pdf" but after the proposal number) e.g., for "23-XRP23_2-0003_Redacted.pdf" the suffix would be "_Redacted"
- REQUIRED: Path to "Proposal Master" report from i-NSPIRES in CSV format (not Excel)
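For illustration, the suffix splits a filename into the proposal number and the rest (the filename and suffix below are taken from the example above; the split itself is just string slicing):

```python
# Example filename and suffix from the input description above.
fname = "23-XRP23_2-0003_Redacted.pdf"
suffix = "_Redacted"

# Everything before the suffix is the proposal number.
proposal_number = fname[: fname.index(suffix)]
print(proposal_number)  # -> 23-XRP23_2-0003
```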
The code outputs its findings to the terminal as it checks each proposal. When all proposals have been checked, the code also writes a final CSV file named "dapr_checks.csv" (and, optionally, a text document of the outputs) to the directory containing the proposal PDFs and their corresponding Proposal Master file. The information includes:
- Page ranges for proposal sections
- These assume the following order: STM, References, DMP, Relevance, Budget. The code only gives possible STM start and end pages and possible Reference start and end pages.
- They're usually correct, but sometimes they're not; this mainly matters when searching for the PI name while avoiding the References section
- The value -99 is reported if the page limits could not be found
- Median font size
- The median font size used in the proposal is calculated, and a warning is given when it is <= 11.8 pt (a lenient threshold for checking 12-pt compliance)
- Reference format
- DAPR proposals are supposed to use bracketed number references
- Reports the number of brackets found in the proposal and the number of "et al." usages in the proposal (the former number should be high, the latter low)
- Also reports the number of parenthetical references if more than 20 are found
- Forbidden DAPR words
- DAPR proposals should not include references to previous work, institutions/departments/universities/cities, PI or Co-I names, etc.
- Reports pronouns (she, he, her, hers, his, him), team member names, team member institutions, and PI cities
- Reports number of times such words are found and page numbers on which they are found
These tools are provided for informational purposes. There are no guarantees implied with the usage of these tools.
Contact Nazifa Taha @nazifataha1 with questions.