For this project's abstract/description, please see the Abstract segment of the README.
Luis G. (@curlyLasagna), Andreas P. (@Greek), Rayquan W. (@Factorial343)
To view the abstract, see Abstract.pdf.
To view our presentation, see IC25 Presentation.pdf.
The results of the processed FOIA data are available in the results.csv file.
All raw, unprocessed data belongs in the data/ directory. It is
processed by the src/to_csv.py, src/classification.py, and src/semantic_search.py
scripts to produce the results.
Here are the files required for processing:
- A CSV file of all FOIA requests: data/pii/data.csv (not included in repo)
- A list of all departments: data/departments.csv
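Before running any of the scripts, it can help to confirm that both input files are in place. The following is a minimal sketch, not part of the project's scripts: the helper name `missing_inputs` is hypothetical, and it assumes the paths listed above are relative to the repository root.

```python
from pathlib import Path

# Input files named in this README, relative to the repository root.
REQUIRED_INPUTS = ["data/pii/data.csv", "data/departments.csv"]


def missing_inputs(repo_root="."):
    """Return the required input paths that do not exist under repo_root.

    Hypothetical helper for illustration; the project scripts do not ship it.
    """
    root = Path(repo_root)
    return [rel for rel in REQUIRED_INPUTS if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_inputs()
    if missing:
        print("Missing input files:", ", ".join(missing))
    else:
        print("All required input files are present.")
```

Running it from the repository root lists any file that still needs to be added.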
Some of the data includes personally identifiable information,
also known as PII. This data is stored in the data/pii/ directory, which
unfortunately includes the actual unprocessed FOIA dataset.
Generative AI was heavily used throughout this project.
| Platform | Google AI Studio. Gemini 2.0 Flash model |
| How it was used | Explain concepts concisely. How to generate a chart using Altair. Debug through errors. Determine where a keyword should go based on a department's name |
| Learning points | Filtering dataframes. Applying functions to each row of a dataframe. What stop words are in the context of keyword extraction. Libraries to use |
To use the Python programs in this project, you must:
- Have Python 3.13 installed
- Have the uv package manager installed, available here: https://docs.astral.sh/uv/#installation
First, install all the required packages using uv:
$ uv sync # Installs the necessary packages
Then pull a copy of all FOIAs (or a subset) in CSV format, and save it in ./data/pii/data.csv.
Warning
Ensure the columns are in the following format: Request ID,Request Description
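The column requirement can be checked before running the scripts. The sketch below is an illustration only, assuming the header row must match the two column names exactly and in order; the helper name `has_expected_header` is hypothetical and is not part of the project's code.

```python
import csv
from pathlib import Path

# Column names required by this README, in order.
EXPECTED_COLUMNS = ["Request ID", "Request Description"]


def has_expected_header(csv_path):
    """Return True if the first row of the CSV equals EXPECTED_COLUMNS.

    Hypothetical helper for illustration. "utf-8-sig" is used so that a
    leading byte-order mark (common in spreadsheet exports) is ignored.
    """
    with Path(csv_path).open(newline="", encoding="utf-8-sig") as f:
        header = next(csv.reader(f), [])  # [] when the file is empty
    return [col.strip() for col in header] == EXPECTED_COLUMNS


if __name__ == "__main__":
    path = "data/pii/data.csv"
    if Path(path).is_file():
        print("Header OK" if has_expected_header(path) else "Unexpected header")
    else:
        print(f"{path} not found")
```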
Warning
Ensure you're in the ROOT DIRECTORY of this repository when executing the script.
To run the classification script, run the following:
$ python3 src/classification.py
The results will appear in the results.csv file.
Make sure a venv is created and activated before opening the semantic search notebook in marimo:
marimo edit semantic_search.py
To run a quick prototype of our search app, which returns the department that semantic search considers the best candidate:
streamlit run search_app.py