UMD IC25 - Amtrak FOIA Data Analysis (04)

For this project's abstract/description, please see the Abstract segment of the README.

Authors

Luis G. (@curlyLasagna), Andreas P. (@Greek), Rayquan W. (@Factorial343)

Abstract

To view the abstract, see Abstract.pdf.

Presentation

To view our presentation, see IC25_Presentation.pdf.

Results

The results of the processed FOIA data are available in the results.csv file.

Unprocessed Data

All raw, unprocessed data belongs in the data/ directory. The src/to_csv.py, src/classification.py, and src/semantic_search.py scripts process it to produce the results.

Here are the files required for processing:

A CSV file of all FOIA requests - data/pii/data.csv (not included in repo)
A list of all departments - data/departments.csv

Some of the data contains personally identifiable information (PII). This data is stored in the data/pii/ directory, which includes the raw, unprocessed FOIA dataset.

Use of AI

Generative AI was heavily used throughout this project.


Platform: Google AI Studio, Gemini 2.0 Flash model
How it was used: Explain concepts concisely. How to generate a chart using Altair. Debug through errors. Determine where a keyword should go based on a department's name.
Learning points: Filtering dataframes. Applying functions to each row of a dataframe. What stop words are in the context of keyword extraction. Libraries to use.

Usage

To use the Python programs from this project, you must:

Have Python 3.13 installed
Have the 'uv' package manager installed, available here: https://docs.astral.sh/uv/#installation

Getting started

First, install all the required packages using uv:

$ uv sync # Installs the necessary packages

Then obtain a copy of all FOIA requests (or a subset) in CSV format and save it as ./data/pii/data.csv.

Warning

Ensure the columns are in the following format: Request ID,Request Description
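One way to check the header before running the scripts is sketched below. It assumes the header row must match the two column names exactly; the `has_expected_columns` helper is hypothetical and is not part of this repository.

```python
import csv
import io

# Column names taken from the warning above.
EXPECTED_COLUMNS = ["Request ID", "Request Description"]


def has_expected_columns(csv_file) -> bool:
    """Return True if the first row of the open CSV file matches EXPECTED_COLUMNS."""
    reader = csv.reader(csv_file)
    header = next(reader, None)  # None when the file is empty
    if header is None:
        return False
    # Drop a possible UTF-8 byte-order mark and stray whitespace.
    cleaned = [name.lstrip("\ufeff").strip() for name in header]
    return cleaned == EXPECTED_COLUMNS


# In-memory sample standing in for ./data/pii/data.csv (illustrative only).
sample = io.StringIO("Request ID,Request Description\n1,Example request text\n")
print(has_expected_columns(sample))
```

To check the real file, open ./data/pii/data.csv with `open(path, newline="", encoding="utf-8")` and pass the file object to the same function.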

Running the classification script

Warning

Ensure you're in the ROOT DIRECTORY of this repository when executing the script.
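If you want to confirm the working directory programmatically, a minimal sketch follows. It assumes the repository root can be recognized by the src/classification.py file and the data/ directory mentioned in this README; `in_repo_root` is a hypothetical helper, not something the project ships.

```python
from pathlib import Path


def in_repo_root(directory: Path) -> bool:
    """Return True if `directory` holds src/classification.py and a data/ folder."""
    has_script = (directory / "src" / "classification.py").is_file()
    has_data = (directory / "data").is_dir()
    return has_script and has_data


# Report whether the current working directory looks like the repository root.
print(in_repo_root(Path.cwd()))
```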

To run the classification script, run the following:

$ python3 src/classification.py

The results will appear in the results.csv file.

Run notebook

Make sure a virtual environment (venv) is created and activated, then run:

$ marimo edit semantic_search.py

Streamlit Prototype

To run a quick prototype of our search app, which returns the department that semantic search considers the best candidate, run:

$ streamlit run search_app.py
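To illustrate the idea of ranking departments and returning the best candidate, here is a minimal sketch. It is not the project's implementation: it swaps in a plain bag-of-words cosine similarity in place of semantic search, and the department names and descriptions are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count its whitespace-separated tokens."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector has no direction to compare
    return dot / (norm_a * norm_b)


def best_department(query: str, departments: dict[str, str]) -> str:
    """Return the department whose description is most similar to the query."""
    query_vector = bag_of_words(query)
    return max(
        departments,
        key=lambda name: cosine_similarity(
            query_vector, bag_of_words(departments[name])
        ),
    )


# Hypothetical departments and descriptions, for illustration only.
departments = {
    "Police": "police incident reports arrests security",
    "Procurement": "contracts vendors purchasing bids",
}
print(best_department("copies of vendor contracts and bids", departments))
```

A real semantic search would compare embedding vectors rather than raw token counts, so that related words with no exact overlap can still match; the ranking step (score every candidate, keep the maximum) stays the same.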
