31 changes: 13 additions & 18 deletions .pre-commit-config.yaml
@@ -1,27 +1,22 @@
 repos:
   - repo: https://github.com/astral-sh/uv-pre-commit
     # uv version.
-    rev: 0.8.9
+    rev: 0.9.27
     hooks:
       - id: uv-lock
-  - repo: https://github.com/psf/black
-    rev: 24.10.0
+  - repo: https://github.com/astral-sh/ruff-pre-commit
+    # Ruff version.
+    rev: v0.14.14
     hooks:
-      - id: black
-        language_version: python3.13
-        exclude: migrations
-  - repo: https://github.com/PyCQA/flake8
-    rev: 7.1.1
-    hooks:
-      - id: flake8
+      # Run the linter.
+      - id: ruff-check
+        args: [--fix]
         exclude: settings|migrations|tests
-  - repo: https://github.com/pycqa/isort
-    rev: 5.13.2
-    hooks:
-      - id: isort
-        args: ["--profile", "black", "--filter-files"]
+      # Run the formatter.
+      - id: ruff-format
+        exclude: migrations
   - repo: https://github.com/pre-commit/pre-commit-hooks
-    rev: v5.0.0
+    rev: v6.0.0
     hooks:
       - id: trailing-whitespace
       - id: end-of-file-fixer
@@ -36,11 +31,11 @@ repos:
       - id: debug-statements
       - id: detect-private-key
   - repo: https://github.com/asottile/pyupgrade
-    rev: v3.20.0
+    rev: v3.21.2
     hooks:
       - id: pyupgrade
   - repo: https://github.com/adamchainz/django-upgrade
-    rev: 1.25.0
+    rev: 1.29.1
     hooks:
       - id: django-upgrade
         args: [--target-version, "5.2"]
49 changes: 49 additions & 0 deletions AGENTS.md
@@ -0,0 +1,49 @@
+# AGENTS.md
+
+This file provides guidance when working with code in this repository.
+
+## Project Overview
+
+NC CopWatch is a data-driven application that implements a nightly ETL pipeline to download, import, and analyze the North Carolina Department of Justice's raw law enforcement traffic stop data, generating visualizations and comparative analytics for 30+ million records to promote data transparency and accountability in statewide policing.
+
+- NC CopWatch is a Django 5.2 project built on Python 3.13.
+- The front end is a React SPA built with Node.js 22 and communicates with the Django backend via a REST API using Django REST Framework.
+- The main database is PostgreSQL 16.
+- Celery is used for background jobs and scheduled tasks.
+
+## Development Commands
+
+### Environment Setup
+
+- `uv` is used for Python dependency management.
+- Install Python dependencies: `uv sync`
+- Add Python dependencies: `uv add <library>` or `uv add --group dev <library>` for dev-only
+- Run database migrations: `uv run ./migrate_all_dbs.sh`
+- Create superuser: `uv run manage.py createsuperuser`
+- You can run generic Python commands using `uv run <command>`
+- The `frontend/` directory contains the React front end.
+- Install Node.js dependencies: `cd frontend && npm install`
+
+### Multiple Databases
+
+- The project uses multiple databases: `default` and `traffic_stops_nc`.
+
+### Running the Application
+
+- Start the development server: `uv run manage.py runserver`
+- Start the React development server: `cd frontend && npm run start`
+
+### Testing
+
+- Run tests with pytest: `uv run pytest`
+- Tests are located in `tests/` directories and follow standard pytest-django and pytest-mock conventions.
+- factoryboy is used for test data creation.
+
+### Code Quality
+
+- Run pre-commit hooks: `uv run pre-commit run --all-files`
+
+### Deployment
+
+- Ansible is used for deployment automation with playbooks located in the `deploy/` directory.
+- Install Ansible dependencies: `uv run ansible-galaxy install -fr deploy/requirements.yml`
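
Note on the "Multiple Databases" section: AGENTS.md names the `default` and `traffic_stops_nc` aliases but does not show how queries are steered between them. For readers new to Django's multi-database support, here is a minimal router sketch of the general mechanism; the class name, its placement, and the assumption that the `nc` app's models live on `traffic_stops_nc` are illustrative guesses, not code from this repository.

```python
# Hypothetical router illustrating Django's database-router API.
# NOT from this repo: the class and the app-label mapping are assumptions.
class TrafficStopsRouter:
    route_app_labels = {"nc"}

    def db_for_read(self, model, **hints):
        # Send reads of nc.* models to the traffic-stops database.
        if model._meta.app_label in self.route_app_labels:
            return "traffic_stops_nc"
        return "default"

    def db_for_write(self, model, **hints):
        # Writes follow the same mapping as reads.
        if model._meta.app_label in self.route_app_labels:
            return "traffic_stops_nc"
        return "default"

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Keep each app's migrations on its own database.
        if app_label in self.route_app_labels:
            return db == "traffic_stops_nc"
        return db == "default"
```

A router like this would be registered in `DATABASE_ROUTERS`, and a script such as `migrate_all_dbs.sh` presumably runs `manage.py migrate --database=<alias>` once per alias, which is why the setup steps call it out separately from the usual single `migrate`.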
39 changes: 22 additions & 17 deletions nc/data/NC Data Highlights.ipynb
@@ -8,9 +8,10 @@
    },
    "outputs": [],
    "source": [
-    "from django.db.models import Q, Count, Sum\n",
     "from django.db import connections\n",
-    "from nc.models import Agency, Stop, Search"
+    "from django.db.models import Count, Q, Sum\n",
+    "\n",
+    "from nc.models import Agency, Search, Stop"
    ]
   },
   {
@@ -54,7 +55,7 @@
     }
    ],
    "source": [
-    "sheriff = Agency.objects.filter(name__icontains='Sheriff')\n",
+    "sheriff = Agency.objects.filter(name__icontains=\"Sheriff\")\n",
     "sheriff.count()"
    ]
   },
@@ -77,7 +78,7 @@
     }
    ],
    "source": [
-    "police = Agency.objects.filter(name__icontains='Police')\n",
+    "police = Agency.objects.filter(name__icontains=\"Police\")\n",
     "police.count()"
    ]
   },
@@ -147,7 +148,7 @@
     }
    ],
    "source": [
-    "list(Agency.objects.filter(Q(name__icontains='College') | Q(name__icontains='University')))"
+    "list(Agency.objects.filter(Q(name__icontains=\"College\") | Q(name__icontains=\"University\")))"
    ]
   },
   {
@@ -200,7 +201,11 @@
     }
    ],
    "source": [
-    "top_ten_agencies = Agency.objects.annotate(total_stops=Count('stops')).values_list('name', 'total_stops').order_by('-total_stops')[:10]\n",
+    "top_ten_agencies = (\n",
+    "    Agency.objects.annotate(total_stops=Count(\"stops\"))\n",
+    "    .values_list(\"name\", \"total_stops\")\n",
+    "    .order_by(\"-total_stops\")[:10]\n",
+    ")\n",
     "list(top_ten_agencies)"
    ]
   },
@@ -224,7 +229,7 @@
     }
    ],
    "source": [
-    "top_ten_agencies.aggregate(Sum('total_stops'))"
+    "top_ten_agencies.aggregate(Sum(\"total_stops\"))"
    ]
   },
   {
@@ -246,8 +251,8 @@
     }
    ],
    "source": [
-    "top_ten_percent = 13338363/18819973\n",
-    "\"{0:.4f}%\".format(top_ten_percent*100)"
+    "top_ten_percent = 13338363 / 18819973\n",
+    "f\"{top_ten_percent * 100:.4f}%\""
    ]
   },
   {
@@ -270,7 +275,7 @@
     }
    ],
    "source": [
-    "state_highway_patrol = Agency.objects.get(name='NC State Highway Patrol')\n",
+    "state_highway_patrol = Agency.objects.get(name=\"NC State Highway Patrol\")\n",
     "state_highway_patrol.stops.count()"
    ]
   },
@@ -293,7 +298,7 @@
     }
    ],
    "source": [
-    "\"{0:.4f}%\".format(8827911/18819973*100)"
+    "f\"{8827911 / 18819973 * 100:.4f}%\""
    ]
   },
   {
@@ -329,10 +334,10 @@
     }
    ],
    "source": [
-    "year = connections[Stop.objects.db].ops.date_trunc_sql('year', 'date')\n",
-    "qs = Stop.objects.extra(select={'year': year})\n",
-    "qs = qs.values('year').annotate(total_stops=Count('date')).order_by('-year')\n",
-    "summary = [(row['year'].year, row['total_stops']) for row in list(qs)]\n",
+    "year = connections[Stop.objects.db].ops.date_trunc_sql(\"year\", \"date\")\n",
+    "qs = Stop.objects.extra(select={\"year\": year})\n",
+    "qs = qs.values(\"year\").annotate(total_stops=Count(\"date\")).order_by(\"-year\")\n",
+    "summary = [(row[\"year\"].year, row[\"total_stops\"]) for row in list(qs)]\n",
     "summary"
    ]
   },
@@ -378,8 +383,8 @@
     }
    ],
    "source": [
-    "state_search_rate = Search.objects.count()/Stop.objects.count()\n",
-    "\"{0:.4f}%\".format(state_search_rate*100)"
+    "state_search_rate = Search.objects.count() / Stop.objects.count()\n",
+    "f\"{state_search_rate * 100:.4f}%\""
    ]
   },
   {
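
Note on the stops-per-year cell above: the reformat keeps the `QuerySet.extra()` / `date_trunc_sql()` pattern, which predates Django's database functions and is now discouraged. Below is a sketch of the same per-year count using `TruncYear`, assuming `Stop.date` is a date/datetime field as the notebook implies; this is an untested equivalent, not a change made by this diff.

```python
from django.db.models import Count
from django.db.models.functions import TruncYear

from nc.models import Stop

# One (year, total_stops) row per calendar year, newest first --
# the same summary the notebook builds via extra() and raw SQL.
qs = (
    Stop.objects.annotate(year=TruncYear("date"))
    .values("year")
    .annotate(total_stops=Count("date"))
    .order_by("-year")
)
summary = [(row["year"].year, row["total_stops"]) for row in qs]
```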
54 changes: 25 additions & 29 deletions nc/data/importer.py
@@ -128,13 +128,12 @@ def truncate_input_data(destination, min_stop_id, max_stop_id):
     for in_basename, stops_field_num in data_file_description:
         data_in_path = os.path.join(destination, in_basename)
         data_out_path = data_in_path + ".new"
-        with open(data_in_path, "rb") as data_in:
-            with open(data_out_path, "wb") as data_out:
-                for line in data_in:
-                    fields = line.split(b"\t")
-                    stop_id = int(fields[stops_field_num])
-                    if min_stop_id <= stop_id <= max_stop_id:
-                        data_out.write(line)
+        with open(data_in_path, "rb") as data_in, open(data_out_path, "wb") as data_out:
+            for line in data_in:
+                fields = line.split(b"\t")
+                stop_id = int(fields[stops_field_num])
+                if min_stop_id <= stop_id <= max_stop_id:
+                    data_out.write(line)
         os.replace(data_out_path, data_in_path)


@@ -159,22 +158,21 @@ def to_standard_csv(input_path, output_path):
         quoting=csv.QUOTE_MINIMAL,
         skipinitialspace=False,
     )
-    with open(input_path) as input:
-        with open(output_path, "w") as output:
-            reader = csv.reader(input, dialect="nc_data_in")
-            writer = csv.writer(output, dialect="nc_data_out")
-            headings_written = False
-            num_columns = sys.maxsize  # keep all of first row, however many
-            for row in reader:
-                columns = [column.strip() for i, column in enumerate(row) if i < num_columns]
-                if not headings_written:
-                    # Some records in Stops.csv have extra columns; drop any
-                    # columns beyond those in the first record.
-                    num_columns = len(columns)
-                    headings = ["column%d" % (i + 1) for i in range(len(columns))]
-                    writer.writerow(headings)
-                    headings_written = True
-                writer.writerow(columns)
+    with open(input_path) as input, open(output_path, "w") as output:
+        reader = csv.reader(input, dialect="nc_data_in")
+        writer = csv.writer(output, dialect="nc_data_out")
+        headings_written = False
+        num_columns = sys.maxsize  # keep all of first row, however many
+        for row in reader:
+            columns = [column.strip() for i, column in enumerate(row) if i < num_columns]
+            if not headings_written:
+                # Some records in Stops.csv have extra columns; drop any
+                # columns beyond those in the first record.
+                num_columns = len(columns)
+                headings = ["column%d" % (i + 1) for i in range(len(columns))]
+                writer.writerow(headings)
+                headings_written = True
+            writer.writerow(columns)


 def convert_to_csv(destination):
@@ -225,7 +223,7 @@ def update_nc_agencies(nc_csv_path, destination):

     with open(nc_csv_path) as agency_file:
         agency_table = csv.reader(agency_file)
-        agency_table_contents = list()
+        agency_table_contents = []
         agency_table_contents.append(next(agency_table))
         existing_agencies = set()
         for row in agency_table:
@@ -256,12 +254,10 @@ def update_nc_agencies(nc_csv_path, destination):

         email_body = """
 Here are the new agencies:\n
-%s\n
+{}\n
 A new agency table is attached. You can add census codes for the
 the new agencies before checking in.
-""" % ", ".join(
-            extra_agencies
-        )
+""".format(", ".join(extra_agencies))
         email = EmailMessage(
             "New NC agencies were discovered during import",
             email_body,
@@ -300,7 +296,7 @@ def copy_from(destination, nc_csv_path):
         # datasets
         path = Path(destination)
         for p in path.glob("*.csv"):
-            if p.name in copy_nc.NC_COPY_INSTRUCTIONS.keys():
+            if p.name in copy_nc.NC_COPY_INSTRUCTIONS:
                 with p.open() as fh:
                     logger.info(f"COPY {p.name} into the database")
                     with cur.copy(copy_nc.NC_COPY_INSTRUCTIONS[p.name]) as copy:
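
Note on the combined `with` statements above: folding the nested blocks into `with open(...) as a, open(...) as b:` is behavior-preserving but can push lines long. Since Python 3.10 (this project targets 3.13), context managers may also be parenthesized, one per line; here is a small sketch with placeholder paths, not a change proposed by this diff.

```python
# Parenthesized context managers (Python 3.10+): equivalent to the
# single-line form used in truncate_input_data. Paths are placeholders.
data_in_path = "Stops.txt"
data_out_path = "Stops.txt.new"

with (
    open(data_in_path, "rb") as data_in,
    open(data_out_path, "wb") as data_out,
):
    for line in data_in:
        data_out.write(line)
```

Similarly, the `%`-to-`.format()` change in `update_nc_agencies` could go one step further to an f-string: since Python 3.12, quotes may be reused inside f-string expressions, so `f"""...{", ".join(extra_agencies)}..."""` is legal on this project's Python 3.13 baseline.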
23 changes: 14 additions & 9 deletions nc/data/timezones.ipynb
@@ -2,7 +2,7 @@
  "cells": [
   {
    "cell_type": "code",
-   "execution_count": 1,
+   "execution_count": null,
    "id": "jewish-location",
    "metadata": {},
    "outputs": [],
@@ -11,11 +11,11 @@
     "# From project root, run: jupyter-lab\n",
     "\n",
     "import os\n",
-    "import sys\n",
     "\n",
-    "os.chdir('../..')\n",
+    "os.chdir(\"../..\")\n",
     "\n",
-    "import django\n",
+    "import django  # noqa\n",
+    "\n",
     "django.setup()"
    ]
   },
@@ -27,12 +27,13 @@
    "outputs": [],
    "source": [
     "import datetime as dt\n",
+    "\n",
     "import pytz\n",
     "\n",
-    "from django.utils import timezone\n",
     "from django.test import override_settings\n",
+    "from django.utils import timezone\n",
     "\n",
-    "from nc.models import Stop, Person, Agency"
+    "from nc.models import Agency, Stop"
    ]
   },
   {
@@ -148,7 +149,9 @@
     }
    ],
    "source": [
-    "stop_utc = Stop.objects.no_cache().filter(agency=durham, date__gt=start_date_utc).order_by(\"date\").first()\n",
+    "stop_utc = (\n",
+    "    Stop.objects.no_cache().filter(agency=durham, date__gt=start_date_utc).order_by(\"date\").first()\n",
+    ")\n",
     "stop_utc.stop_id, stop_utc.date"
    ]
   },
@@ -189,7 +192,9 @@
     }
    ],
    "source": [
-    "stop_nyc = Stop.objects.no_cache().filter(agency=durham, date__gt=start_date_nyc).order_by(\"date\").first()\n",
+    "stop_nyc = (\n",
+    "    Stop.objects.no_cache().filter(agency=durham, date__gt=start_date_nyc).order_by(\"date\").first()\n",
+    ")\n",
     "stop_nyc.stop_id, stop_nyc.date"
    ]
   },
@@ -219,7 +224,7 @@
     }
    ],
    "source": [
-    "with override_settings(TIME_ZONE='UTC'):\n",
+    "with override_settings(TIME_ZONE=\"UTC\"):\n",
     "    date = timezone.localtime(stop_utc.date)\n",
     "date"
    ]
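
Note on the `override_settings(TIME_ZONE=...)` cell: the notebook's timezone experiments all reduce to one behavior. `timezone.localtime()` converts an aware datetime into Django's current time zone, and `override_settings` changes which zone is current. Below is a self-contained sketch of that mechanic, assuming Django settings are already configured (as in the notebook's first cell); the datetime is fabricated for illustration and does not come from the Stop data.

```python
import datetime as dt

from django.test import override_settings
from django.utils import timezone

# An aware UTC datetime standing in for a Stop.date value.
stop_date = dt.datetime(2020, 1, 1, 3, 30, tzinfo=dt.timezone.utc)

with override_settings(TIME_ZONE="UTC"):
    print(timezone.localtime(stop_date))  # 2020-01-01 03:30:00+00:00

with override_settings(TIME_ZONE="America/New_York"):
    print(timezone.localtime(stop_date))  # 2019-12-31 22:30:00-05:00
```

This mirrors the `stop_utc` / `stop_nyc` cells above, where the same query returns different rows depending on the time zone used to build the start-date boundary.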