Patterns Matter is a lightweight materials database web app built with Flask and Jinja, deployed on Fly.io.
It renders datasets and result artifacts directly from Google Drive (service account), while keeping a small SQLite catalog (uploads_log
) and an audit trail (uploads_audit
) on a persistent Fly volume mounted at /data
. No large files live in the VM—Drive is the source of truth for files.
- Frontend: HTML + Jinja templates + a thin CSS theme.
- Backend: Flask (Gunicorn worker), SQLite for metadata, Google Drive API for file storage, and a few background-style “run once” tasks guarded by a lock.
- Persistence: A Fly volume mounted at
/data
holds the SQLite DB and small assets (e.g., music clips and logs). Large datasets/results are kept on Google Drive. - Auth: Simple admin session with password (env-configurable in app code). Public routes are read-only.
- Health checks:
/healthz
responds200 OK
for Fly health probes. - Secrets:
GDRIVE_SA_JSON_BASE64
(orGDRIVE_SA_JSON
) andGDRIVE_ROOT_FOLDER_ID
are stored as Fly secrets.
- Was originally Git-backed + local uploads folder; now the app uses Google Drive as the primary data store for materials datasets/results.
- The SQLite catalog (
uploads_log
) is metadata only (filename, Drive IDs, preview/download URLs, descriptions). - Admin uploads (ZIP or folder link) write files to Drive and upsert metadata into the catalog.
- Public views read from Drive (or from local DB tables for legacy CSV/NPY still shipped in the repo).
I tried to touch almost every layer of a modern web service. Here’s a recap by technology and the concrete techniques used.
- HTML structure: semantic tags (
<table>
,<form>
,<input>
,<a>
,<button>
) and responsive wrappers. - CSS: simple layout, table styling, “pills”, hover states, small screen tweaks (
@media
). - Jinja:
- Control flow and filters:
{% if %}
,{% for %}
,|title
,|length
,|safe
. - Template variables from Flask:
render_template("view.html", uploads=uploads, admin=is_admin, ...)
. - URL building:
url_for('property_detail', property_name=p, tab='results')
– avoids hardcoding links. - Defensive rendering:
row.get('storage', 'local')
,COALESCE()
at SQL-layer to keep templates robust.
- Control flow and filters:
- performing CRUD using SQL with my sql_query tool.
Mental Model: Jinja renders server-side. I pass Python objects, Jinja turns them into HTML. Keep logic light in templates; do data prep in views.
- Routes:
@app.route('/materials/<property_name>/<tab>', methods=['GET', 'POST'])
with guards and branching:- Admin POST handlers: add Drive file, link Drive folder (recursive listing), ZIP upload → Drive, inline edits.
- Public GET: query
uploads_log
, computetable_map
for legacy local CSV/NPY tables.
- Helpers:
get_drive_service()
(lazy import, env-driven),_drive_extract_id()
,drive_list_folder_files(..., recursive=True)
,drive_ensure_property_tab_folder()
(ensures<root>/<property>/<tab>
),drive_upload_bytes()
,_drive_urls()
,file_to_table_name()
(canonicalizes filenames to SQLite-safe table names).
- Sessions:
session['admin'] = True
on login; guards likeif not session.get('admin')
for admin routes. - Rendering:
render_template()
,redirect()
,url_for()
,jsonify()
. - Startup safety: a run-once initialization guarded by a
Lock
and a module-level boolean to avoid duplicate schema work:ensure_uploads_log_schema()
ensures tables & triggers.ensure_uploads_log_columns()
backfills columns on older DBs.
- Error handling: wrap critical steps in
try/except
and log warnings instead of crashing public pages.
How to build a route (minimum):
@app.route("/path", methods=["GET","POST"])
def view_func():
# 1) read from request (form/args/session)
# 2) domain work (query/drive/api)
# 3) render_template(...) or redirect/url_for(...)
- Schema:
uploads_log(property, tab, filename, uploaded_at, storage, drive_id, preview_url, download_url, source, description)
uploads_audit(property, tab, filename, action, at)
- Triggers keep a history automatically:
- On INSERT → action
add
- On UPDATE → action
update
- On DELETE → action
delete
- On INSERT → action
- Upserts evolved for compatibility: safe
UPDATE...; if rowcount==0 INSERT...
pattern to avoidON CONFLICT
errors on older SQLite or missing indices. - Dedupe tools: optional admin endpoint to delete duplicates and re-create a unique index on
(property, tab, filename)
. - Queries: COALESCE defensive selection; pagination unnecessary due to table size but can be added.
Mental Model: Treat SQLite as a metadata ledger. Big binaries live elsewhere (Drive).
- Service account credentials come from
GDRIVE_SA_JSON_BASE64
(orGDRIVE_SA_JSON
path). - Root folder for the app is
GDRIVE_ROOT_FOLDER_ID
. Under it the app creates/uses the hierarchy:<root>/<property>/<tab>
wheretab ∈ {dataset, results}
.
- Listing:
drive_list_folder_files(..., recursive=True)
walks folders; filters by allowed extensions per tab. - Upload:
drive_upload_bytes(service, folder_id, filename, bytes)
. - Sharing: Best-effort
permissions().create(fileId, body={"role": "reader", "type": "anyone"})
for public links. - URLs: preview and download links are computed and stored in the catalog.
Mental Model: Drive is your blob store. SQLite stores pointers and descriptive metadata.
- The app runs behind Gunicorn in a Firecracker VM on Fly.
- Dockerfile defines your Python environment (packages like
google-api-python-client
,pandas
, etc.). - The container exposes port 8080; Fly proxies 80/443 to it.
- Health check:
/healthz
must be very fast and never fail. - Volume:
/data
mount stores the SQLite DB and small local assets. Fly restarts won’t erase your DB. - fly.toml highlights:
internal_port = 8080
[[http_service.checks]] path="/healthz"
[[mounts]] destination = "/data"
Mental Model: Image (code + deps) → Machine (VM) → Volume for persistence → Health checks to keep it alive.
- Public user visits
/materials/<property>/<tab>
- Flask queries
uploads_log
→ Jinja renders a table of files - For Drive rows: show Preview and Download links directly to Drive
- For legacy local CSV/NPY: map to an SQLite table name and render with
public_view
- Flask queries
- Admin:
- Add Drive file (link/ID) → write row in
uploads_log
- Link a Drive folder → enumerate files and upsert rows
- ZIP upload → Drive → (optional) expand in memory, upload each file to Drive, upsert rows
- Inline edit → updates source/description
- Triggers populate
uploads_audit
for the Admin Dashboard
- Add Drive file (link/ID) → write row in
- Startup:
ensure_uploads_log_schema()
creates tables + triggers (idempotent).ensure_uploads_log_columns()
backfills missing columns on old DBs.
See diagram: architecture_diagram.png
Set on Fly (examples):
GDRIVE_SA_JSON_BASE64
— base64 of the service account JSONGDRIVE_ROOT_FOLDER_ID
— ID of the shared folder where the app writes<property>/<tab>
subfolders
Check with:
fly secrets list -a <app-name>
python -m venv .venv
source .venv/bin/activate # (Windows: .venv\Scripts\activate)
pip install -r requirements.txt
export FLASK_APP=app.py
flask run
To simulate Drive locally, you can keep a small uploads/
tree and use the “legacy” local CSV/NPY view routes, or configure real Drive secrets in your shell environment.
- Health: open
/healthz
- Admin login:
/login
→ setsession['admin']
- Admin home:
/admin
- Repair duplicates: optional endpoint that deletes dupes and re-creates index (if present in your code)
- Logs:
fly logs -a <app>
- Secrets:
fly secrets set KEY=VALUE -a <app>
ON CONFLICT does not match any PRIMARY KEY or UNIQUE
→ Use the UPDATE→INSERT upsert pattern or ensure the unique index on(property, tab, filename)
exists.storageQuotaExceeded
for service accounts
→ Use Shared Drive with delegated permissions, or link existing files/folders instead of uploading large binaries.- 503/BuildError on
url_for('public_view', table=...)
→ Ensuretable_map.get(filename)
returns a non-empty value before rendering the link. - Duplicates in lists
→ Deduplicate in SQL and/or by usingDISTINCT
/proper unique index; also avoid adding local rows for Drive-backed items.
- OAuth user uploads (end-user Drive).
- Server-side previews for large CSVs (streamed, chunked).
- Role-based admin, audit export, and better search/filter UI.
- Background jobs for Drive sync.
TBD — choose MIT/Apache-2.0 or similar if you want broad reuse.
- Drive:
get_drive_service
,_drive_extract_id
,drive_find_or_create_folder
,drive_ensure_property_tab_folder
,drive_list_folder_files
,drive_upload_bytes
,_drive_urls
- DB:
ensure_uploads_log_schema
,ensure_uploads_log_columns
,dedupe_uploads_log
- Public/Admin:
/materials/<property>/<tab>
,/admin
,/healthz
, optional/admin/repair_uploads
- Legacy import:
auto_import_uploads
,auto_log_material_files
- Utilities:
file_to_table_name
,tableize_basename
Diagram embedded below; if viewing on GitHub, ensure the image exists alongside the README.