Patterns Matter is a lightweight materials database web app built with Flask and Jinja, deployed on Fly.io.
It renders datasets and result artifacts directly from Google Drive (service account), while keeping a small SQLite catalog (uploads_log) and an audit trail (uploads_audit) on a persistent Fly volume mounted at /data. No large files live in the VM—Drive is the source of truth for files.
- Frontend: HTML + Jinja templates + a thin CSS theme.
- Backend: Flask (Gunicorn worker), SQLite for metadata, Google Drive API for file storage, and a few background-style “run once” tasks guarded by a lock.
- Persistence: A Fly volume mounted at `/data` holds the SQLite DB and small assets (e.g., music clips and logs). Large datasets/results are kept on Google Drive.
- Auth: Simple admin session with password (env-configurable in app code). Public routes are read-only.
- Health checks: `/healthz` responds `200 OK` for Fly health probes.
- Secrets: `GDRIVE_SA_JSON_BASE64` (or `GDRIVE_SA_JSON`) and `GDRIVE_ROOT_FOLDER_ID` are stored as Fly secrets.
- The app was originally Git-backed with a local uploads folder; it now uses Google Drive as the primary data store for materials datasets/results.
- The SQLite catalog (`uploads_log`) is metadata only (filename, Drive IDs, preview/download URLs, descriptions).
- Admin uploads (ZIP or folder link) write files to Drive and upsert metadata into the catalog.
- Public views read from Drive (or from local DB tables for legacy CSV/NPY still shipped in the repo).
I tried to touch almost every layer of a modern web service. Here’s a recap by technology and the concrete techniques used.
- HTML structure: semantic tags (`<table>`, `<form>`, `<input>`, `<a>`, `<button>`) and responsive wrappers.
- CSS: simple layout, table styling, “pills”, hover states, small screen tweaks (`@media`).
- Jinja:
  - Control flow and filters: `{% if %}`, `{% for %}`, `|title`, `|length`, `|safe`.
  - Template variables from Flask: `render_template("view.html", uploads=uploads, admin=is_admin, ...)`.
  - URL building: `url_for('property_detail', property_name=p, tab='results')`, which avoids hardcoding links.
  - Defensive rendering: `row.get('storage', 'local')`, `COALESCE()` at SQL-layer to keep templates robust.
- SQL: CRUD operations performed with my `sql_query` tool.
Mental Model: Jinja renders server-side. I pass Python objects, Jinja turns them into HTML. Keep logic light in templates; do data prep in views.
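
To make that mental model concrete, here is a minimal sketch that renders a template with Jinja directly, outside Flask. The rows and the template text are made up for illustration; only the filters (`|title`, `|length`) and the `row.get('storage', 'local')` fallback come from the notes above.

```python
from jinja2 import Template

# Hypothetical rows, shaped loosely like entries from uploads_log.
uploads = [
    {"filename": "bandgap.csv", "storage": "drive"},
    {"filename": "legacy.npy"},  # no "storage" key: the template falls back to "local"
]

TEMPLATE = Template(
    "<h2>{{ property_name|title }} ({{ uploads|length }} files)</h2>\n"
    "<ul>\n"
    "{% for row in uploads %}"
    "<li>{{ row.filename }} [{{ row.get('storage', 'local') }}]</li>\n"
    "{% endfor %}"
    "</ul>"
)

# Python objects go in, an HTML string comes out.
html = TEMPLATE.render(property_name="bandgap", uploads=uploads)
print(html)
```

In the app itself `render_template()` plays the role of `Template(...).render(...)`; the data preparation stays in the view.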
- Routes: `@app.route('/materials/<property_name>/<tab>', methods=['GET', 'POST'])` with guards and branching:
  - Admin POST handlers: add Drive file, link Drive folder (recursive listing), ZIP upload → Drive, inline edits.
  - Public GET: query `uploads_log`, compute `table_map` for legacy local CSV/NPY tables.
- Helpers: `get_drive_service()` (lazy import, env-driven), `_drive_extract_id()`, `drive_list_folder_files(..., recursive=True)`, `drive_ensure_property_tab_folder()` (ensures `<root>/<property>/<tab>`), `drive_upload_bytes()`, `_drive_urls()`, `file_to_table_name()` (canonicalizes filenames to SQLite-safe table names).
- Sessions: `session['admin'] = True` on login; guards like `if not session.get('admin')` for admin routes.
- Rendering: `render_template()`, `redirect()`, `url_for()`, `jsonify()`.
- Startup safety: a run-once initialization guarded by a `Lock` and a module-level boolean to avoid duplicate schema work:
  - `ensure_uploads_log_schema()` ensures tables & triggers.
  - `ensure_uploads_log_columns()` backfills columns on older DBs.
- Error handling: wrap critical steps in `try/except` and log warnings instead of crashing public pages.
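
As an illustration of the kind of canonicalization a helper like `file_to_table_name()` performs, here is a minimal sketch. The function name `to_table_name` and its exact rules (lowercasing, underscore replacement, the `t_` prefix) are assumptions for this example, not the app's actual implementation.

```python
import re
from pathlib import Path


def to_table_name(filename: str) -> str:
    """Turn a filename into a lowercase identifier usable as a SQLite table name."""
    stem = Path(filename).stem.lower()
    # Replace every run of characters outside [a-z0-9_] with a single underscore.
    name = re.sub(r"[^a-z0-9_]+", "_", stem).strip("_")
    if not name:
        return "t_unnamed"      # nothing usable was left in the filename
    if name[0].isdigit():
        return f"t_{name}"      # avoid identifiers that start with a digit
    return name


print(to_table_name("Band Gap (v2).csv"))
print(to_table_name("2024-results.npy"))
```

Keeping this in one helper means the public view and the import code agree on which table a given file maps to.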
How to build a route (minimum):

    from flask import Flask, render_template

    app = Flask(__name__)

    @app.route("/path", methods=["GET", "POST"])
    def view_func():
        # 1) read from request (form/args/session)
        # 2) domain work (query/drive/api)
        # 3) render_template(...) or redirect(url_for(...))
        return render_template("view.html")

- Schema:
  - `uploads_log(property, tab, filename, uploaded_at, storage, drive_id, preview_url, download_url, source, description)`
  - `uploads_audit(property, tab, filename, action, at)`
- Triggers keep a history automatically:
  - On INSERT → action `add`
  - On UPDATE → action `update`
  - On DELETE → action `delete`
- Upserts evolved for compatibility: safe `UPDATE...; if rowcount==0 INSERT...` pattern to avoid `ON CONFLICT` errors on older SQLite or missing indices.
- Dedupe tools: optional admin endpoint to delete duplicates and re-create a unique index on `(property, tab, filename)`.
- Queries: defensive `COALESCE()` in SELECTs; pagination is unnecessary at the current table size but can be added.
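
The trigger-based audit trail described above can be sketched with the standard `sqlite3` module. The table and column names follow the schema listed earlier (trimmed to a few columns); the trigger names and the `CURRENT_TIMESTAMP` default are assumptions for this example.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS uploads_log (
    property TEXT NOT NULL,
    tab TEXT NOT NULL,
    filename TEXT NOT NULL,
    description TEXT
);
CREATE TABLE IF NOT EXISTS uploads_audit (
    property TEXT, tab TEXT, filename TEXT, action TEXT,
    at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TRIGGER IF NOT EXISTS trg_uploads_add AFTER INSERT ON uploads_log
BEGIN
    INSERT INTO uploads_audit (property, tab, filename, action)
    VALUES (NEW.property, NEW.tab, NEW.filename, 'add');
END;
CREATE TRIGGER IF NOT EXISTS trg_uploads_update AFTER UPDATE ON uploads_log
BEGIN
    INSERT INTO uploads_audit (property, tab, filename, action)
    VALUES (NEW.property, NEW.tab, NEW.filename, 'update');
END;
CREATE TRIGGER IF NOT EXISTS trg_uploads_delete AFTER DELETE ON uploads_log
BEGIN
    INSERT INTO uploads_audit (property, tab, filename, action)
    VALUES (OLD.property, OLD.tab, OLD.filename, 'delete');
END;
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)  # IF NOT EXISTS makes this safe to re-run

# Each statement below fires one trigger and leaves one audit row.
conn.execute("INSERT INTO uploads_log VALUES ('bandgap', 'dataset', 'a.csv', NULL)")
conn.execute("UPDATE uploads_log SET description = 'v2' WHERE filename = 'a.csv'")
conn.execute("DELETE FROM uploads_log WHERE filename = 'a.csv'")

actions = [row[0] for row in conn.execute("SELECT action FROM uploads_audit ORDER BY rowid")]
print(actions)
```

Because the history is written by the database itself, no route has to remember to log its own changes.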
Mental Model: Treat SQLite as a metadata ledger. Big binaries live elsewhere (Drive).
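
A minimal sketch of the UPDATE-then-INSERT upsert mentioned above, assuming a trimmed-down `uploads_log` with only four columns. The function name `upsert_upload` is hypothetical.

```python
import sqlite3


def upsert_upload(conn, prop, tab, filename, description):
    """Try an UPDATE first; INSERT only when no existing row matched."""
    cur = conn.execute(
        "UPDATE uploads_log SET description = ? "
        "WHERE property = ? AND tab = ? AND filename = ?",
        (description, prop, tab, filename),
    )
    if cur.rowcount == 0:
        conn.execute(
            "INSERT INTO uploads_log (property, tab, filename, description) "
            "VALUES (?, ?, ?, ?)",
            (prop, tab, filename, description),
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE uploads_log (property TEXT, tab TEXT, filename TEXT, description TEXT)"
)
upsert_upload(conn, "bandgap", "dataset", "a.csv", "first")   # inserts
upsert_upload(conn, "bandgap", "dataset", "a.csv", "second")  # updates in place
rows = conn.execute("SELECT filename, description FROM uploads_log").fetchall()
print(rows)
```

Unlike `INSERT ... ON CONFLICT`, this works even when the table has no unique index, which is why it tolerates older databases.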
- Service account credentials come from `GDRIVE_SA_JSON_BASE64` (or `GDRIVE_SA_JSON` path).
- Root folder for the app is `GDRIVE_ROOT_FOLDER_ID`. Under it the app creates/uses the hierarchy: `<root>/<property>/<tab>` where `tab ∈ {dataset, results}`.
- Listing: `drive_list_folder_files(..., recursive=True)` walks folders; filters by allowed extensions per tab.
- Upload: `drive_upload_bytes(service, folder_id, filename, bytes)`.
- Sharing: Best-effort `permissions().create(fileId, body={"role": "reader", "type": "anyone"})` for public links.
- URLs: preview and download links are computed and stored in the catalog.
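
Reading the service account credentials from the environment could look roughly like this. It is a sketch under stated assumptions: `load_service_account_info` is a hypothetical name, `GDRIVE_SA_JSON` is treated as a file path, and the payload in the usage lines is fake.

```python
import base64
import json
import os


def load_service_account_info(env=os.environ):
    """Return the service-account JSON as a dict, or None if nothing is configured."""
    encoded = env.get("GDRIVE_SA_JSON_BASE64")
    if encoded:
        return json.loads(base64.b64decode(encoded).decode("utf-8"))
    path = env.get("GDRIVE_SA_JSON")  # assumed here to be a path to a JSON key file
    if path:
        with open(path, encoding="utf-8") as fh:
            return json.load(fh)
    return None


# Usage with a fake, non-secret payload standing in for a real key file.
fake = base64.b64encode(json.dumps({"client_email": "sa@example.com"}).encode()).decode()
info = load_service_account_info({"GDRIVE_SA_JSON_BASE64": fake})
print(info["client_email"])
```

The resulting dict is the shape that google-auth's `service_account.Credentials.from_service_account_info()` accepts, so the Drive client can be built from it lazily.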
Mental Model: Drive is your blob store. SQLite stores pointers and descriptive metadata.
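
Since the catalog stores pointers, most of the Drive-side bookkeeping is string handling. Below is a sketch of what helpers like `_drive_extract_id()` and `_drive_urls()` might do; the function names, regular expressions, and URL templates are assumptions based on common Drive link shapes, not the app's code.

```python
import re

# Common Drive link shapes: /file/d/<id>/..., /folders/<id>, and ?id=<id>.
_ID_PATTERNS = [
    re.compile(r"/file/d/([A-Za-z0-9_-]+)"),
    re.compile(r"/folders/([A-Za-z0-9_-]+)"),
    re.compile(r"[?&]id=([A-Za-z0-9_-]+)"),
]


def extract_drive_id(link_or_id: str) -> str:
    """Pull the file/folder ID out of a Drive link; pass bare IDs through."""
    text = link_or_id.strip()
    for pattern in _ID_PATTERNS:
        match = pattern.search(text)
        if match:
            return match.group(1)
    return text  # no known pattern matched: assume it is already an ID


def drive_urls(file_id: str) -> tuple:
    """Build (preview_url, download_url) for a file ID."""
    return (
        f"https://drive.google.com/file/d/{file_id}/preview",
        f"https://drive.google.com/uc?export=download&id={file_id}",
    )


file_id = extract_drive_id("https://drive.google.com/file/d/abc123_-X/view?usp=sharing")
print(file_id, drive_urls(file_id))
```

Computing the URLs once at upload time and storing them in `uploads_log` keeps public page loads free of Drive API calls.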
- The app runs behind Gunicorn in a Firecracker VM on Fly.
- Dockerfile defines your Python environment (packages like `google-api-python-client`, `pandas`, etc.).
- The container exposes port 8080; Fly proxies 80/443 to it.
- Health check: `/healthz` must be very fast and never fail.
- Volume: `/data` mount stores the SQLite DB and small local assets. Fly restarts won’t erase your DB.
- fly.toml highlights:
  - `internal_port = 8080`
  - `[[http_service.checks]] path="/healthz"`
  - `[[mounts]] destination = "/data"`
Mental Model: Image (code + deps) → Machine (VM) → Volume for persistence → Health checks to keep it alive.
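
A health endpoint that is "very fast and never fails" can be as small as the sketch below. It assumes Flask is installed; the route path comes from the notes above, while the response body and the decision to skip all dependency checks are illustrative choices.

```python
from flask import Flask

app = Flask(__name__)


@app.route("/healthz")
def healthz():
    # No database or Drive calls here: the probe should still answer
    # when those dependencies are slow or unavailable.
    return "OK", 200
```

Flask's built-in test client (`app.test_client().get("/healthz")`) is enough to check the endpoint locally without starting a server.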
- Public user visits `/materials/<property>/<tab>`
  - Flask queries `uploads_log` → Jinja renders a table of files
  - For Drive rows: show Preview and Download links directly to Drive
  - For legacy local CSV/NPY: map to an SQLite table name and render with `public_view`
- Admin:
  - Add Drive file (link/ID) → write row in `uploads_log`
  - Link a Drive folder → enumerate files and upsert rows
  - ZIP upload → Drive → (optional) expand in memory, upload each file to Drive, upsert rows
  - Inline edit → updates source/description
  - Triggers populate `uploads_audit` for the Admin Dashboard
- Startup:
  - `ensure_uploads_log_schema()` creates tables + triggers (idempotent).
  - `ensure_uploads_log_columns()` backfills missing columns on old DBs.
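
The run-once startup step (a lock plus a module-level boolean) can be sketched as follows. `_ensure_schema` is a stand-in for the real `ensure_uploads_log_schema()` / `ensure_uploads_log_columns()` calls, and the counter exists only to make the behaviour observable.

```python
import threading

_init_lock = threading.Lock()
_init_done = False
init_calls = 0  # demonstration only: counts how often the schema work ran


def _ensure_schema():
    """Stand-in for ensure_uploads_log_schema() and ensure_uploads_log_columns()."""
    global init_calls
    init_calls += 1


def run_startup_once():
    global _init_done
    if _init_done:        # fast path once initialization has finished
        return
    with _init_lock:
        if _init_done:    # re-check: another thread may have won the race
            return
        _ensure_schema()
        _init_done = True


threads = [threading.Thread(target=run_startup_once) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(init_calls)
```

The flag only protects a single process. With several Gunicorn workers each process would run the step once, which is why it also helps that the schema functions are idempotent.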
See diagram: architecture_diagram.png
Set on Fly (examples):
- `GDRIVE_SA_JSON_BASE64`: base64 of the service account JSON
- `GDRIVE_ROOT_FOLDER_ID`: ID of the shared folder where the app writes `<property>/<tab>` subfolders

Check with:

    fly secrets list -a <app-name>

To run the app locally:

    python -m venv .venv
    source .venv/bin/activate  # (Windows: .venv\Scripts\activate)
    pip install -r requirements.txt
    export FLASK_APP=app.py
    flask run

To simulate Drive locally, you can keep a small uploads/ tree and use the “legacy” local CSV/NPY view routes, or configure real Drive secrets in your shell environment.
- Health: open `/healthz`
- Admin login: `/login` → sets `session['admin']`
- Admin home: `/admin`
- Repair duplicates: optional endpoint that deletes dupes and re-creates index (if present in your code)
- Logs: `fly logs -a <app>`
- Secrets: `fly secrets set KEY=VALUE -a <app>`
- `ON CONFLICT does not match any PRIMARY KEY or UNIQUE`
  → Use the UPDATE→INSERT upsert pattern or ensure the unique index on `(property, tab, filename)` exists.
- `storageQuotaExceeded` for service accounts
  → Use Shared Drive with delegated permissions, or link existing files/folders instead of uploading large binaries.
- 503/BuildError on `url_for('public_view', table=...)`
  → Ensure `table_map.get(filename)` returns a non-empty value before rendering the link.
- Duplicates in lists
  → Deduplicate in SQL and/or by using `DISTINCT`/proper unique index; also avoid adding local rows for Drive-backed items.
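
For the duplicate and `ON CONFLICT` items above, a repair step might look like the sketch below: delete all but the oldest row per key, then create the unique index. The function name `dedupe_and_index`, the index name, and the keep-the-oldest rule are assumptions for this example.

```python
import sqlite3


def dedupe_and_index(conn):
    """Keep the oldest row per (property, tab, filename), then enforce uniqueness."""
    conn.execute(
        """
        DELETE FROM uploads_log
        WHERE rowid NOT IN (
            SELECT MIN(rowid) FROM uploads_log
            GROUP BY property, tab, filename
        )
        """
    )
    conn.execute(
        "CREATE UNIQUE INDEX IF NOT EXISTS idx_uploads_unique "
        "ON uploads_log (property, tab, filename)"
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE uploads_log (property TEXT, tab TEXT, filename TEXT)")
conn.executemany(
    "INSERT INTO uploads_log VALUES (?, ?, ?)",
    [("bandgap", "dataset", "a.csv")] * 3 + [("bandgap", "results", "a.csv")],
)
dedupe_and_index(conn)
count = conn.execute("SELECT COUNT(*) FROM uploads_log").fetchone()[0]
print(count)
```

The order matters: creating the unique index before removing duplicates would fail, so the delete has to run first.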
- OAuth user uploads (end-user Drive).
- Server-side previews for large CSVs (streamed, chunked).
- Role-based admin, audit export, and better search/filter UI.
- Background jobs for Drive sync.
TBD — choose MIT/Apache-2.0 or similar if you want broad reuse.
- Drive: `get_drive_service`, `_drive_extract_id`, `drive_find_or_create_folder`, `drive_ensure_property_tab_folder`, `drive_list_folder_files`, `drive_upload_bytes`, `_drive_urls`
- DB: `ensure_uploads_log_schema`, `ensure_uploads_log_columns`, `dedupe_uploads_log`
- Public/Admin: `/materials/<property>/<tab>`, `/admin`, `/healthz`, optional `/admin/repair_uploads`
- Legacy import: `auto_import_uploads`, `auto_log_material_files`
- Utilities: `file_to_table_name`, `tableize_basename`
Diagram embedded below; if viewing on GitHub, ensure the image exists alongside the README.
