# Patterns Matter: Materials Database (Flask + Google Drive + SQLite) 🧪

## 1) Overview (Implementation Perspective)

**Patterns Matter** is a lightweight materials database web app built with **Flask** and **Jinja**, deployed on **Fly.io**.
It renders datasets and result artifacts *directly from Google Drive* (service account), while keeping a small **SQLite** catalog (`uploads_log`) and an **audit trail** (`uploads_audit`) on a persistent Fly volume mounted at `/data`. No large files live in the VM; Drive is the source of truth for files.

### High-level
- **Frontend:** HTML + Jinja templates + a thin CSS theme.
- **Backend:** Flask (Gunicorn worker), SQLite for metadata, Google Drive API for file storage, and a few background-style “run once” tasks guarded by a lock.
- **Persistence:** A Fly volume mounted at **`/data`** holds the SQLite DB and small assets (e.g., music clips and logs). Large datasets/results are kept on **Google Drive**.
- **Auth:** Simple admin session with password (env-configurable in app code). Public routes are read-only.
- **Health checks:** `/healthz` responds `200 OK` for Fly health probes.
- **Secrets:** `GDRIVE_SA_JSON_BASE64` (or `GDRIVE_SA_JSON`) and `GDRIVE_ROOT_FOLDER_ID` are stored as **Fly secrets**.

### What changed during the migration
- The app was originally **Git-backed with a local uploads folder**; it now uses **Google Drive** as the primary data store for *materials datasets/results*.
- The SQLite catalog (`uploads_log`) is **metadata only** (filename, Drive IDs, preview/download URLs, descriptions).
- Admin **uploads** (ZIP or folder link) write files to Drive and **upsert** metadata into the catalog.
- Public **views** read from Drive (or from local DB tables for legacy CSV/NPY still shipped in the repo).

---

## 2) Key Takeaways (Learner Perspective)

You touched almost every layer of a modern web service. Here’s a recap by technology and the concrete techniques used.

### HTML / CSS / Jinja (Templating)
- **HTML structure**: semantic tags (`<table>`, `<form>`, `<input>`, `<a>`, `<button>`) and responsive wrappers.
- **CSS**: simple layout, table styling, “pills”, hover states, small screen tweaks (`@media`).
- **Jinja**:
  - Control flow and filters: `{% if %}`, `{% for %}`, `|title`, `|length`, `|safe`.
  - Template variables from Flask: `render_template("view.html", uploads=uploads, admin=is_admin, ...)`.
  - URL building: `url_for('property_detail', property_name=p, tab='results')`, which avoids hardcoding links.
  - Defensive rendering: `row.get('storage', 'local')` in templates and `COALESCE()` at the SQL layer keep rendering robust when columns are missing or NULL.

**Mental Model:** Jinja renders **server-side**. You pass Python objects, and Jinja turns them into HTML. Keep logic light in templates; do data prep in views.

---

### Flask (Backend)
- **Routes**: `@app.route('/materials/<property_name>/<tab>', methods=['GET', 'POST'])` with guards and branching:
  - Admin POST handlers: add Drive file, link Drive folder (recursive listing), ZIP upload → Drive, inline edits.
  - Public GET: query `uploads_log`, compute `table_map` for legacy local CSV/NPY tables.
- **Helpers**:
  - `get_drive_service()` (lazy import, env-driven),
  - `_drive_extract_id()`, `drive_list_folder_files(..., recursive=True)`,
  - `drive_ensure_property_tab_folder()` (ensures `<root>/<property>/<tab>`),
  - `drive_upload_bytes()`, `_drive_urls()`,
  - `file_to_table_name()` (canonicalizes filenames to SQLite-safe table names).
- **Sessions**: `session['admin'] = True` on login; guards like `if not session.get('admin')` for admin routes.
- **Rendering**: `render_template()`, `redirect()`, `url_for()`, `jsonify()`.
- **Startup safety**: a run-once initialization guarded by a **`Lock`** and a module-level boolean to avoid duplicate schema work:
  - `ensure_uploads_log_schema()` ensures tables & triggers.
  - `ensure_uploads_log_columns()` backfills columns on older DBs.
- **Error handling**: wrap critical steps in `try/except` and log warnings instead of crashing public pages.
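
The startup-safety bullet can be sketched as follows. This is a minimal illustration, not the app's actual code: `run_startup_once`, `_init_lock`, `_initialized`, and `init_calls` are hypothetical names, and the two `ensure_*` functions are stand-ins whose bodies only record that they ran.

```python
import threading

_init_lock = threading.Lock()
_initialized = False  # module-level flag, flipped once per process
init_calls = []       # demo only: records how often the real work ran


def ensure_uploads_log_schema():
    """Stand-in for the real schema helper (hypothetical body)."""
    init_calls.append("schema")


def ensure_uploads_log_columns():
    """Stand-in for the real column backfill helper (hypothetical body)."""
    init_calls.append("columns")


def run_startup_once():
    """Run schema setup at most once per process, even under threads."""
    global _initialized
    if _initialized:          # fast path once setup has completed
        return False
    with _init_lock:
        if _initialized:      # re-check: another thread may have won the race
            return False
        ensure_uploads_log_schema()
        ensure_uploads_log_columns()
        _initialized = True
        return True


# Simulate several request threads hitting the app at the same time.
threads = [threading.Thread(target=run_startup_once) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(init_calls)
```

The flag and lock only protect a single process. With several Gunicorn workers, each worker runs the setup once, which is why the schema helpers themselves need to be idempotent.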

**How to build a route (minimum):**
```python
from flask import Flask, render_template

app = Flask(__name__)

@app.route("/path", methods=["GET", "POST"])
def view_func():
    # 1) read from request (form/args/session)
    # 2) domain work (query/drive/api)
    # 3) render_template(...) or redirect(url_for(...))
    return render_template("view.html")
```

---

### SQLite (SQL, Indices, Triggers)
- **Schema**:
  - `uploads_log(property, tab, filename, uploaded_at, storage, drive_id, preview_url, download_url, source, description)`
  - `uploads_audit(property, tab, filename, action, at)`
- **Triggers** keep a **history** automatically:
  - On INSERT → action `add`
  - On UPDATE → action `update`
  - On DELETE → action `delete`
- **Upserts** evolved for compatibility: a safe `UPDATE ...; if rowcount == 0: INSERT ...` pattern avoids `ON CONFLICT` errors on older SQLite builds (the upsert syntax needs SQLite 3.24+) or when the unique index is missing.
- **Dedupe tools**: optional admin endpoint to delete duplicates and re-create a unique index on `(property, tab, filename)`.
- **Queries**: `COALESCE` guards against NULL columns in selects; the tables are small enough that pagination is unnecessary, though it can be added.
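
The schema and audit triggers listed above could look roughly like this. The column types, defaults, and trigger names (`trg_uploads_*`) are assumptions for illustration; only the table names, column names, and the three action labels come from this README.

```python
import sqlite3

# Assumed DDL: types/defaults and trigger names are illustrative guesses.
SCHEMA = """
CREATE TABLE IF NOT EXISTS uploads_log (
    property TEXT NOT NULL,
    tab TEXT NOT NULL,
    filename TEXT NOT NULL,
    uploaded_at TEXT DEFAULT CURRENT_TIMESTAMP,
    storage TEXT, drive_id TEXT, preview_url TEXT, download_url TEXT,
    source TEXT, description TEXT
);
CREATE TABLE IF NOT EXISTS uploads_audit (
    property TEXT, tab TEXT, filename TEXT, action TEXT,
    at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TRIGGER IF NOT EXISTS trg_uploads_add AFTER INSERT ON uploads_log
BEGIN
    INSERT INTO uploads_audit(property, tab, filename, action)
    VALUES (NEW.property, NEW.tab, NEW.filename, 'add');
END;
CREATE TRIGGER IF NOT EXISTS trg_uploads_update AFTER UPDATE ON uploads_log
BEGIN
    INSERT INTO uploads_audit(property, tab, filename, action)
    VALUES (NEW.property, NEW.tab, NEW.filename, 'update');
END;
CREATE TRIGGER IF NOT EXISTS trg_uploads_delete AFTER DELETE ON uploads_log
BEGIN
    INSERT INTO uploads_audit(property, tab, filename, action)
    VALUES (OLD.property, OLD.tab, OLD.filename, 'delete');
END;
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)  # safe to re-run thanks to IF NOT EXISTS

# One insert, one update, one delete should leave three audit rows.
conn.execute(
    "INSERT INTO uploads_log(property, tab, filename) VALUES (?, ?, ?)",
    ("bandgap", "dataset", "a.csv"),
)
conn.execute("UPDATE uploads_log SET description = 'demo' WHERE filename = 'a.csv'")
conn.execute("DELETE FROM uploads_log WHERE filename = 'a.csv'")

actions = [row[0] for row in conn.execute(
    "SELECT action FROM uploads_audit ORDER BY rowid")]
print(actions)
```

Because the triggers live in the database, every write path (admin forms, folder linking, repair scripts) is audited without extra Python code.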

**Mental Model:** Treat SQLite as a **metadata ledger**. Big binaries live elsewhere (Drive).
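
A minimal sketch of the UPDATE-then-INSERT upsert mentioned in the list above, assuming a cut-down `uploads_log` table. The function name `upsert_upload` and its parameters are hypothetical; the real app may update more columns.

```python
import sqlite3


def upsert_upload(conn, prop, tab, filename, drive_id, description=""):
    """Update the matching row; insert only when no row matched."""
    with conn:  # one transaction for both statements
        cur = conn.execute(
            "UPDATE uploads_log SET drive_id = ?, description = ? "
            "WHERE property = ? AND tab = ? AND filename = ?",
            (drive_id, description, prop, tab, filename),
        )
        if cur.rowcount == 0:  # nothing updated, so the row is new
            conn.execute(
                "INSERT INTO uploads_log "
                "(property, tab, filename, drive_id, description) "
                "VALUES (?, ?, ?, ?, ?)",
                (prop, tab, filename, drive_id, description),
            )
            return "inserted"
    return "updated"


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE uploads_log ("
    "property TEXT, tab TEXT, filename TEXT, drive_id TEXT, description TEXT)"
)

first = upsert_upload(conn, "bandgap", "dataset", "a.csv", "id-1")
second = upsert_upload(conn, "bandgap", "dataset", "a.csv", "id-2", "re-linked")
rows = conn.execute("SELECT drive_id, description FROM uploads_log").fetchall()
print(first, second, rows)
```

This pattern needs neither `ON CONFLICT` support nor a unique index, at the cost of two statements per write. Without a unique index, two concurrent writers could still insert the same key, so the index remains worth having.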

---

### Google Drive API (Service Account)
- **Service account** credentials come from `GDRIVE_SA_JSON_BASE64` *(or `GDRIVE_SA_JSON` path)*.
- **Root folder** for the app is `GDRIVE_ROOT_FOLDER_ID`. Under it the app creates/uses the hierarchy:
  - `<root>/<property>/<tab>` where `tab ∈ {dataset, results}`.
- **Listing**: `drive_list_folder_files(..., recursive=True)` walks folders; filters by allowed extensions per tab.
- **Upload**: `drive_upload_bytes(service, folder_id, filename, bytes)`.
- **Sharing**: Best-effort `permissions().create(fileId=file_id, body={"role": "reader", "type": "anyone"})` for public links (the Drive client requires keyword arguments).
- **URLs**: preview and download links are computed and stored in the catalog.

**Mental Model:** Drive is your **blob store**. SQLite stores **pointers and descriptive metadata**.
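
The sketch below shows one way to pull a Drive ID out of a pasted link and derive preview/download URLs from it. It is not the app's `_drive_extract_id()` or `_drive_urls()`; the function names, the regular expressions, and the two URL templates (commonly seen Drive link shapes) are assumptions.

```python
import re

# Link shapes commonly produced by the Drive web UI (assumed, not exhaustive).
_ID_PATTERNS = (
    re.compile(r"/file/d/([A-Za-z0-9_-]+)"),
    re.compile(r"/folders/([A-Za-z0-9_-]+)"),
    re.compile(r"[?&]id=([A-Za-z0-9_-]+)"),
)


def extract_drive_id(link_or_id):
    """Return the Drive ID found in a URL, or the input if it looks like a bare ID."""
    text = link_or_id.strip()
    for pattern in _ID_PATTERNS:
        match = pattern.search(text)
        if match:
            return match.group(1)
    if re.fullmatch(r"[A-Za-z0-9_-]{10,}", text):  # bare ID pasted directly
        return text
    return None


def drive_urls(file_id):
    """Build (preview_url, download_url) for a file ID (assumed URL templates)."""
    preview = f"https://drive.google.com/file/d/{file_id}/preview"
    download = f"https://drive.google.com/uc?export=download&id={file_id}"
    return preview, download


file_id = extract_drive_id(
    "https://drive.google.com/file/d/1AbC_dEf-123456/view?usp=sharing")
print(file_id, drive_urls(file_id))
```

Storing the computed URLs in `uploads_log` means public pages can render links without calling the Drive API on every request.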

---

### Containers & Dockerfile (Deployment Mechanics)
- The app runs under **Gunicorn** in a **Firecracker VM** on Fly.
- **Dockerfile** defines your Python environment (packages like `google-api-python-client`, `pandas`, etc.).
- The container exposes port **8080**; Fly proxies 80/443 to it.
- **Health check**: `/healthz` must respond quickly and reliably, since Fly uses it to decide whether the machine is healthy.
- **Volume**: `/data` mount stores the SQLite DB and small local assets. Fly restarts won’t erase your DB.
- **fly.toml** highlights:
  - `internal_port = 8080`
  - `[[http_service.checks]] path="/healthz"`
  - `[[mounts]] destination = "/data"`

**Mental Model:** Image (code + deps) → Machine (VM) → Volume for persistence → Health checks to keep it alive.

---

## 3) Architecture & Data Flow

- **Public user** visits `/materials/<property>/<tab>`
  - Flask queries `uploads_log` → Jinja renders a table of files
  - For **Drive** rows: show **Preview** and **Download** links directly to Drive
  - For legacy **local CSV/NPY**: map to an SQLite table name and render with `public_view`
- **Admin**:
  - **Add Drive file** (link/ID) → write row in `uploads_log`
  - **Link a Drive folder** → enumerate files and upsert rows
  - **ZIP upload → Drive** → (optional) expand in memory, upload each file to Drive, upsert rows
  - **Inline edit** → updates source/description
  - Triggers populate `uploads_audit` for the **Admin Dashboard**
- **Startup**:
  - `ensure_uploads_log_schema()` creates tables + triggers (idempotent).
  - `ensure_uploads_log_columns()` backfills missing columns on old DBs.
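
The "expand in memory" step of the ZIP upload flow can be sketched with the standard library alone. The `ALLOWED_EXTENSIONS` sets and the `iter_zip_members` name are assumptions; the README only states that files are filtered by allowed extensions per tab.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Assumed per-tab extension whitelist; the app's real sets may differ.
ALLOWED_EXTENSIONS = {
    "dataset": {".csv", ".npy"},
    "results": {".csv", ".png", ".pdf"},
}


def iter_zip_members(zip_bytes, tab):
    """Yield (basename, data) for each allowed file, without touching disk."""
    allowed = ALLOWED_EXTENSIONS[tab]
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            path = PurePosixPath(info.filename)
            # Skip hidden files (e.g. macOS "._" resource forks) and other types.
            if path.name.startswith(".") or path.suffix.lower() not in allowed:
                continue
            yield path.name, archive.read(info)


# Build a small ZIP in memory to exercise the helper.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo:
    demo.writestr("data/a.csv", "x,y\n1,2\n")
    demo.writestr("notes.txt", "ignored")
    demo.writestr("__MACOSX/._a.csv", "junk")

kept = list(iter_zip_members(buffer.getvalue(), "dataset"))
print(kept)
```

Each yielded pair would then go to `drive_upload_bytes(service, folder_id, filename, bytes)` followed by a catalog upsert. Note that flattening to basenames can collide when two folders in the archive contain the same filename.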

See diagram: **[architecture_diagram.png](./architecture_diagram.png)**

---

## 4) Environment & Secrets

Set on Fly (examples):
- `GDRIVE_SA_JSON_BASE64`: base64 of the service account JSON
- `GDRIVE_ROOT_FOLDER_ID`: ID of the shared folder where the app writes `<property>/<tab>` subfolders

Check with:
```bash
fly secrets list -a <app-name>
```
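
On the Python side, reading these secrets could look like the sketch below. `load_service_account_info` is a hypothetical helper, and treating `GDRIVE_SA_JSON` as a file path is an assumption based on the "path" wording earlier in this README. The demo uses a fake key so it runs without real credentials.

```python
import base64
import json
import os


def load_service_account_info(env=None):
    """Return the service account key as a dict, preferring the base64 secret."""
    env = os.environ if env is None else env
    encoded = env.get("GDRIVE_SA_JSON_BASE64")
    if encoded:
        return json.loads(base64.b64decode(encoded).decode("utf-8"))
    path = env.get("GDRIVE_SA_JSON")  # assumed to hold a path to the JSON file
    if path:
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)
    raise RuntimeError("Set GDRIVE_SA_JSON_BASE64 or GDRIVE_SA_JSON")


# Demo with a fake key and a fake environment mapping.
fake_key = {"type": "service_account", "client_email": "demo@example.invalid"}
demo_env = {
    "GDRIVE_SA_JSON_BASE64": base64.b64encode(
        json.dumps(fake_key).encode("utf-8")).decode("ascii"),
}
info = load_service_account_info(demo_env)
print(info["client_email"])
```

The resulting dict is the shape that `google.oauth2.service_account.Credentials.from_service_account_info(...)` accepts, which is one likely way `get_drive_service()` builds its client.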

---

## 5) Local Development

```bash
python -m venv .venv
source .venv/bin/activate  # (Windows: .venv\Scripts\activate)
pip install -r requirements.txt
export FLASK_APP=app.py
flask run
```

To simulate Drive locally, you can keep a small `uploads/` tree and use the “legacy” local CSV/NPY view routes, or configure real Drive secrets in your shell environment.

---

## 6) Operations Cheat Sheet

- **Health**: open `/healthz`
- **Admin login**: `/login` → sets `session['admin']`
- **Admin home**: `/admin`
- **Repair duplicates**: optional endpoint that deletes dupes and re-creates the unique index (if present in your code)
- **Logs**: `fly logs -a <app>`
- **Secrets**: `fly secrets set KEY=VALUE -a <app>`

---

## 7) Troubleshooting Nuggets

- `ON CONFLICT does not match any PRIMARY KEY or UNIQUE`
  → Use the **UPDATE→INSERT** upsert pattern or ensure the unique index on `(property, tab, filename)` exists.
- `storageQuotaExceeded` for service accounts
  → Use a **Shared Drive** that the service account has access to (or delegated permissions), or link existing files/folders instead of uploading large binaries.
- 503/BuildError on `url_for('public_view', table=...)`
  → Ensure `table_map.get(filename)` returns a non-empty value before rendering the link.
- Duplicates in lists
  → Deduplicate in SQL with `DISTINCT` and/or a proper unique index; also avoid adding local rows for Drive-backed items.
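
For the duplicate and `ON CONFLICT` items above, a repair step might look like this sketch. `dedupe_and_index` and the index name `idx_uploads_unique` are hypothetical; the app's own `dedupe_uploads_log` may choose which row to keep differently (this version keeps the most recently inserted one).

```python
import sqlite3


def dedupe_and_index(conn):
    """Keep the newest row per (property, tab, filename), then enforce uniqueness."""
    with conn:
        cur = conn.execute(
            """
            DELETE FROM uploads_log
            WHERE rowid NOT IN (
                SELECT MAX(rowid) FROM uploads_log
                GROUP BY property, tab, filename
            )
            """
        )
        conn.execute(
            "CREATE UNIQUE INDEX IF NOT EXISTS idx_uploads_unique "
            "ON uploads_log(property, tab, filename)"
        )
    return cur.rowcount  # number of duplicate rows removed


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE uploads_log (property TEXT, tab TEXT, filename TEXT)")
conn.executemany(
    "INSERT INTO uploads_log VALUES (?, ?, ?)",
    [("bandgap", "dataset", "a.csv"),
     ("bandgap", "dataset", "a.csv"),   # duplicate
     ("bandgap", "results", "a.csv")],
)
removed = dedupe_and_index(conn)
remaining = conn.execute("SELECT COUNT(*) FROM uploads_log").fetchone()[0]
print(removed, remaining)
```

The delete has to run before the index is created, because SQLite refuses to build a unique index while duplicates still exist. Once the index is in place, further duplicate inserts raise `sqlite3.IntegrityError`.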

---

## 8) Future Work

- OAuth user uploads (end-user Drive).
- Server-side previews for large CSVs (streamed, chunked).
- Role-based admin, audit export, and better search/filter UI.
- Background jobs for Drive sync.

---

## 9) License

TBD. Choose MIT/Apache-2.0 or similar if you want broad reuse.

---

### Appendix: Function Index (quick map)
- **Drive**: `get_drive_service`, `_drive_extract_id`, `drive_find_or_create_folder`, `drive_ensure_property_tab_folder`, `drive_list_folder_files`, `drive_upload_bytes`, `_drive_urls`
- **DB**: `ensure_uploads_log_schema`, `ensure_uploads_log_columns`, `dedupe_uploads_log`
- **Public/Admin**: `/materials/<property>/<tab>`, `/admin`, `/healthz`, optional `/admin/repair_uploads`
- **Legacy import**: `auto_import_uploads`, `auto_log_material_files`
- **Utilities**: `file_to_table_name`, `tableize_basename`
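
As an illustration of what a filename-to-table-name utility has to do, here is a small sketch. It is not the app's `file_to_table_name`; the name `to_table_name`, the rule of dropping the extension, and the `t_` prefix for names starting with a digit are all assumptions.

```python
import re
from pathlib import PurePosixPath


def to_table_name(filename):
    """Turn an arbitrary filename into a conservative SQLite identifier."""
    stem = PurePosixPath(filename).stem.lower()        # drop folders and extension
    name = re.sub(r"[^a-z0-9_]+", "_", stem).strip("_")  # collapse unsafe runs
    if not name:
        name = "table"
    if name[0].isdigit():                              # avoid a leading digit
        name = f"t_{name}"
    return name


for example in ("Band Gap (v2).csv", "2024-results.npy", "data/My File.CSV"):
    print(example, "->", to_table_name(example))
```

Restricting names to `[a-z0-9_]` matters because table names cannot be bound as SQL parameters, so they end up interpolated into query strings.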

---

> Diagram embedded below; if viewing on GitHub, ensure the image exists alongside the README.