Commit 9e9c757

Create README.md
1 parent a9bf39b commit 9e9c757

1 file changed

+211
-0
lines changed

README.md

Lines changed: 211 additions & 0 deletions
@@ -0,0 +1,211 @@
# Patterns Matter — Materials Database (Flask + Google Drive + SQLite) 🧪

## 1) Overview (Implementation Perspective)

**Patterns Matter** is a lightweight materials database web app built with **Flask** and **Jinja**, deployed on **Fly.io**.
It renders datasets and result artifacts *directly from Google Drive* (via a service account), while keeping a small **SQLite** catalog (`uploads_log`) and an **audit trail** (`uploads_audit`) on a persistent Fly volume mounted at `/data`. No large files live on the VM; Drive is the source of truth for files.

### High-level
- **Frontend:** HTML + Jinja templates + a thin CSS theme.
- **Backend:** Flask (Gunicorn worker), SQLite for metadata, Google Drive API for file storage, and a few background-style “run once” tasks guarded by a lock.
- **Persistence:** A Fly volume mounted at **`/data`** holds the SQLite DB and small assets (e.g., music clips and logs). Large datasets/results are kept on **Google Drive**.
- **Auth:** Simple admin session with password (env-configurable in app code). Public routes are read-only.
- **Health checks:** `/healthz` responds `200 OK` for Fly health probes.
- **Secrets:** `GDRIVE_SA_JSON_BASE64` (or `GDRIVE_SA_JSON`) and `GDRIVE_ROOT_FOLDER_ID` are stored as **Fly secrets**.

### What changed during the migration
- The app was originally **Git-backed with a local uploads folder**; it now uses **Google Drive** as the primary data store for *materials datasets/results*.
- The SQLite catalog (`uploads_log`) is **metadata only** (filename, Drive IDs, preview/download URLs, descriptions).
- Admin **uploads** (ZIP or folder link) write files to Drive and **upsert** metadata into the catalog.
- Public **views** read from Drive (or from local DB tables for legacy CSV/NPY files still shipped in the repo).

---

## 2) Key Takeaways (Learner Perspective)

You touched almost every layer of a modern web service. Here’s a recap by technology and the concrete techniques used.

### HTML / CSS / Jinja (Templating)
- **HTML structure**: semantic tags (`<table>`, `<form>`, `<input>`, `<a>`, `<button>`) and responsive wrappers.
- **CSS**: simple layout, table styling, “pills”, hover states, small-screen tweaks (`@media`).
- **Jinja**:
  - Control flow and filters: `{% if %}`, `{% for %}`, `|title`, `|length`, `|safe`.
  - Template variables from Flask: `render_template("view.html", uploads=uploads, admin=is_admin, ...)`.
  - URL building: `url_for('property_detail', property_name=p, tab='results')`, which avoids hardcoding links.
  - Defensive rendering: `row.get('storage', 'local')` in Python and `COALESCE()` at the SQL layer keep templates robust against missing values.

**Mental Model:** Jinja renders **server-side**. You pass Python objects, Jinja turns them into HTML. Keep logic light in templates; do data prep in views.

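
The defensive-rendering idea can be sketched with the standard library alone. This is a minimal illustration, not the app's actual query: the three-column table and the `bandgap.csv` row are made up for the example, and only `COALESCE()` and `.get()` come from the notes above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows can be converted to plain dicts
conn.execute("CREATE TABLE uploads_log (filename TEXT, storage TEXT, description TEXT)")
# Hypothetical row: storage and description are left NULL on purpose.
conn.execute("INSERT INTO uploads_log (filename) VALUES ('bandgap.csv')")

# COALESCE substitutes a default at the SQL layer, so the template never sees None.
rows = conn.execute(
    "SELECT filename, COALESCE(storage, 'local') AS storage, "
    "COALESCE(description, '') AS description FROM uploads_log"
).fetchall()

uploads = [dict(r) for r in rows]            # plain dicts are easy to hand to a template
print(uploads[0]["storage"])                 # local
print(uploads[0].get("source", "unknown"))   # unknown (.get() covers keys the query did not select)
```

Doing this in the view keeps the template down to simple loops and lookups.
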

---

### Flask (Backend)
- **Routes**: `@app.route('/materials/<property_name>/<tab>', methods=['GET', 'POST'])` with guards and branching:
  - Admin POST handlers: add Drive file, link Drive folder (recursive listing), ZIP upload → Drive, inline edits.
  - Public GET: query `uploads_log`, compute `table_map` for legacy local CSV/NPY tables.
- **Helpers**:
  - `get_drive_service()` (lazy import, env-driven)
  - `_drive_extract_id()`, `drive_list_folder_files(..., recursive=True)`
  - `drive_ensure_property_tab_folder()` (ensures `<root>/<property>/<tab>`)
  - `drive_upload_bytes()`, `_drive_urls()`
  - `file_to_table_name()` (canonicalizes filenames to SQLite-safe table names)
- **Sessions**: `session['admin'] = True` on login; guards like `if not session.get('admin')` for admin routes.
- **Rendering**: `render_template()`, `redirect()`, `url_for()`, `jsonify()`.
- **Startup safety**: a run-once initialization guarded by a **`Lock`** and a module-level boolean to avoid duplicate schema work:
  - `ensure_uploads_log_schema()` ensures tables & triggers.
  - `ensure_uploads_log_columns()` backfills columns on older DBs.
- **Error handling**: wrap critical steps in `try/except` and log warnings instead of crashing public pages.

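
The startup-safety bullet describes a lock plus a module-level boolean. A minimal sketch of that pattern follows; `ensure_schema_once`, `_init_lock`, `_init_done`, and the `calls` list are names invented for this illustration, and the real app would pass its own schema functions instead of the lambda.

```python
import threading

_init_lock = threading.Lock()
_init_done = False  # module-level flag: flipped once schema work has finished
calls = []          # records each schema pass so the behaviour can be inspected


def ensure_schema_once(ensure_fn):
    """Run ensure_fn at most once per process, even with concurrent callers."""
    global _init_done
    if _init_done:              # fast path: skip the lock after initialization
        return
    with _init_lock:
        if not _init_done:      # re-check inside the lock
            ensure_fn()
            _init_done = True


# Eight threads race to initialize; only one of them does the work.
threads = [
    threading.Thread(target=ensure_schema_once, args=(lambda: calls.append("schema"),))
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(calls)  # ['schema']
```

Note that the flag is per process: with several Gunicorn workers each worker would run the initialization once, which is why the schema functions themselves are described as idempotent.
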
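
**How to build a route (minimum):**
```python
from flask import Flask, redirect, render_template, request, url_for

app = Flask(__name__)

@app.route("/path", methods=["GET", "POST"])
def view_func():
    if request.method == "POST":               # 1) read from request (form/args/session)
        ...                                     # 2) domain work (query/Drive/API)
        return redirect(url_for("view_func"))   # 3) redirect after a successful POST
    return render_template("view.html")         # 3) or render a template on GET
```

---
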
### SQLite (SQL, Indices, Triggers)
- **Schema**:
  - `uploads_log(property, tab, filename, uploaded_at, storage, drive_id, preview_url, download_url, source, description)`
  - `uploads_audit(property, tab, filename, action, at)`
- **Triggers** keep a **history** automatically:
  - On INSERT → action `add`
  - On UPDATE → action `update`
  - On DELETE → action `delete`
- **Upserts** evolved for compatibility: a safe "`UPDATE ...`, then `INSERT ...` if `rowcount == 0`" pattern avoids `ON CONFLICT` errors on older SQLite versions or when the unique index is missing.
- **Dedupe tools**: optional admin endpoint to delete duplicates and re-create a unique index on `(property, tab, filename)`.
- **Queries**: `COALESCE` guards against NULL columns; pagination is unnecessary at the current table size but can be added.

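
The trigger-based audit trail can be tried out in an in-memory database. The sketch below uses a trimmed subset of the `uploads_log` columns, and the trigger names (`trg_uploads_*`) are assumptions for illustration; the app's real DDL may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE IF NOT EXISTS uploads_log (
    property TEXT NOT NULL, tab TEXT NOT NULL, filename TEXT NOT NULL,
    uploaded_at TEXT DEFAULT CURRENT_TIMESTAMP, description TEXT
);
CREATE TABLE IF NOT EXISTS uploads_audit (
    property TEXT, tab TEXT, filename TEXT, action TEXT,
    at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TRIGGER IF NOT EXISTS trg_uploads_add AFTER INSERT ON uploads_log BEGIN
    INSERT INTO uploads_audit (property, tab, filename, action)
    VALUES (NEW.property, NEW.tab, NEW.filename, 'add');
END;
CREATE TRIGGER IF NOT EXISTS trg_uploads_update AFTER UPDATE ON uploads_log BEGIN
    INSERT INTO uploads_audit (property, tab, filename, action)
    VALUES (NEW.property, NEW.tab, NEW.filename, 'update');
END;
CREATE TRIGGER IF NOT EXISTS trg_uploads_delete AFTER DELETE ON uploads_log BEGIN
    INSERT INTO uploads_audit (property, tab, filename, action)
    VALUES (OLD.property, OLD.tab, OLD.filename, 'delete');
END;
""")

# Exercise all three triggers with a made-up row.
conn.execute("INSERT INTO uploads_log (property, tab, filename) VALUES ('bandgap', 'dataset', 'a.csv')")
conn.execute("UPDATE uploads_log SET description = 'demo' WHERE filename = 'a.csv'")
conn.execute("DELETE FROM uploads_log WHERE filename = 'a.csv'")

actions = [row[0] for row in conn.execute("SELECT action FROM uploads_audit ORDER BY rowid")]
print(actions)  # ['add', 'update', 'delete']
```

Because every statement uses `IF NOT EXISTS`, the script can run on each startup without failing on an existing database.
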

**Mental Model:** Treat SQLite as a **metadata ledger**. Big binaries live elsewhere (Drive).

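
The UPDATE-then-INSERT upsert mentioned in this section might look like the following. It is a sketch under stated assumptions: `upsert_upload` is a hypothetical helper, and the table carries only four of the catalog's columns.

```python
import sqlite3


def upsert_upload(conn, prop, tab, filename, description):
    """UPDATE first; INSERT only when no existing row matched."""
    cur = conn.execute(
        "UPDATE uploads_log SET description = ? "
        "WHERE property = ? AND tab = ? AND filename = ?",
        (description, prop, tab, filename),
    )
    if cur.rowcount == 0:  # nothing updated, so the row does not exist yet
        conn.execute(
            "INSERT INTO uploads_log (property, tab, filename, description) "
            "VALUES (?, ?, ?, ?)",
            (prop, tab, filename, description),
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE uploads_log (property TEXT, tab TEXT, filename TEXT, description TEXT)")
upsert_upload(conn, "bandgap", "dataset", "a.csv", "first")
upsert_upload(conn, "bandgap", "dataset", "a.csv", "second")
print(conn.execute("SELECT COUNT(*), MAX(description) FROM uploads_log").fetchone())  # (1, 'second')
```

Unlike `INSERT ... ON CONFLICT`, this works without a unique index, at the cost of two statements per write.
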

---

### Google Drive API (Service Account)
- **Service account** credentials come from `GDRIVE_SA_JSON_BASE64` *(or `GDRIVE_SA_JSON` path)*.
- **Root folder** for the app is `GDRIVE_ROOT_FOLDER_ID`. Under it the app creates/uses the hierarchy:
  - `<root>/<property>/<tab>` where `tab ∈ {dataset, results}`.
- **Listing**: `drive_list_folder_files(..., recursive=True)` walks folders; filters by allowed extensions per tab.
- **Upload**: `drive_upload_bytes(service, folder_id, filename, bytes)`.
- **Sharing**: Best-effort `permissions().create(fileId=file_id, body={"role": "reader", "type": "anyone"})` for public links (the client library takes keyword arguments only).
- **URLs**: preview and download links are computed and stored in the catalog.

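
To make the link handling concrete, here is a rough sketch of extracting a Drive ID from a pasted link and deriving preview/download URLs. The function names `extract_drive_id` and `drive_urls` are stand-ins, not the app's `_drive_extract_id()` or `_drive_urls()`, and the URL shapes are the commonly seen public ones, assumed here rather than taken from the app's code.

```python
import re
from urllib.parse import parse_qs, urlparse

# Matches .../file/d/<id>/... and .../folders/<id> style links.
_ID_IN_PATH = re.compile(r"/(?:file/d|folders)/([A-Za-z0-9_-]+)")


def extract_drive_id(link_or_id):
    """Return the Drive ID from a share link, or the input itself if it already looks like an ID."""
    text = link_or_id.strip()
    match = _ID_IN_PATH.search(text)
    if match:
        return match.group(1)
    query_id = parse_qs(urlparse(text).query).get("id")  # handles ...?id=<id> links
    if query_id:
        return query_id[0]
    return text if re.fullmatch(r"[A-Za-z0-9_-]+", text) else None


def drive_urls(file_id):
    """Build (preview_url, download_url) for a publicly shared file."""
    return (
        f"https://drive.google.com/file/d/{file_id}/preview",
        f"https://drive.google.com/uc?export=download&id={file_id}",
    )


print(extract_drive_id("https://drive.google.com/file/d/abc123_XY/view?usp=sharing"))  # abc123_XY
print(extract_drive_id("https://drive.google.com/open?id=abc123_XY"))                  # abc123_XY
print(drive_urls("abc123_XY")[0])  # https://drive.google.com/file/d/abc123_XY/preview
```

Storing the computed URLs in the catalog means public pages can render links without calling the Drive API.
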

**Mental Model:** Drive is your **blob store**. SQLite stores **pointers and descriptive metadata**.

---

### Containers & Dockerfile (Deployment Mechanics)
- The app is served by **Gunicorn** inside a **Firecracker VM** on Fly.
- **Dockerfile** defines your Python environment (packages like `google-api-python-client`, `pandas`, etc.).
- The container exposes port **8080**; Fly proxies 80/443 to it.
- **Health check**: `/healthz` must be very fast and never fail.
- **Volume**: the `/data` mount stores the SQLite DB and small local assets. Fly restarts won’t erase your DB.
- **fly.toml** highlights:
  - `internal_port = 8080`
  - `[[http_service.checks]] path="/healthz"`
  - `[[mounts]] destination = "/data"`

**Mental Model:** Image (code + deps) → Machine (VM) → Volume for persistence → Health checks to keep it alive.

---

## 3) Architecture & Data Flow

- **Public user** visits `/materials/<property>/<tab>`
  - Flask queries `uploads_log` → Jinja renders a table of files
  - For **Drive** rows: show **Preview** and **Download** links directly to Drive
  - For legacy **local CSV/NPY**: map to an SQLite table name and render with `public_view`
- **Admin**:
  - **Add Drive file** (link/ID) → write row in `uploads_log`
  - **Link a Drive folder** → enumerate files and upsert rows
  - **ZIP upload → Drive** → (optional) expand in memory, upload each file to Drive, upsert rows
  - **Inline edit** → updates source/description
  - Triggers populate `uploads_audit` for the **Admin Dashboard**
- **Startup**:
  - `ensure_uploads_log_schema()` creates tables + triggers (idempotent).
  - `ensure_uploads_log_columns()` backfills missing columns on old DBs.

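
The "expand in memory" step of the ZIP upload flow can be sketched with `zipfile` and `io.BytesIO`. The helper name `iter_zip_members` and the extension tuple are assumptions for this example, and the Drive upload and catalog upsert are replaced by a list append so the snippet runs on its own.

```python
import io
import zipfile


def iter_zip_members(zip_bytes, allowed_exts):
    """Yield (basename, data) for each allowed file in an in-memory ZIP archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            name = info.filename.rsplit("/", 1)[-1]  # drop any folder prefix inside the archive
            if name.startswith(".") or not name.lower().endswith(allowed_exts):
                continue
            yield name, archive.read(info)


# Build a small archive in memory to exercise the helper.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("run1/data.csv", "x,y\n1,2\n")
    archive.writestr("run1/notes.txt", "ignored")

uploaded = []
for name, data in iter_zip_members(buffer.getvalue(), (".csv", ".npy")):
    uploaded.append((name, len(data)))  # a real handler would upload to Drive and upsert here
print(uploaded)  # [('data.csv', 8)]
```

Nothing touches the local disk, which fits the goal of keeping large files off the VM.
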

See diagram: **[architecture_diagram.png](./architecture_diagram.png)**

---

## 4) Environment & Secrets

Set on Fly (examples):
- `GDRIVE_SA_JSON_BASE64`: base64 of the service account JSON
- `GDRIVE_ROOT_FOLDER_ID`: ID of the shared folder where the app writes `<property>/<tab>` subfolders

Check with:
```bash
fly secrets list -a <app-name>
```

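
For orientation, a minimal sketch of how an app could turn these secrets into a credentials dict. `load_service_account_info` is a hypothetical helper, and the fake credential below exists only to show the encoding round trip; `GDRIVE_SA_JSON` is treated as a file path, as noted in the Drive section.

```python
import base64
import json
import os


def load_service_account_info(env=os.environ):
    """Return the service account JSON as a dict, or None when nothing is configured."""
    encoded = env.get("GDRIVE_SA_JSON_BASE64")
    if encoded:
        return json.loads(base64.b64decode(encoded).decode("utf-8"))
    path = env.get("GDRIVE_SA_JSON")  # assumed here to be a path to the JSON file
    if path:
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)
    return None


# Round-trip a fake credential to show the expected encoding.
fake = {"type": "service_account", "client_email": "demo@example.iam.gserviceaccount.com"}
env = {"GDRIVE_SA_JSON_BASE64": base64.b64encode(json.dumps(fake).encode("utf-8")).decode("ascii")}
info = load_service_account_info(env)
print(info["client_email"])  # demo@example.iam.gserviceaccount.com
```

The resulting dict is the kind of input that `google.oauth2.service_account.Credentials.from_service_account_info` accepts, though how the app actually builds its Drive client is not shown in this README.
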

---

## 5) Local Development

```bash
python -m venv .venv
source .venv/bin/activate  # (Windows: .venv\Scripts\activate)
pip install -r requirements.txt
export FLASK_APP=app.py
flask run
```

To simulate Drive locally, you can keep a small `uploads/` tree and use the “legacy” local CSV/NPY view routes, or configure real Drive secrets in your shell environment.

---

## 6) Operations Cheat Sheet

- **Health**: open `/healthz`
- **Admin login**: `/login` → sets `session['admin']`
- **Admin home**: `/admin`
- **Repair duplicates**: optional endpoint that deletes dupes and re-creates the unique index (if present in your code)
- **Logs**: `fly logs -a <app>`
- **Secrets**: `fly secrets set KEY=VALUE -a <app>`

---

## 7) Troubleshooting Nuggets

- `ON CONFLICT clause does not match any PRIMARY KEY or UNIQUE constraint`
  → Use the **UPDATE→INSERT** upsert pattern or ensure the unique index on `(property, tab, filename)` exists.
- `storageQuotaExceeded` for service accounts
  → Use a **Shared Drive** with delegated permissions, or link existing files/folders instead of uploading large binaries.
- 503/BuildError on `url_for('public_view', table=...)`
  → Ensure `table_map.get(filename)` returns a non-empty value before rendering the link.
- Duplicates in lists
  → Deduplicate in SQL and/or by using `DISTINCT` or a proper unique index; also avoid adding local rows for Drive-backed items.

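
The first and last items both come down to the unique index on `(property, tab, filename)`. One possible repair, sketched with an in-memory table: the index name `idx_uploads_unique`, the sample rows, and the "keep the newest row" rule are assumptions, not necessarily what `dedupe_uploads_log` does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE uploads_log (property TEXT, tab TEXT, filename TEXT, description TEXT)")
conn.executemany(
    "INSERT INTO uploads_log VALUES (?, ?, ?, ?)",
    [
        ("bandgap", "dataset", "a.csv", "old"),
        ("bandgap", "dataset", "a.csv", "new"),  # duplicate key, inserted later
        ("bandgap", "results", "a.csv", "other tab"),
    ],
)

# Keep the newest row (highest rowid) per (property, tab, filename); delete the rest.
conn.execute(
    "DELETE FROM uploads_log WHERE rowid NOT IN ("
    "  SELECT MAX(rowid) FROM uploads_log GROUP BY property, tab, filename)"
)
# With duplicates gone, the unique index can be created and ON CONFLICT becomes usable.
conn.execute(
    "CREATE UNIQUE INDEX IF NOT EXISTS idx_uploads_unique "
    "ON uploads_log (property, tab, filename)"
)
conn.commit()
print(conn.execute("SELECT tab, description FROM uploads_log ORDER BY rowid").fetchall())
# [('dataset', 'new'), ('results', 'other tab')]
```

Creating the index before deleting duplicates would fail, so the order of the two statements matters.
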

---

## 8) Future Work

- OAuth user uploads (end-user Drive).
- Server-side previews for large CSVs (streamed, chunked).
- Role-based admin, audit export, and better search/filter UI.
- Background jobs for Drive sync.

---

## 9) License

TBD. Choose MIT, Apache-2.0, or similar if you want broad reuse.

---

### Appendix: Function Index (quick map)
- **Drive**: `get_drive_service`, `_drive_extract_id`, `drive_find_or_create_folder`, `drive_ensure_property_tab_folder`, `drive_list_folder_files`, `drive_upload_bytes`, `_drive_urls`
- **DB**: `ensure_uploads_log_schema`, `ensure_uploads_log_columns`, `dedupe_uploads_log`
- **Public/Admin**: `/materials/<property>/<tab>`, `/admin`, `/healthz`, optional `/admin/repair_uploads`
- **Legacy import**: `auto_import_uploads`, `auto_log_material_files`
- **Utilities**: `file_to_table_name`, `tableize_basename`

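
The exact rules of `file_to_table_name` are not spelled out in this README. As a guess at what "canonicalizing a filename to an SQLite-safe table name" could involve, here is a small sketch; `to_table_name` and its rules (lowercase, underscores, `t_` prefix for leading digits) are assumptions.

```python
import re


def to_table_name(filename):
    """Map a filename to a conservative SQLite identifier (lowercase letters, digits, underscores)."""
    stem = filename.rsplit("/", 1)[-1].rsplit(".", 1)[0]  # drop folders and the last extension
    name = re.sub(r"[^0-9a-z]+", "_", stem.lower()).strip("_")
    if not name:
        name = "table"          # fallback for names with no usable characters
    if name[0].isdigit():       # avoid identifiers that start with a digit
        name = "t_" + name
    return name


print(to_table_name("Band Gap (v2).csv"))  # band_gap_v2
print(to_table_name("2024-results.npy"))   # t_2024_results
```

Restricting names to this character set also means they can be placed in SQL text without quoting, though values should still go through bound parameters.
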

---

> Diagram embedded below; if viewing on GitHub, ensure the image exists alongside the README.

![Architecture Diagram](architecture_diagram.png)
