⚡ Bolt: Fix N+1 database insertions in BatchProcessor (#248)
Replaced individual per-file SQLite `INSERT` statements with a single batched `executemany` block at the end of the `_process_sequential` and `_process_parallel` functions. Added periodic chunked saves to prevent data loss on crash.
Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Co-authored-by: thebearwithabite <216692431+thebearwithabite@users.noreply.github.com>
`.jules/bolt.md` (8 additions, 0 deletions)
@@ -33,3 +33,11 @@
## 2025-05-27 - Bulk SQLite Inserts and Connection Reuse for Tagging
**Learning:** Sequential `.execute` calls for `INSERT OR REPLACE` inside nested loops over large arrays (such as tags), combined with opening an independent DB connection per method, create a severe N+1 problem. Benchmarks showed that replacing this with a single shared connection and `executemany` yielded an ~2x speedup on typical batch tagging workloads.
**Action:** Always batch related SQL records with `.executemany()`, and pass an optional `db_connection` down to nested operations instead of opening a new database connection each time.
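A minimal sketch of this pattern (the `save_tags` function, `tags` schema, and `library.db` filename are illustrative assumptions, not the actual project code):

```python
import sqlite3

def save_tags(file_id, tags, db_connection=None):
    """Insert all tag rows for a file in one batch, reusing an
    already-open connection when the caller passes one down."""
    # Reuse the caller's connection instead of opening a fresh one per call.
    conn = db_connection or sqlite3.connect("library.db")
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS tags ("
            "file_id INTEGER, tag TEXT, PRIMARY KEY (file_id, tag))"
        )
        # One executemany instead of N execute calls inside a loop.
        conn.executemany(
            "INSERT OR REPLACE INTO tags (file_id, tag) VALUES (?, ?)",
            [(file_id, t) for t in tags],
        )
        conn.commit()
    finally:
        if db_connection is None:
            conn.close()  # only close connections this function opened itself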
## 2025-05-15 - Batched DB Inserts
**Learning:** Sequential processing loops that insert database records one at a time cause N+1 query bottlenecks and extremely poor disk I/O performance on large batches.
**Action:** Replace per-record `execute()`/`commit()` calls inside sequential processing loops with a single `executemany()` and one commit once the entire result set is gathered.
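The before/after shape of that change can be sketched as follows (the `process_batch` name and `files` table are hypothetical placeholders):

```python
import sqlite3

def process_batch(conn, records):
    """Insert a list of (name, size) records with one round trip.

    Anti-pattern this replaces (one statement + commit per record):
        for name, size in records:
            conn.execute("INSERT INTO files (name, size) VALUES (?, ?)",
                         (name, size))
            conn.commit()  # fsync on every record: the N+1 bottleneck
    """
    # Gather everything first, then issue one batched insert and one commit.
    conn.executemany("INSERT INTO files (name, size) VALUES (?, ?)", records)
    conn.commit()
```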
## 2025-05-15 - Batched DB Inserts vs Crash Recovery
**Learning:** Fully deferring database saves to the end of a long-running batch job using `executemany` solves the N+1 bottleneck, but introduces a risk of data loss if the process crashes midway.
**Action:** Use periodic chunked batching (e.g., executing `executemany` every 50 records) inside loops to balance disk I/O performance with incremental crash resilience.
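One way this balance might look in practice, assuming a hypothetical `results (path, status)` table and a `process` callback that returns one row per file:

```python
import sqlite3

CHUNK_SIZE = 50  # flush to disk every 50 records

def process_files(conn, files, process):
    """Process files sequentially, checkpointing results in chunks so a
    crash loses at most CHUNK_SIZE records instead of the whole batch."""
    pending = []
    for f in files:
        pending.append(process(f))
        if len(pending) >= CHUNK_SIZE:
            conn.executemany(
                "INSERT INTO results (path, status) VALUES (?, ?)", pending
            )
            conn.commit()  # incremental checkpoint
            pending.clear()
    if pending:  # flush the final partial chunk
        conn.executemany(
            "INSERT INTO results (path, status) VALUES (?, ?)", pending
        )
        conn.commit()
```

The chunk size is a tuning knob: larger chunks amortize commit overhead better, smaller chunks tighten the crash-recovery window.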