Skip to content

Commit 6dcb9ca

Browse files
committed
feat: add turbo mode for file workers
- Added worker().turbo() for parallel array processing with external files - Workers can use require() for DB, Redis, external modules - Array split into chunks, processed in parallel, merged in order - Type-safe API with TypeScript generics - Updated README.md with file worker turbo documentation - Updated DOCS.md with detailed technical documentation - Added 6 new unit tests for worker().turbo() - 343 tests passing
1 parent b6abe4b commit 6dcb9ca

File tree

15 files changed

+1033
-48
lines changed

15 files changed

+1033
-48
lines changed

DOCS.md

Lines changed: 152 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -11,14 +11,15 @@
1111
1. [What is bee-threads?](#what-is-bee-threads)
1212
2. [Architecture Overview](#architecture-overview)
1313
3. [File-by-File Breakdown](#file-by-file-breakdown)
14-
4. [Technical Decisions](#technical-decisions)
15-
5. [Security Architecture](#security-architecture)
16-
6. [Performance Architecture](#performance-architecture)
17-
7. [Data Flow](#data-flow)
18-
8. [Error Handling](#error-handling)
19-
9. [Memory Management](#memory-management)
20-
10. [Runtime & Bundler Compatibility](#runtime--bundler-compatibility)
21-
11. [Contributing Guide](#contributing-guide)
14+
4. [File Workers](#file-workers)
15+
5. [Technical Decisions](#technical-decisions)
16+
6. [Security Architecture](#security-architecture)
17+
7. [Performance Architecture](#performance-architecture)
18+
8. [Data Flow](#data-flow)
19+
9. [Error Handling](#error-handling)
20+
10. [Memory Management](#memory-management)
21+
11. [Runtime & Bundler Compatibility](#runtime--bundler-compatibility)
22+
12. [Contributing Guide](#contributing-guide)
2223

2324
---
2425

@@ -832,6 +833,149 @@ function reconstructBuffers(value: unknown): unknown {
832833

833834
---
834835

836+
## File Workers
837+
838+
File workers allow running external worker files with full `require()` access. This is essential when workers need to access:
839+
840+
- **Database connections** (PostgreSQL, MongoDB, Redis)
841+
- **External modules** (sharp, bcrypt, custom libraries)
842+
- **File system** (fs, path)
843+
- **Environment variables** and configuration
844+
845+
### Architecture
846+
847+
```
848+
┌─────────────────────────────────────────────────────────────────────┐
849+
│ beeThreads.worker(path) │
850+
└─────────────────────────────────────────────────────────────────────┘
851+
852+
853+
┌─────────────────────────────────────────────────────────────────────┐
854+
│ file-worker.ts │
855+
│ • Worker pool per file path │
856+
│ • Auto-scaling (up to cpus - 1) │
857+
│ • Worker reuse across calls │
858+
└─────────────────────────────────────────────────────────────────────┘
859+
860+
┌───────────────────────┼───────────────────────┐
861+
▼ ▼ ▼
862+
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
863+
│ Worker 1 │ │ Worker 2 │ │ Worker 3 │
864+
│ require() │ │ require() │ │ require() │
865+
│ db conn │ │ db conn │ │ db conn │
866+
└─────────────┘ └─────────────┘ └─────────────┘
867+
```
868+
869+
### `src/file-worker.ts` - File Worker Implementation
870+
871+
**Why it exists:**
872+
Enables workers to use `require()` for external dependencies - impossible with inline workers.
873+
874+
**What it does:**
875+
876+
- Creates worker pools per file path
877+
- Manages worker lifecycle and reuse
878+
- Supports both single execution and turbo mode
879+
- Type-safe generic API
880+
881+
**Key exports:**
882+
883+
| Export | Description |
884+
|--------|-------------|
885+
| `createFileWorker<T>` | Creates type-safe file worker executor |
886+
| `terminateFileWorkers` | Terminates all file workers |
887+
| `FileWorkerExecutor<T>` | Interface with call and turbo methods |
888+
889+
### Single Execution
890+
891+
```js
892+
// workers/process-user.js
893+
const db = require('./database')
894+
module.exports = async function(userId) {
895+
return db.findUser(userId)
896+
}
897+
898+
// main.js
899+
const user = await beeThreads.worker('./workers/process-user.js')(123)
900+
```
901+
902+
### Turbo Mode - Parallel Array Processing
903+
904+
When you have a large array and need to process it with database access:
905+
906+
```js
907+
// workers/process-chunk.js
908+
const db = require('./database')
909+
const cache = require('./cache')
910+
911+
module.exports = async function(users) {
912+
return Promise.all(users.map(async user => ({
913+
...user,
914+
score: await db.getScore(user.id),
915+
cached: cache.get(user.id)
916+
})))
917+
}
918+
919+
// main.js - Process 10,000 users across 8 workers
920+
const results = await beeThreads
921+
.worker('./workers/process-chunk.js')
922+
.turbo(users, { workers: 8 })
923+
```
924+
925+
**How turbo mode works:**
926+
927+
1. **Split**: Array divided into N chunks (one per worker)
928+
2. **Execute**: Each worker processes its chunk in parallel
929+
3. **Merge**: Results combined in original order
930+
931+
```
932+
[u1, u2, u3, u4, u5, u6, u7, u8] → workers: 4
933+
934+
Worker 1: [u1, u2] → [r1, r2]
935+
Worker 2: [u3, u4] → [r3, r4]
936+
Worker 3: [u5, u6] → [r5, r6]
937+
Worker 4: [u7, u8] → [r7, r8]
938+
939+
[r1, r2, r3, r4, r5, r6, r7, r8] (order preserved)
940+
```
941+
942+
### Type Safety (TypeScript)
943+
944+
```ts
945+
// workers/find-user.ts
946+
import { db } from '../database'
947+
export default async function(id: number): Promise<User> {
948+
return db.query('SELECT * FROM users WHERE id = ?', [id])
949+
}
950+
951+
// main.ts - Full type inference
952+
import type findUser from './workers/find-user'
953+
954+
const user = await beeThreads.worker<typeof findUser>('./workers/find-user')(123)
955+
// ^User ^number
956+
```
957+
958+
### Worker Pool Behavior
959+
960+
| Aspect | Behavior |
961+
|--------|----------|
962+
| Pool size | Up to `cpus - 1` per file |
963+
| Worker reuse | Yes, across all calls |
964+
| Idle cleanup | Not automatic (call `shutdown`) |
965+
| Error isolation | Errors don't crash other workers |
966+
967+
### When to Use File Workers vs Inline
968+
969+
| Scenario | Use |
970+
|----------|-----|
971+
| Pure computation | `bee()` or `turbo()` |
972+
| Need `require()` | `worker()` |
973+
| Database access | `worker()` |
974+
| Large array + DB | `worker().turbo()` |
975+
| Single item + DB | `worker()` (single call) |
976+
977+
---
978+
835979
## Technical Decisions
836980

837981
### 1. Why vm.Script instead of eval()?

README.md

Lines changed: 124 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -342,6 +342,103 @@ const { data, stats } = await beeThreads.turbo(arr).mapWithStats(x => x * x)
342342
console.log(stats.speedupRatio) // "7.2x"
343343
```
344344

345+
## 🚀 File Workers - External Files with `require()` Access
346+
347+
When you need workers to access **external modules**, **database connections**, or **file system** — use file workers.
348+
349+
### Basic Usage
350+
351+
```js
352+
// workers/process-user.js
353+
const db = require('./database')
354+
const cache = require('./cache')
355+
356+
module.exports = async function (userId) {
357+
const user = await db.findUser(userId)
358+
return { ...user, cached: cache.get(userId) }
359+
}
360+
361+
// main.js
362+
const { beeThreads } = require('bee-threads')
363+
364+
const user = await beeThreads.worker('./workers/process-user.js')(123)
365+
```
366+
367+
### Type-Safe Workers (TypeScript)
368+
369+
```ts
370+
// workers/find-user.ts
371+
import { db } from '../database'
372+
export default async function (id: number): Promise<User> {
373+
return db.query('SELECT * FROM users WHERE id = ?', [id])
374+
}
375+
376+
// main.ts
377+
import type findUser from './workers/find-user'
378+
const user = await beeThreads.worker<typeof findUser>('./workers/find-user')(123)
379+
// ^User ^number
380+
```
381+
382+
### 🔥 Turbo Mode for File Workers
383+
384+
Process large arrays with file workers across **ALL CPU cores**:
385+
386+
```
387+
┌──────────────────────────────────────────────────────────────────────┐
388+
│ beeThreads.worker('./process-chunk.js').turbo(users, { workers: 4 })│
389+
└──────────────────────────────────────────────────────────────────────┘
390+
391+
┌──────────┴──────────┐
392+
│ SPLIT INTO CHUNKS │
393+
└──────────┬──────────┘
394+
395+
┌─────────────────────┼─────────────────────┐
396+
▼ ▼ ▼
397+
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
398+
│ Worker 1 │ │ Worker 2 │ │ Worker 3 │
399+
│ [u1,u2,u3] │ │ [u4,u5,u6] │ │ [u7,u8,u9] │
400+
│ require DB │ │ require DB │ │ require DB │
401+
└─────────────┘ └─────────────┘ └─────────────┘
402+
│ │ │
403+
└─────────────────────┼─────────────────────┘
404+
405+
┌──────────────────────┐
406+
│ MERGE (order kept) │
407+
│ [r1,r2...r9] │
408+
└──────────────────────┘
409+
```
410+
411+
```js
412+
// workers/process-chunk.js
413+
const db = require('./database')
414+
const calculateScore = require('./score')
415+
416+
module.exports = async function (users) {
417+
return Promise.all(
418+
users.map(async user => ({
419+
...user,
420+
score: await calculateScore(user),
421+
dbData: await db.fetch(user.id),
422+
}))
423+
)
424+
}
425+
426+
// main.js - Process 10,000 users across 8 workers
427+
const results = await beeThreads.worker('./workers/process-chunk.js').turbo(users, { workers: 8 })
428+
```
429+
430+
### When to Use
431+
432+
| Need | Use |
433+
|------|-----|
434+
| Pure computation | `bee()` or `turbo()` |
435+
| Database/Redis | `worker().turbo()` |
436+
| External files/modules | `worker().turbo()` |
437+
| File system operations | `worker().turbo()` |
438+
| Third-party libraries | `worker().turbo()` |
439+
440+
---
441+
345442
## Request Coalescing
346443

347444
Prevents duplicate simultaneous calls from running multiple times. When the same function with identical arguments is called while a previous call is in-flight, subsequent calls share the same Promise.
@@ -497,6 +594,9 @@ const stream = beeThreads
497594
- **Large array processing** (turbo mode)
498595
- **Matrix operations** (turbo mode)
499596
- **Numerical simulations** (turbo mode)
597+
- **Database batch operations** (file worker turbo)
598+
- **ETL pipelines** (file worker turbo)
599+
- **API aggregation** (file worker turbo)
500600
- Data pipelines
501601
- Video/image encoding services
502602
- Scientific computing
@@ -514,33 +614,47 @@ node benchmarks.js # Node
514614

515615
### Results (1M items, heavy function, 12 CPUs, 10 runs avg)
516616

517-
**Bun** - Real parallel speedup:
617+
**Bun (Windows)**
518618

519619
| Mode | Time (±std) | Speedup | Main Thread |
520620
|------|-------------|---------|-------------|
521621
| main | 285±5ms | 1.00x | ❌ blocked |
522622
| bee | 1138±51ms | 0.25x | ✅ free |
523-
| turbo(4) | 255±7ms | 1.12x | ✅ free |
524-
| turbo(8) | 180±8ms | **1.58x** | ✅ free |
623+
| turbo(8) | 180±8ms | 1.58x | ✅ free |
525624
| **turbo(12)** | **156±12ms** | **1.83x** | ✅ free |
526-
| turbo(16) | 204±28ms | 1.40x | ✅ free |
527625

528-
**Node** - Non-blocking I/O (slower, but frees main thread):
626+
**Bun (Linux/Docker)**
627+
628+
| Mode | Time (±std) | Speedup | Main Thread |
629+
|------|-------------|---------|-------------|
630+
| main | 338±8ms | 1.00x | ❌ blocked |
631+
| bee | 1882±64ms | 0.18x | ✅ free |
632+
| turbo(8) | 226±7ms | 1.50x | ✅ free |
633+
| **turbo(12)** | **213±20ms** | **1.59x** | ✅ free |
634+
635+
**Node (Windows)**
529636

530637
| Mode | Time (±std) | Speedup | Main Thread |
531638
|------|-------------|---------|-------------|
532639
| main | 368±13ms | 1.00x | ❌ blocked |
533640
| bee | 5569±203ms | 0.07x | ✅ free |
534-
| turbo(4) | 1793±85ms | 0.21x | ✅ free |
535641
| turbo(8) | 1052±22ms | 0.35x | ✅ free |
536642
| **turbo(12)** | **1017±57ms** | **0.36x** | ✅ free |
537-
| turbo(16) | 1099±98ms | 0.34x | ✅ free |
643+
644+
**Node (Linux/Docker)**
645+
646+
| Mode | Time (±std) | Speedup | Main Thread |
647+
|------|-------------|---------|-------------|
648+
| main | 522±54ms | 1.00x | ❌ blocked |
649+
| bee | 5520±163ms | 0.09x | ✅ free |
650+
| turbo(8) | 953±44ms | 0.55x | ✅ free |
651+
| **turbo(12)** | **861±64ms** | **0.61x** | ✅ free |
538652

539653
### Key Insights
540654

541-
- **Bun + turbo(cpus)**: Up to **1.83x faster** than main thread
655+
- **Bun + turbo**: **1.6-1.8x faster** than main thread (both OS)
656+
- **Node + Linux**: **0.61x** - much better than Windows (0.36x)
542657
- **bee/turbo**: Non-blocking - main thread stays **free for HTTP/I/O**
543-
- **Node + turbo**: Slower, but useful for keeping servers responsive
544658
- **bee vs turbo**: turbo is **7x faster** than bee for large arrays
545659
- **Default workers**: `cpus - 1` (safe for all systems)
546660

@@ -574,6 +688,7 @@ await beeThreads.turbo(data, { workers: 12 }).map(fn)
574688
- **Worker affinity** - Same function → same worker (V8 JIT)
575689
- **Request coalescing** - Deduplicates identical calls
576690
- **Turbo mode** - Parallel array processing (workers only)
691+
- **File workers** - External files with `require()` + turbo mode
577692
- **Full TypeScript** - Complete type definitions
578693

579694
---

0 commit comments

Comments
 (0)