Commit 619b6c8

add batch api
1 parent d4579e3 commit 619b6c8

9 files changed: +1082 -68 lines changed

README.md

Lines changed: 90 additions & 2 deletions
@@ -19,6 +19,7 @@ Uses tree-sitter to split source code at semantic boundaries (functions, classes
 - **Rich context**: Scope chain, imports, siblings, entity signatures
 - **Contextualized text**: Pre-formatted for embedding models
 - **Multi-language**: TypeScript, JavaScript, Python, Rust, Go, Java
+- **Batch processing**: Process entire codebases with controlled concurrency
 - **Streaming**: Process large files incrementally
 - **Effect support**: First-class Effect integration
 
@@ -143,6 +144,48 @@ for (const file of files) {
 }
 ```
 
+### Batch Processing
+
+Process multiple files concurrently with error handling per file:
+
+```typescript
+import { chunkBatch } from 'code-chunk'
+
+const files = [
+  { filepath: 'src/user.ts', code: userCode },
+  { filepath: 'src/auth.ts', code: authCode },
+  { filepath: 'lib/utils.py', code: utilsCode },
+]
+
+const results = await chunkBatch(files, {
+  maxChunkSize: 1500,
+  concurrency: 10,
+  onProgress: (done, total, path, success) => {
+    console.log(`[${done}/${total}] ${path}: ${success ? 'ok' : 'failed'}`)
+  }
+})
+
+for (const result of results) {
+  if (result.error) {
+    console.error(`Failed: ${result.filepath}`, result.error)
+  } else {
+    await indexChunks(result.filepath, result.chunks)
+  }
+}
+```
+
+Stream results as they complete:
+
+```typescript
+import { chunkBatchStream } from 'code-chunk'
+
+for await (const result of chunkBatchStream(files, { concurrency: 5 })) {
+  if (result.chunks) {
+    await indexChunks(result.filepath, result.chunks)
+  }
+}
+```
+
 ### Effect Integration
 
 For Effect-based pipelines:
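
The batch example added in this hunk depends on two behaviours: no more than `concurrency` files are in flight at once, and a failure in one file is recorded in that file's result instead of rejecting the whole batch. The sketch below shows one common way to get both with a small worker pool. It is not the library's implementation; `runBatch`, `FileInput`, `Result`, and the `chunkOne` stand-in are hypothetical names introduced only for illustration.

```typescript
// Hypothetical shapes for illustration; the real code-chunk types may differ.
interface FileInput { filepath: string; code: string }
interface Result { filepath: string; chunks: string[] | null; error: Error | null }
type Progress = (done: number, total: number, filepath: string, success: boolean) => void

// Stand-in for a real chunker: splits on blank lines and rejects empty input.
async function chunkOne(file: FileInput): Promise<string[]> {
  if (file.code.trim() === '') throw new Error(`empty file: ${file.filepath}`)
  return file.code.split(/\n\s*\n/)
}

// Run `chunkOne` over all files with at most `concurrency` in flight.
// Results keep input order; failures are recorded per file, never thrown.
async function runBatch(
  files: FileInput[],
  concurrency = 10,
  onProgress?: Progress,
): Promise<Result[]> {
  const results: Result[] = new Array(files.length)
  let next = 0 // index of the next file to claim
  let done = 0 // number of files finished so far

  async function worker(): Promise<void> {
    while (next < files.length) {
      const i = next++ // claimed synchronously, so no two workers share an index
      const file = files[i]
      try {
        results[i] = { filepath: file.filepath, chunks: await chunkOne(file), error: null }
      } catch (e) {
        const error = e instanceof Error ? e : new Error(String(e))
        results[i] = { filepath: file.filepath, chunks: null, error }
      }
      onProgress?.(++done, files.length, file.filepath, results[i].error === null)
    }
  }

  const workerCount = Math.max(1, Math.min(concurrency, files.length))
  await Promise.all(Array.from({ length: workerCount }, () => worker()))
  return results
}
```

Because every worker catches its own errors, `Promise.all` here only waits for the pool to drain; it never rejects on a bad file.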
@@ -198,7 +241,43 @@ Effect-native streaming API for composable pipelines.
 
 Create a reusable chunker instance with default options.
 
-**Returns:** `Chunker` with `chunk()` and `stream()` methods
+**Returns:** `Chunker` with `chunk()`, `stream()`, `chunkBatch()`, and `chunkBatchStream()` methods
+
+---
+
+### `chunkBatch(files, options?)`
+
+Process multiple files concurrently with per-file error handling.
+
+**Parameters:**
+- `files`: Array of `{ filepath, code, options? }`
+- `options`: Batch options (extends ChunkOptions with `concurrency` and `onProgress`)
+
+**Returns:** `Promise<BatchResult[]>` where each result has `{ filepath, chunks, error }`
+
+---
+
+### `chunkBatchStream(files, options?)`
+
+Stream batch results as files complete processing.
+
+**Returns:** `AsyncGenerator<BatchResult>`
+
+---
+
+### `chunkBatchEffect(files, options?)`
+
+Effect-native batch processing.
+
+**Returns:** `Effect.Effect<BatchResult[], never>`
+
+---
+
+### `chunkBatchStreamEffect(files, options?)`
+
+Effect-native streaming batch processing.
+
+**Returns:** `Stream.Stream<BatchResult, never>`
 
 ---
 
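
The `chunkBatchStream` entry in this hunk describes an `AsyncGenerator<BatchResult>` that yields as files complete. A minimal sketch of how such a generator can be built on `Promise.race` is shown below, assuming results are emitted in completion order rather than input order. The names `streamBatch` and `Settled`, and the generic `task` callback, are hypothetical and are not part of the documented API.

```typescript
// Hypothetical result shape for illustration; the real BatchResult may differ.
interface Settled<T> { filepath: string; value: T | null; error: Error | null }

// Yield one result per input as soon as it settles, keeping at most
// `concurrency` tasks in flight. Output order is completion order.
async function* streamBatch<T>(
  filepaths: string[],
  task: (filepath: string) => Promise<T>,
  concurrency = 5,
): AsyncGenerator<Settled<T>> {
  // Each in-flight promise resolves to its own key, so the winner of
  // Promise.race can be removed from the map before the next race.
  const inFlight = new Map<number, Promise<[number, Settled<T>]>>()
  const limit = Math.max(1, concurrency)
  let next = 0

  const start = (id: number, filepath: string): void => {
    const settled = task(filepath).then(
      (value): [number, Settled<T>] => [id, { filepath, value, error: null }],
      (e): [number, Settled<T>] => [
        id,
        { filepath, value: null, error: e instanceof Error ? e : new Error(String(e)) },
      ],
    )
    inFlight.set(id, settled)
  }

  while (next < filepaths.length || inFlight.size > 0) {
    // Top up the pool before waiting on the next completion.
    while (next < filepaths.length && inFlight.size < limit) {
      start(next, filepaths[next])
      next++
    }
    const [id, result] = await Promise.race(inFlight.values())
    inFlight.delete(id)
    yield result
  }
}
```

A consumer can start indexing the first finished file while slower files are still being processed, which is the main reason to prefer a stream over an array result for large batches.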
@@ -218,7 +297,7 @@ Detect programming language from file extension.
 
 ---
 
-### Options
+### ChunkOptions
 
 | Option | Type | Default | Description |
 |--------|------|---------|-------------|
@@ -229,6 +308,15 @@ Detect programming language from file extension.
 | `language` | `Language` | auto | Override language detection |
 | `overlapLines` | `number` | `10` | Lines from previous chunk to include in `contextualizedText` |
 
+### BatchOptions
+
+Extends `ChunkOptions` with:
+
+| Option | Type | Default | Description |
+|--------|------|---------|-------------|
+| `concurrency` | `number` | `10` | Maximum files to process concurrently |
+| `onProgress` | `function` | - | Callback `(completed, total, filepath, success) => void` |
+
 ---
 
 ### Supported Languages
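
The `BatchOptions` table added in the last hunk can be read as a TypeScript interface. The sketch below models only the options visible in this diff (`maxChunkSize`, `language`, `overlapLines`, `concurrency`, `onProgress`); the remaining `ChunkOptions` rows are elided from the hunks and are left out. The `Language` alias, the `withBatchDefaults` helper, and its validation rule are assumptions for illustration, not the library's code.

```typescript
// Stand-in alias: the real `Language` type is not shown in this diff.
type Language = string

// Only the options visible in the diff are modelled here.
interface ChunkOptions {
  maxChunkSize?: number // used in the README example; default not shown in the diff
  language?: Language   // documented default: auto-detect
  overlapLines?: number // documented default: 10
}

interface BatchOptions extends ChunkOptions {
  concurrency?: number  // documented default: 10
  onProgress?: (completed: number, total: number, filepath: string, success: boolean) => void
}

type ResolvedBatchOptions = BatchOptions & { concurrency: number; overlapLines: number }

// Hypothetical helper: fill in the two documented numeric defaults.
// Rejecting non-positive concurrency is an assumption, not documented behaviour.
function withBatchDefaults(options: BatchOptions = {}): ResolvedBatchOptions {
  const concurrency = options.concurrency ?? 10
  if (!Number.isInteger(concurrency) || concurrency < 1) {
    throw new RangeError(`concurrency must be a positive integer, got ${concurrency}`)
  }
  return { ...options, concurrency, overlapLines: options.overlapLines ?? 10 }
}
```

For example, `withBatchDefaults({ maxChunkSize: 1500 })` keeps `maxChunkSize` and fills in `concurrency: 10` and `overlapLines: 10`.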

packages/code-chunk/README.md

Lines changed: 90 additions & 2 deletions
@@ -19,6 +19,7 @@ Uses tree-sitter to split source code at semantic boundaries (functions, classes
 - **Rich context**: Scope chain, imports, siblings, entity signatures
 - **Contextualized text**: Pre-formatted for embedding models
 - **Multi-language**: TypeScript, JavaScript, Python, Rust, Go, Java
+- **Batch processing**: Process entire codebases with controlled concurrency
 - **Streaming**: Process large files incrementally
 - **Effect support**: First-class Effect integration
 
@@ -143,6 +144,48 @@ for (const file of files) {
 }
 ```
 
+### Batch Processing
+
+Process multiple files concurrently with error handling per file:
+
+```typescript
+import { chunkBatch } from 'code-chunk'
+
+const files = [
+  { filepath: 'src/user.ts', code: userCode },
+  { filepath: 'src/auth.ts', code: authCode },
+  { filepath: 'lib/utils.py', code: utilsCode },
+]
+
+const results = await chunkBatch(files, {
+  maxChunkSize: 1500,
+  concurrency: 10,
+  onProgress: (done, total, path, success) => {
+    console.log(`[${done}/${total}] ${path}: ${success ? 'ok' : 'failed'}`)
+  }
+})
+
+for (const result of results) {
+  if (result.error) {
+    console.error(`Failed: ${result.filepath}`, result.error)
+  } else {
+    await indexChunks(result.filepath, result.chunks)
+  }
+}
+```
+
+Stream results as they complete:
+
+```typescript
+import { chunkBatchStream } from 'code-chunk'
+
+for await (const result of chunkBatchStream(files, { concurrency: 5 })) {
+  if (result.chunks) {
+    await indexChunks(result.filepath, result.chunks)
+  }
+}
+```
+
 ### Effect Integration
 
 For Effect-based pipelines:
@@ -198,7 +241,43 @@ Effect-native streaming API for composable pipelines.
 
 Create a reusable chunker instance with default options.
 
-**Returns:** `Chunker` with `chunk()` and `stream()` methods
+**Returns:** `Chunker` with `chunk()`, `stream()`, `chunkBatch()`, and `chunkBatchStream()` methods
+
+---
+
+### `chunkBatch(files, options?)`
+
+Process multiple files concurrently with per-file error handling.
+
+**Parameters:**
+- `files`: Array of `{ filepath, code, options? }`
+- `options`: Batch options (extends ChunkOptions with `concurrency` and `onProgress`)
+
+**Returns:** `Promise<BatchResult[]>` where each result has `{ filepath, chunks, error }`
+
+---
+
+### `chunkBatchStream(files, options?)`
+
+Stream batch results as files complete processing.
+
+**Returns:** `AsyncGenerator<BatchResult>`
+
+---
+
+### `chunkBatchEffect(files, options?)`
+
+Effect-native batch processing.
+
+**Returns:** `Effect.Effect<BatchResult[], never>`
+
+---
+
+### `chunkBatchStreamEffect(files, options?)`
+
+Effect-native streaming batch processing.
+
+**Returns:** `Stream.Stream<BatchResult, never>`
 
 ---
 
@@ -218,7 +297,7 @@ Detect programming language from file extension.
 
 ---
 
-### Options
+### ChunkOptions
 
 | Option | Type | Default | Description |
 |--------|------|---------|-------------|
@@ -229,6 +308,15 @@ Detect programming language from file extension.
 | `language` | `Language` | auto | Override language detection |
 | `overlapLines` | `number` | `10` | Lines from previous chunk to include in `contextualizedText` |
 
+### BatchOptions
+
+Extends `ChunkOptions` with:
+
+| Option | Type | Default | Description |
+|--------|------|---------|-------------|
+| `concurrency` | `number` | `10` | Maximum files to process concurrently |
+| `onProgress` | `function` | - | Callback `(completed, total, filepath, success) => void` |
+
 ---
 
 ### Supported Languages
