37 changes: 37 additions & 0 deletions .changeset/fix-github-enobufs.md
@@ -0,0 +1,37 @@
---
"@lytics/dev-agent": patch
"@lytics/dev-agent-subagents": patch
"@lytics/dev-agent-cli": patch
---

Fix ENOBUFS error during GitHub issues/PRs indexing for large repositories

**Problem:** When indexing repositories with many GitHub issues/PRs (especially ones with large issue bodies), the `dev index` command would fail with an `ENOBUFS` (No buffer space available) error.

**Solution:**
- Increased the `execSync` `maxBuffer` from the default 1MB to 50MB for issue/PR fetching
- Reduced default fetch limit from 1000 to 500 items to prevent buffer overflow
- Added `--gh-limit` CLI flag to allow users to customize the limit
- Improved error messages to guide users when buffer issues occur

**Changes:**
- `fetchIssues()` and `fetchPullRequests()` now use 50MB maxBuffer
- Default limit changed from 1000 to 500 (per type: issues and PRs)
- Added `--gh-limit <number>` flag to `dev index` command
- Better error handling with helpful suggestions (use `--gh-limit 100` for very large repos)
- Comprehensive test coverage (23 new tests for fetcher utilities)
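
As a rough illustration of the buffer and error-message changes, the sketch below runs a command with a 50MB `maxBuffer` and turns an `ENOBUFS` failure into a hint about `--gh-limit`. The helper names (`runWithLargeBuffer`, `describeFetchError`) and the message wording are assumptions made for this example, not the actual implementation of `fetchIssues()`.

```typescript
import { execSync } from "node:child_process";

// 50MB, the buffer size this change adopts for issue/PR fetching.
const MAX_BUFFER_BYTES = 50 * 1024 * 1024;

// Hypothetical helper: run a shell command and return its stdout as a string.
// execSync throws when the output exceeds maxBuffer; the error has code "ENOBUFS".
function runWithLargeBuffer(command: string, maxBuffer: number = MAX_BUFFER_BYTES): string {
  return execSync(command, { encoding: "utf8", maxBuffer });
}

// Hypothetical helper: map a low-level failure to a message a user can act on.
function describeFetchError(error: unknown): string {
  const code = (error as { code?: unknown } | null)?.code;
  const message = error instanceof Error ? error.message : String(error);
  if (code === "ENOBUFS" || message.includes("ENOBUFS")) {
    return "Output too large. Try a lower limit, e.g. --gh-limit 100";
  }
  return message;
}
```

A caller would wrap `runWithLargeBuffer(...)` in `try`/`catch` and pass the caught value to `describeFetchError` before reporting it.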

**Usage:**
```bash
# Default (works for most repos)
dev index

# For large repos (200+ issues/PRs)
dev index --gh-limit 200

# For very active repos (500+ issues/PRs)
dev index --gh-limit 100
```

**Testing:** All 1100+ tests passing. Verified on the lytics-ui repository (6989 files; 1000 issues/PRs indexed successfully).

39 changes: 39 additions & 0 deletions TROUBLESHOOTING.md
@@ -369,6 +369,45 @@ dev index .
3. **No issues/PRs:**
Repository might not have any issues/PRs yet.

### ENOBUFS error during GitHub indexing

**Error message:**
```
Failed to fetch issues: spawnSync /bin/sh ENOBUFS
```

**Cause:** The fetched data exceeds the output buffer of the spawned command. This happens when fetching large numbers of issues/PRs, or items with large bodies, from repositories with extensive GitHub activity.

**Solutions:**

1. **Use a lower limit (recommended):**
```bash
# For main index command
dev index --gh-limit 100

# For dedicated GitHub indexing
dev gh index --limit 100
```

2. **Adjust limit based on repository size:**
- Small repos (<50 issues/PRs): Default (500) works fine
- Medium repos (50-200 issues/PRs): Use `--gh-limit 200`
- Large repos (200+ issues/PRs): Use `--gh-limit 100` or lower

3. **Index in batches:**
```bash
# Index open items only (usually smaller)
dev gh index --state open --limit 500

# Then index closed items with lower limit
dev gh index --state closed --limit 100
```

**Technical details:**
- Default limit reduced to 500 (from 1000) to prevent buffer overflow
- Buffer size increased to 50MB for large payloads
- Error messages now guide users to the `--gh-limit` flag
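
To see why the two numeric changes work together, a back-of-the-envelope calculation helps: dividing the buffer size by the item limit gives the average serialized size each issue or PR can have before the output overflows. The function name and the assumption that items are roughly equal in size are illustrative only.

```typescript
const MIB = 1024 * 1024;

// Average bytes available per item before `limit` items overflow `bufferBytes`.
function bytesPerItemBudget(bufferBytes: number, limit: number): number {
  if (limit <= 0) throw new RangeError("limit must be positive");
  return Math.floor(bufferBytes / limit);
}

// Previous settings: 1MB default buffer, 1000 items.
const oldBudget = bytesPerItemBudget(1 * MIB, 1000); // 1048 bytes per item
// New settings: 50MB buffer, 500 items.
const newBudget = bytesPerItemBudget(50 * MIB, 500); // 104857 bytes per item
```

Under these assumptions the per-item headroom grows from about 1KB to about 100KB, which is why a lower `--gh-limit` is only needed when issue or PR bodies are unusually large.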

### `dev_gh` tool not finding issues

**Diagnosis:**
94 changes: 65 additions & 29 deletions docs/WORKFLOW.md
@@ -182,51 +182,87 @@ Issue: #<number>

## PR Description Format

### Principles

**Keep it concise and meaningful** - Context is important, but excessive noise makes PRs harder to parse. Focus on essential information that helps reviewers and provides future reference.

### Structure

```markdown
## Summary
Brief overview of what this PR does.
1-2 sentence overview of what this PR does and why.

## Features
✅ Feature 1
✅ Feature 2
✅ Feature 3
## Problem (if fix)
Brief description of the bug/issue being fixed.

## Testing
- ✅ X tests, all passing
- ✅ Y% statement coverage (Z% function coverage)
- ✅ Tested: scenarios covered
- ⚠️ Uncovered: what's not covered and why
## Solution
- Key change 1
- Key change 2
- Key change 3

## Performance
- Metric 1: value
- Metric 2: value
## Usage (if new feature)
```bash
# Example command or code snippet
```

## Documentation
- ✅ README with examples
- ✅ API reference
- ✅ Usage guide
## Testing
- ✅ X tests passing (Y new tests)
- ✅ Verified on: specific scenario/repository
- ⚠️ Known limitations (if any)

## Example Usage
```typescript
// Clear, runnable example
## Changes
- N commits: brief description of commit types
- Packages affected: list relevant packages
```
\`\`\`

## Coverage Report
\`\`\`
Coverage table
### Good Example

```markdown
## Summary
Fixes ENOBUFS error when indexing repositories with many GitHub issues/PRs.

## Problem
\`dev index\` would fail with \`ENOBUFS\` on repositories with extensive GitHub
activity due to buffer overflow (default 1MB buffer, fetching 1000+ items).

## Solution
- Increased maxBuffer: 1MB → 50MB for issue/PR fetching
- Lowered default limit: 1000 → 500 items per type
- Added \`--gh-limit <number>\` CLI flag for customization
- Improved error messages with actionable suggestions

## Usage
\`\`\`bash
dev index # Default (500 items)
dev index --gh-limit 200 # Large repos
dev index --gh-limit 100 # Very active repos
\`\`\`

## Known Limitations
- ⚠️ Limitation 1
- ⚠️ Limitation 2
## Testing
- ✅ All 1100+ tests passing
- ✅ 23 new fetcher utility tests
- ✅ Verified on 6,989 file repo with 1,000 issues/PRs

## Closes
Closes #<issue-number>
## Changes
- 6 commits: fix implementation, tests, documentation, changeset, website
- Patches: \`@lytics/dev-agent\`, \`@lytics/dev-agent-cli\`, \`@lytics/dev-agent-subagents\`
```

### What to Exclude

**Don't include:**
- ❌ Verbose change logs (commits already document this)
- ❌ Line-by-line code explanations
- ❌ Coverage tables (CI provides this)
- ❌ Full test lists (test files document this)
- ❌ Obvious information that's in the code

**Instead:**
- ✅ Focus on the "why" and key decisions
- ✅ Usage examples for new features
- ✅ Verification details for bug fixes
- ✅ Brief overview of what changed

## Testing Standards

### Coverage Goals
9 changes: 9 additions & 0 deletions packages/cli/README.md
@@ -31,6 +31,15 @@ dev index .
Options:
- `-f, --force` - Force re-index even if unchanged
- `-v, --verbose` - Show verbose output
- `--no-git` - Skip git history indexing
- `--no-github` - Skip GitHub issues/PRs indexing
- `--git-limit <number>` - Max git commits to index (default: 500)
- `--gh-limit <number>` - Max GitHub issues/PRs to fetch (default: 500)

**GitHub Limit Guidance:**
- Default (500): Works for most repositories
- Large repos (200+ issues/PRs): Use a `--gh-limit` value between 100 and 200 to prevent buffer overflow
- Very active repos: Start with `--gh-limit 50` and increase as needed
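
One detail worth checking when wiring a numeric flag such as `--gh-limit`: Commander calls a custom option parser with two arguments, the raw value and the previous (or default) value, so the second argument lands in the radix position if `Number.parseInt` is passed directly. A minimal sketch of a stricter parser follows; `parseLimit` is a hypothetical name, not part of the CLI.

```typescript
// Hypothetical parser for a numeric CLI flag. The unused second parameter
// mirrors the (value, previous) call shape used for custom option parsers.
function parseLimit(value: string, _previous?: number): number {
  const parsed = Number.parseInt(value, 10); // always parse as base 10
  if (!Number.isInteger(parsed) || parsed <= 0) {
    throw new Error(`Invalid limit "${value}": expected a positive integer`);
  }
  return parsed;
}

// Why the wrapper matters: a default of 500 used as a radix is out of range.
const direct = Number.parseInt("100", 500); // NaN
const wrapped = parseLimit("100", 500);     // 100
```

Rejecting non-numeric input up front also keeps a `NaN` limit from reaching the indexer.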

### Search

5 changes: 4 additions & 1 deletion packages/cli/src/commands/index.ts
@@ -58,6 +58,7 @@ export const indexCommand = new Command('index')
.option('--no-git', 'Skip git history indexing')
.option('--no-github', 'Skip GitHub issues/PRs indexing')
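// Wrap parseInt: the parser is called with (value, previous), and a previous value of 500 is not a valid radix.
.option('--git-limit <number>', 'Max git commits to index (default: 500)', (value) => Number.parseInt(value, 10), 500)
.option('--gh-limit <number>', 'Max GitHub issues/PRs to fetch (default: 500)', (value) => Number.parseInt(value, 10))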
.action(async (repositoryPath: string, options) => {
const spinner = ora('Checking prerequisites...').start();

@@ -207,7 +208,9 @@ export const indexCommand = new Command('index')
});
await ghIndexer.initialize();

ghStats = await ghIndexer.index({});
ghStats = await ghIndexer.index({
limit: options.ghLimit,
});
spinner.succeed(chalk.green('GitHub indexed!'));
logger.log('');
logger.log(chalk.bold('GitHub:'));
52 changes: 50 additions & 2 deletions packages/subagents/src/github/README.md
@@ -350,7 +350,10 @@ pnpm test packages/subagents/src/coordinator/github-coordinator.integration.test

**Coverage:**
- ✅ **Parser utilities:** 100% (47 tests)
- ✅ **Fetcher utilities:** 100% (23 tests)
- ✅ **Indexer:** 100% (9 tests)
- ✅ **Coordinator integration:** 100% (14 tests)
- ✅ **Total:** 79 tests (parser, fetcher, and indexer), all passing

## Examples

@@ -429,11 +432,17 @@ interface GitHubIndexOptions {
includePullRequests?: boolean; // Default: true
includeDiscussions?: boolean; // Default: false
state?: 'open' | 'closed' | 'all'; // Default: 'all'
limit?: number; // Default: 100
limit?: number; // Default: 500 (reduced from 1000 to prevent buffer overflow)
repository?: string; // Default: current repo
}
```

**Limit Recommendations:**
- **Default (500):** Works for most repositories
- **Large repos (200+ issues/PRs):** Use 100-200 to prevent ENOBUFS errors
- **Very active repos (500+ issues/PRs):** Start with 50-100
- **Small repos (<50 issues/PRs):** Can use higher limits (1000+)
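
The defaults in the interface comments can be applied with nullish coalescing, so that an unset `limit` (for example when `--gh-limit` is omitted) falls back to 500. This is a sketch under that assumption; `resolveIndexOptions` is a hypothetical helper, and only the fields visible in the interface excerpt above are included.

```typescript
type IssueState = "open" | "closed" | "all";

// Subset of the options shown above (fields outside the excerpt are omitted).
interface IndexOptionsSubset {
  includePullRequests?: boolean;
  includeDiscussions?: boolean;
  state?: IssueState;
  limit?: number;
}

const DEFAULT_LIMIT = 500;

// Hypothetical helper: fill in the documented default for any field left undefined.
function resolveIndexOptions(options: IndexOptionsSubset = {}): Required<IndexOptionsSubset> {
  return {
    includePullRequests: options.includePullRequests ?? true,
    includeDiscussions: options.includeDiscussions ?? false,
    state: options.state ?? "all",
    limit: options.limit ?? DEFAULT_LIMIT,
  };
}

// An explicit limit overrides the default; everything else keeps its default.
const largeRepoOptions = resolveIndexOptions({ limit: 100, state: "open" });
```

Using `??` rather than `||` means an explicit `false` for a boolean option is preserved instead of being replaced by the default.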

## Error Handling

The agent handles errors gracefully and returns structured error responses:
@@ -453,6 +462,13 @@ The agent handles errors gracefully and returns structured error responses:
code: 'ISSUE_NOT_FOUND',
}

// Buffer overflow (ENOBUFS)
{
action: 'index',
error: 'Failed to fetch issues: Output too large. Try using --gh-limit with a lower value (e.g., --gh-limit 100)',
code: 'BUFFER_OVERFLOW',
}

// Network/API errors
{
action: 'index',
@@ -462,13 +478,20 @@ The agent handles errors gracefully and returns structured error responses:
}
```

**Buffer Management:**
- Uses 50MB maxBuffer for issue/PR fetching (up from default 1MB)
- Uses 10MB maxBuffer for repository metadata
- Provides error messages that suggest the `--gh-limit` flag on overflow
- Default limit of 500 prevents most buffer issues
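
A caller could react to the `BUFFER_OVERFLOW` code by retrying with a smaller limit. The sketch below assumes a response shape matching the error examples above and halves the limit after each overflow. The `indexWithBackoff` function, the `IndexResult` types, the synchronous callback, and the halving strategy are all assumptions for illustration, not behavior the agent provides.

```typescript
interface IndexError { action: "index"; error: string; code: string }
interface IndexSuccess { action: "index"; indexed: number }
type IndexResult = IndexError | IndexSuccess;

function isIndexError(result: IndexResult): result is IndexError {
  return "code" in result;
}

// Hypothetical retry loop: halve the limit after each BUFFER_OVERFLOW,
// stopping once a run succeeds, fails for another reason, or hits minLimit.
// The callback is synchronous here only to keep the sketch short.
function indexWithBackoff(
  runIndex: (limit: number) => IndexResult,
  startLimit = 500,
  minLimit = 50,
): IndexResult {
  let limit = startLimit;
  for (;;) {
    const result = runIndex(limit);
    const overflowed = isIndexError(result) && result.code === "BUFFER_OVERFLOW";
    if (!overflowed || limit <= minLimit) return result;
    limit = Math.max(minLimit, Math.floor(limit / 2));
  }
}
```

Errors with any other code are returned immediately, since a smaller limit would not help with authentication or network failures.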

## Performance Considerations

### Indexing Performance

- **Time:** ~1-2 seconds per 10 items (depends on API rate limits)
- **Memory:** ~5KB per document (in-memory storage)
- **Recommended batch size:** 100-500 items
- **Recommended batch size:** 500 items (default)
- **Buffer size:** 50MB for large payloads, 10MB for metadata
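
The figures above translate into a quick estimate for a given item count. The sketch below just multiplies them out; the function name is hypothetical, and the result is only as accurate as the rough per-item numbers it is based on.

```typescript
interface IndexingEstimate {
  minSeconds: number;
  maxSeconds: number;
  memoryKB: number;
}

// Rough figures quoted above: ~1-2 seconds per 10 items, ~5KB per document.
function estimateIndexing(itemCount: number): IndexingEstimate {
  if (!Number.isInteger(itemCount) || itemCount < 0) {
    throw new RangeError("itemCount must be a non-negative integer");
  }
  return {
    minSeconds: itemCount / 10,       // 1 second per 10 items
    maxSeconds: (itemCount / 10) * 2, // 2 seconds per 10 items
    memoryKB: itemCount * 5,          // 5KB per document
  };
}

// At the default limit of 500 items: 50-100 seconds and 2500KB in memory.
const atDefaultLimit = estimateIndexing(500);
```

By this estimate the in-memory documents at the default limit take roughly 2.4MB.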

### Search Performance

@@ -480,6 +503,12 @@ The agent handles errors gracefully and returns structured error responses:
1. **Incremental indexing:** Only fetch new/updated items
2. **Filtering:** Use `state` and `types` to reduce dataset
3. **Caching:** Store frequently accessed contexts
4. **Batch processing:** For very large repos, index in batches with lower limits
```bash
# Example: Index open items separately
dev gh index --state open --limit 500
dev gh index --state closed --limit 100
```

## Future Enhancements

@@ -503,6 +532,25 @@ brew install gh # macOS
gh auth login
```

### ENOBUFS error during indexing

**Error:** `Failed to fetch issues: spawnSync /bin/sh ENOBUFS`

**Solution:**
```bash
# Use lower limit
dev gh index --limit 100

# Or for very large repos
dev gh index --limit 50

# Alternative: Index by state separately
dev gh index --state open --limit 500
dev gh index --state closed --limit 100
```

**Cause:** Buffer overflow when fetching many issues/PRs with large bodies. Default limit of 500 works for most repos, but very active repositories may need lower limits.

### No results when searching

1. Check if data is indexed: `dev gh index`