feat: implement Phase 1 - Core Library/API (v0.7.0-alpha)#5

Merged
1broseidon merged 1 commit into main from
claude/phase-1-core-library-api-011CUxhL2yfFq2TjcFeWc2nv
Nov 9, 2025

Conversation

@1broseidon
Owner

This commit implements the foundation for using promptext as a Go library,
transforming it from a CLI-only tool into a developer-friendly API while
maintaining 100% backward compatibility with the existing CLI.

New Public API Package: pkg/promptext

Created a complete public API surface with the following components:

Core API (promptext.go)

  • Extract() - Main entry point for simple extraction
  • Extractor type - Reusable extractor for multiple directories
  • NewExtractor() - Factory function with builder pattern support
  • Version constant for library versioning

Functional Options (options.go)

Implemented clean, composable options pattern:

  • WithExtensions() - Filter by file extensions
  • WithExcludes() - Exclude patterns
  • WithGitIgnore() - Control .gitignore respect
  • WithDefaultRules() - Control built-in filtering
  • WithRelevance() - Keyword-based relevance filtering
  • WithTokenBudget() - AI model token limit enforcement
  • WithFormat() - Output format selection
  • WithVerbose() - Verbose logging
  • WithDebug() - Debug logging with timing
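
A minimal sketch of how functional options like these usually compose. The `config` fields and the `newConfig` helper are assumptions made for illustration; they are not taken from options.go:

```go
package main

import "fmt"

// config is a hypothetical stand-in for the library's private settings struct.
type config struct {
	extensions  []string
	tokenBudget int
}

// Option mutates a config; each With* helper returns one.
type Option func(*config)

// WithExtensions restricts extraction to the given file extensions.
func WithExtensions(exts ...string) Option {
	return func(c *config) { c.extensions = exts }
}

// WithTokenBudget sets the maximum number of tokens in the output.
func WithTokenBudget(n int) Option {
	return func(c *config) { c.tokenBudget = n }
}

// newConfig starts from zero-value defaults and applies options in order,
// so a later option overrides an earlier one for the same field.
func newConfig(opts ...Option) *config {
	c := &config{}
	for _, opt := range opts {
		opt(c)
	}
	return c
}

func main() {
	c := newConfig(
		WithExtensions(".go"),
		WithTokenBudget(8000),
		WithTokenBudget(4000), // overrides the previous budget
	)
	fmt.Println(c.extensions, c.tokenBudget)
}
```

Because every option is just a function over the same struct, new options can be added later without changing the signature of `Extract()` or `NewExtractor()`.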

Type System (result.go)

Public types for structured data access:

  • Result - Main result container with formatted output
  • ProjectOutput - Complete project extraction data
  • FileInfo - Individual file metadata and content
  • DirectoryNode - Hierarchical directory structure
  • GitInfo, Metadata, FileStatistics, BudgetInfo, FilterConfig

Format System (format.go)

  • Format type for output formats (PTX, JSONL, Markdown, XML, etc.)
  • Formatter interface for custom formatters
  • RegisterFormatter() for extensibility
  • Result.As() for format conversion

Error Handling (errors.go)

Well-typed sentinel errors:

  • ErrInvalidDirectory - Invalid/inaccessible directory
  • ErrNoFilesMatched - No matching files found
  • ErrTokenBudgetTooLow - Budget too low
  • ErrInvalidFormat - Unsupported format
  • Wrapped errors: DirectoryError, FilterError, FormatError
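
A sketch of how callers might consume these errors with `errors.Is` and `errors.As`. The sentinel is redefined locally, and the `DirectoryError` fields (`Path`, `Err`) and the `extract` stand-in are assumptions; the real types in errors.go may be shaped differently:

```go
package main

import (
	"errors"
	"fmt"
)

// ErrInvalidDirectory mirrors the sentinel name from the PR; defined locally for this sketch.
var ErrInvalidDirectory = errors.New("invalid directory")

// DirectoryError is a hypothetical shape for the wrapped error type.
type DirectoryError struct {
	Path string
	Err  error
}

func (e *DirectoryError) Error() string { return fmt.Sprintf("directory %q: %v", e.Path, e.Err) }

// Unwrap exposes the underlying sentinel to errors.Is.
func (e *DirectoryError) Unwrap() error { return e.Err }

// extract is a stand-in for the library call; it always fails for the demo.
func extract(dir string) error {
	return fmt.Errorf("extract: %w", &DirectoryError{Path: dir, Err: ErrInvalidDirectory})
}

// isInvalidDir reports whether err wraps the sentinel anywhere in its chain.
func isInvalidDir(err error) bool {
	return errors.Is(err, ErrInvalidDirectory)
}

// failedPath pulls the offending path out of a wrapped DirectoryError, if present.
func failedPath(err error) (string, bool) {
	var dirErr *DirectoryError
	if errors.As(err, &dirErr) {
		return dirErr.Path, true
	}
	return "", false
}

func main() {
	err := extract("/no/such/dir")
	fmt.Println("invalid directory:", isInvalidDir(err))
	if p, ok := failedPath(err); ok {
		fmt.Println("failed path:", p)
	}
}
```

`errors.Is` answers "what kind of failure was this?", while `errors.As` recovers the structured detail, which is the reason to offer both sentinels and wrapper types.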

Documentation (doc.go)

Comprehensive package documentation with:

  • Quick start examples
  • Common usage patterns
  • All available options
  • Design principles
  • Multiple use case examples

Testing & Examples

Unit Tests (promptext_test.go)

Complete test coverage including:

  • Simple extraction scenarios
  • Extension and exclusion filtering
  • Multiple output formats
  • Token budget enforcement
  • Error conditions
  • Extractor reusability
  • Builder pattern
  • Format conversion

All tests pass ✓

Example Programs

Created practical examples demonstrating:

  1. examples/basic/ - Fundamental usage patterns

    • Simple extraction with defaults
    • Extension filtering
    • Exclusion patterns
    • Token budgets
    • Format selection and conversion
    • Saving to files
    • Reusable extractors
    • Builder pattern
  2. examples/token-budget/ - AI-focused extraction

    • Token budget enforcement
    • Relevance filtering by keywords
    • Combining relevance + budget
    • Optimizing for different AI models
    • Token efficiency analysis
  3. examples/README.md - Comprehensive guide

    • Usage patterns
    • Common use cases
    • Available options reference
    • Error handling examples

Documentation Updates

README.md

Added new "Using as a Library" section with:

  • Installation instructions
  • Quick start guide
  • Common patterns
  • Complete options reference
  • Output formats
  • Error handling
  • Links to examples and API docs

Updated Use Cases section to highlight library integration

Design Principles

  1. Simple by Default - Works with zero configuration
  2. Composable - Options combine naturally
  3. Discoverable - IDE autocomplete reveals options
  4. Safe - Typed errors with errors.Is() support
  5. Extensible - Custom formatters via RegisterFormatter()

Backward Compatibility

✓ All existing CLI functionality works unchanged
✓ All existing tests pass
✓ CLI still builds and operates correctly
✓ No breaking changes to internal packages

API Examples

Simple usage:

result, err := promptext.Extract(".")
fmt.Println(result.FormattedOutput)

With options:

result, err := promptext.Extract(".",
    promptext.WithExtensions(".go"),
    promptext.WithTokenBudget(8000),
    promptext.WithFormat(promptext.FormatPTX),
)

Reusable extractor:

extractor := promptext.NewExtractor(
    promptext.WithRelevance("auth"),
    promptext.WithTokenBudget(5000),
)
result1, _ := extractor.Extract("/project1")
result2, _ := extractor.Extract("/project2")

Format conversion:

result, _ := promptext.Extract(".")
markdown, _ := result.As(promptext.FormatMarkdown)
jsonl, _ := result.As(promptext.FormatJSONL)

Next Steps (Phase 2+)

This Phase 1 implementation provides the foundation for:

  • Phase 2: CLI migration to use library (v0.7.1)
  • Phase 3: Advanced features (streaming, analysis) (v0.8.0)
  • Phase 4: Documentation & examples expansion (v0.9.0)
  • Phase 5: v1.0.0 with API stability guarantees

Files Changed

pkg/promptext/
├── promptext.go - Main API and Extract() function
├── options.go - Functional options pattern
├── result.go - Public result types
├── format.go - Format system
├── errors.go - Error types
├── doc.go - Package documentation
└── promptext_test.go - Comprehensive tests

examples/
├── README.md - Examples guide
├── basic/main.go - Basic usage examples
└── token-budget/main.go - AI-focused examples

README.md - Updated with library documentation

Closes # (if applicable)


@cursor cursor bot left a comment


// It first checks custom formatters, then falls back to built-in formatters.
func GetFormatter(formatStr string) (Formatter, error) {
// Check custom formatters first
if customFormatter, ok := customFormatters[formatStr]; ok {

Bug: Data Races in Public Library API

The global customFormatters map is accessed without synchronization in RegisterFormatter and GetFormatter. Concurrent calls to these functions from different goroutines will cause data races. Since this is a public library API, users may call RegisterFormatter from multiple goroutines during initialization, leading to undefined behavior.


// extractor := promptext.NewExtractor().WithFormat(promptext.FormatJSONL)
func (e *Extractor) WithFormat(format Format) *Extractor {
e.config.format = format
return e

Bug: Builder Methods Violate Immutability Contract

The builder methods WithExtensions, WithExcludes, and WithFormat mutate the Extractor in place but the documentation claims they return "a new Extractor". This breaks immutability and causes unexpected behavior when reusing extractors. For instance, if an extractor is created once and builder methods are called on it later, the original configuration is modified, affecting all subsequent uses. The documentation promises a new instance but the implementation modifies the receiver.


@claude

claude bot commented Nov 9, 2025

Pull Request Review: Phase 1 - Core Library/API (v0.7.0-alpha)

Overall Assessment

This is an excellent first phase implementation that successfully transforms promptext from a CLI-only tool into a developer-friendly library while maintaining 100% backward compatibility. The code demonstrates strong software engineering practices with a clean API design, comprehensive documentation, and good test coverage.

✅ Strengths

1. API Design Excellence

  • Functional Options Pattern: Clean, composable, and idiomatic Go design
  • Builder Pattern Support: Fluent API via Extractor methods
  • Sensible Defaults: Works out-of-the-box with zero configuration
  • Type Safety: Well-defined error types with proper error wrapping
  • Discoverability: Clear naming and comprehensive doc comments

2. Code Quality

  • Clean separation between public API (pkg/promptext) and internal packages
  • Proper error handling with sentinel errors (ErrInvalidDirectory, ErrNoFilesMatched, etc.)
  • Good use of error wrapping for context preservation
  • Consistent code style and formatting
  • Excellent documentation with usage examples

3. Test Coverage

  • Comprehensive unit tests covering main scenarios
  • Tests for error conditions
  • Tests for option combinations
  • Builder pattern and reusability tests
  • Format conversion tests
  • Tests use t.TempDir() correctly for isolation

4. Documentation

  • Outstanding package documentation (doc.go)
  • Clear examples in README
  • Working example programs with real use cases
  • Inline code examples in godoc comments

Areas for Improvement

🔴 Critical Issues

1. Concurrency Safety Concern in Extractor (pkg/promptext/promptext.go:189-214)

The builder methods (WithExtensions, WithExcludes, WithFormat) mutate the Extractor config in-place and return *Extractor. This creates potential issues:

func (e *Extractor) WithExtensions(extensions ...string) *Extractor {
    e.config.extensions = extensions  // Mutates in-place
    return e
}

Problem: If an Extractor is reused across goroutines or multiple calls, this mutation can cause race conditions.

Example issue:

extractor := promptext.NewExtractor()
// These share the same config!
e1 := extractor.WithExtensions(".go")
e2 := extractor.WithExtensions(".js")  // Overwrites e1's config

Recommendation: Either:

  1. Document that Extractor is not safe for concurrent use and these methods modify state
  2. Make these methods return a new Extractor with cloned config (immutable pattern)
  3. Remove builder methods from Extractor and only allow configuration via NewExtractor(opts...)

I recommend option 3 for simplicity, since users can already use:

extractor := promptext.NewExtractor(
    promptext.WithExtensions(".go"),
    promptext.WithExcludes("vendor/"),
)
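
If the builder methods are kept instead, option 2 could look roughly like the sketch below. The `config` fields and the `clone` helper are assumptions for illustration, not the PR's actual code:

```go
package main

import "fmt"

// config is a hypothetical stand-in for the Extractor's private settings.
type config struct {
	extensions []string
	excludes   []string
}

// clone copies the slices so the result shares no backing arrays with the original.
func (c config) clone() config {
	c.extensions = append([]string(nil), c.extensions...)
	c.excludes = append([]string(nil), c.excludes...)
	return c
}

// Extractor holds an immutable configuration in this sketch.
type Extractor struct{ config config }

// NewExtractor returns an Extractor with zero-value settings.
func NewExtractor() *Extractor { return &Extractor{} }

// WithExtensions returns a new Extractor; the receiver is left unchanged.
func (e *Extractor) WithExtensions(exts ...string) *Extractor {
	next := &Extractor{config: e.config.clone()}
	next.config.extensions = append([]string(nil), exts...)
	return next
}

func main() {
	base := NewExtractor()
	e1 := base.WithExtensions(".go")
	e2 := base.WithExtensions(".js") // no longer overwrites e1's settings
	fmt.Println(base.config.extensions, e1.config.extensions, e2.config.extensions)
}
```

With this shape the doc comment's promise of "a new Extractor" holds, and a shared base extractor can be specialized from several goroutines without any of them writing to shared state.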

2. Global State in Custom Formatters (pkg/promptext/format.go:49)

var customFormatters = make(map[string]Formatter)

Problem: This global map is not protected by a mutex, making RegisterFormatter and GetFormatter unsafe for concurrent access.

Recommendation: Add a sync.RWMutex to protect concurrent access:

var (
    customFormatters = make(map[string]Formatter)
    formattersMu     sync.RWMutex
)

func RegisterFormatter(name string, formatter Formatter) {
    formattersMu.Lock()
    defer formattersMu.Unlock()
    customFormatters[name] = formatter
}

func GetFormatter(formatStr string) (Formatter, error) {
    formattersMu.RLock()
    customFormatter, ok := customFormatters[formatStr]
    formattersMu.RUnlock()
    // ... rest of code
}
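
For reference, here is a self-contained version of the same idea that can be run under `go run -race`. `ProjectOutput`, `nameFormatter`, `lookupFormatter`, and `registerConcurrently` are simplified stand-ins invented for this sketch; only the `Format(*ProjectOutput) (string, error)` method shape follows the PR:

```go
package main

import (
	"fmt"
	"sync"
)

// ProjectOutput is a simplified stand-in for the library's result type.
type ProjectOutput struct{ Name string }

// Formatter follows the method shape used in the PR.
type Formatter interface {
	Format(output *ProjectOutput) (string, error)
}

// nameFormatter is a trivial formatter used only for the demo.
type nameFormatter struct{}

func (nameFormatter) Format(output *ProjectOutput) (string, error) {
	return "project:" + output.Name, nil
}

var (
	formattersMu     sync.RWMutex
	customFormatters = make(map[string]Formatter)
)

// RegisterFormatter takes the write lock because it mutates the map.
func RegisterFormatter(name string, f Formatter) {
	formattersMu.Lock()
	defer formattersMu.Unlock()
	customFormatters[name] = f
}

// lookupFormatter only reads, so a shared read lock is enough.
func lookupFormatter(name string) (Formatter, bool) {
	formattersMu.RLock()
	defer formattersMu.RUnlock()
	f, ok := customFormatters[name]
	return f, ok
}

// registerConcurrently registers n formatters from n goroutines, mixing in
// reads, and returns how many entries the registry holds afterwards.
func registerConcurrently(n int) int {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			RegisterFormatter(fmt.Sprintf("fmt-%d", id), nameFormatter{})
			lookupFormatter("fmt-0") // concurrent read alongside the writes
		}(i)
	}
	wg.Wait()
	formattersMu.RLock()
	defer formattersMu.RUnlock()
	return len(customFormatters)
}

func main() {
	count := registerConcurrently(50)
	f, _ := lookupFormatter("fmt-0")
	out, _ := f.Format(&ProjectOutput{Name: "demo"})
	fmt.Println(count, out)
}
```

Without the mutex, the same program would perform unsynchronized concurrent map writes, which the Go runtime can detect and abort on.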

🟡 Medium Priority Issues

3. Missing Input Validation (pkg/promptext/options.go:40-56)

The WithExtensions and WithExcludes functions don't validate inputs:

func WithExtensions(extensions ...string) Option {
    return func(c *config) {
        c.extensions = extensions  // No validation
    }
}

Potential issues:

  • Empty strings in extensions/excludes arrays
  • Extensions without leading dots (e.g., "go" instead of ".go")
  • Invalid glob patterns

Recommendation: Add validation:

func WithExtensions(extensions ...string) Option {
    return func(c *config) {
        validated := make([]string, 0, len(extensions))
        for _, ext := range extensions {
            if ext == "" {
                continue  // Skip empty
            }
            if !strings.HasPrefix(ext, ".") {
                ext = "." + ext  // Auto-fix
            }
            validated = append(validated, ext)
        }
        c.extensions = validated
    }
}

4. Token Budget Validation (pkg/promptext/options.go:120-124)

func WithTokenBudget(maxTokens int) Option {
    return func(c *config) {
        c.tokenBudget = maxTokens  // Accepts negative values!
    }
}

Recommendation: Validate the budget is non-negative or document that 0 means unlimited and negative values are invalid.
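
One possible shape for that validation, sketched under the assumption that options are allowed to return an error. That changes the `Option` signature relative to the PR, and `errNegativeBudget` and `newConfig` are hypothetical names:

```go
package main

import (
	"errors"
	"fmt"
)

// errNegativeBudget is a hypothetical error for this sketch.
var errNegativeBudget = errors.New("token budget must not be negative")

// config is a stand-in for the library's private settings struct.
type config struct{ tokenBudget int }

// Option returns an error so invalid input is reported at construction time.
type Option func(*config) error

// WithTokenBudget rejects negative values; 0 is treated as "no limit" here.
func WithTokenBudget(maxTokens int) Option {
	return func(c *config) error {
		if maxTokens < 0 {
			return fmt.Errorf("%w: got %d", errNegativeBudget, maxTokens)
		}
		c.tokenBudget = maxTokens
		return nil
	}
}

// newConfig applies options in order and stops at the first invalid one.
func newConfig(opts ...Option) (*config, error) {
	c := &config{}
	for _, opt := range opts {
		if err := opt(c); err != nil {
			return nil, err
		}
	}
	return c, nil
}

func main() {
	if _, err := newConfig(WithTokenBudget(-1)); err != nil {
		fmt.Println("rejected:", err)
	}
	c, _ := newConfig(WithTokenBudget(8000))
	fmt.Println("accepted:", c.tokenBudget)
}
```

If changing the option signature is too invasive for an alpha release, the same check could instead run once inside `Extract()` before processing starts.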

5. Error Context Could Be Improved (pkg/promptext/promptext.go:152-155)

procResult, err := processor.ProcessDirectory(procConfig, e.config.verbose)
if err != nil {
    return nil, fmt.Errorf("error processing directory: %w", err)
}

Issue: The error message doesn't include the directory path, making debugging harder.

Recommendation:

return nil, fmt.Errorf("error processing directory %s: %w", absPath, err)

6. Potential Directory Traversal Risk (pkg/promptext/promptext.go:216-235)

The resolvePath function uses filepath.Abs on user input without validating the result:

absPath, err := filepath.Abs(dir)
if err != nil {
    return "", fmt.Errorf("failed to resolve absolute path: %w", err)
}
return absPath, nil

Consideration: While filepath.Abs is generally safe, consider if there are any security implications for your use case. If this library will be used in server contexts where users can specify arbitrary paths, you may want to add path validation.

Recommendation: Document the security expectations or add validation if needed:

// Optionally: Ensure the path doesn't escape a specific root
// Or document that callers must validate paths in server contexts
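
As an illustration of the first option, a caller in a server context could confine requests to a root directory along these lines. `withinRoot` is a hypothetical helper, not part of the PR, and the check is purely lexical: symlinks are not resolved, so a stricter deployment would also need `filepath.EvalSymlinks` or similar:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// withinRoot reports whether target, once made absolute, lies inside root.
// It compares cleaned paths only and does not follow symlinks.
func withinRoot(root, target string) (bool, error) {
	absRoot, err := filepath.Abs(root)
	if err != nil {
		return false, err
	}
	absTarget, err := filepath.Abs(target)
	if err != nil {
		return false, err
	}
	rel, err := filepath.Rel(absRoot, absTarget)
	if err != nil {
		return false, err
	}
	// A relative path that is ".." or starts with "../" climbs out of root.
	escapes := rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator))
	return !escapes, nil
}

func main() {
	for _, p := range []string{"repos/app", "repos/../secrets", "repos"} {
		ok, err := withinRoot("repos", p)
		fmt.Println(p, ok, err)
	}
}
```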

🔵 Low Priority / Polish

7. Unused Helper Functions (pkg/promptext/promptext.go:254-262)

func joinExtensions(extensions []string) string {
    return strings.Join(extensions, ",")
}

func joinExcludes(excludes []string) string {
    return strings.Join(excludes, ",")
}

These functions appear unused in the current code. Consider removing or using them.

8. Test Coverage Gaps

While test coverage is good, consider adding tests for:

  • Concurrent access to Extractor (if supported)
  • Custom formatter registration and usage
  • Edge cases: very long file paths, special characters in paths
  • Relevance filtering functionality
  • Error unwrapping with errors.As() and errors.Is()

9. Documentation Consistency

In doc.go:154-156:

// This is version 0.7.0 (Phase 1) of the library API.
// The API may evolve during the 0.x releases. Version 1.0.0 will provide
// API stability guarantees and backward compatibility.

This is great! Consider adding a COMPATIBILITY.md or section in README about stability guarantees.


Performance Considerations

✅ Good Practices Observed

  1. Efficient Type Conversions: The conversion functions between internal and public types are straightforward
  2. Reusable Extractors: Good pattern for processing multiple directories with same config
  3. Format Conversion: Result.As() allows format conversion without re-processing files

💡 Potential Optimizations

  1. Memory Allocation in Options (pkg/promptext/options.go:96-107)

The WithRelevance function builds a string by concatenation:

keywordStr := ""
for i, kw := range keywords {
    if i > 0 {
        keywordStr += " "
    }
    keywordStr += kw
}

Recommendation: Use strings.Join for better performance:

c.relevanceKeywords = strings.Join(keywords, " ")
  2. Deep Copying for Immutability: If builder methods are made immutable (critical issue #1 above), be mindful of copying costs. For config structs this should be negligible.

Security Assessment

✅ Security Positives

  1. No SQL/Command Injection: Pure file system operations
  2. Error Handling: Proper validation of directory existence
  3. No Unsafe Operations: No use of unsafe package
  4. Input Sanitization: Paths are resolved with filepath.Abs and filepath.Clean

⚠️ Security Considerations

  1. Path Traversal (mentioned above): Generally safe for local use, but document expectations for server use

  2. Resource Exhaustion: No limits on:

    • Number of files processed (could process millions of files)
    • Directory depth traversal
    • Memory usage from reading large files

    Consider adding configurable limits in future versions if used in multi-tenant environments.

  3. TOCTOU (Time-of-Check-Time-of-Use): Between validateDirectory and actual file reading, files could be modified. This is generally acceptable for this use case.
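
To make the resource exhaustion point concrete, configurable limits could be enforced during the directory walk roughly as follows. This is only a sketch: `collectLimited`, `errTooManyFiles`, and `sampleFS` are hypothetical names, and an in-memory `fstest.MapFS` stands in for a real project tree:

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"strings"
	"testing/fstest"
)

// errTooManyFiles is a hypothetical sentinel for this sketch.
var errTooManyFiles = errors.New("file limit exceeded")

// collectLimited walks fsys and returns file paths, aborting once maxFiles
// would be exceeded and skipping directories nested deeper than maxDepth.
func collectLimited(fsys fs.FS, maxFiles, maxDepth int) ([]string, error) {
	var files []string
	err := fs.WalkDir(fsys, ".", func(p string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if p == "." {
			return nil
		}
		if d.IsDir() {
			// Depth of a directory is the number of path segments below the root.
			if strings.Count(p, "/")+1 > maxDepth {
				return fs.SkipDir
			}
			return nil
		}
		if len(files) >= maxFiles {
			return errTooManyFiles
		}
		files = append(files, p)
		return nil
	})
	return files, err
}

// sampleFS builds a small in-memory tree for the demo.
func sampleFS() fs.FS {
	return fstest.MapFS{
		"main.go":  {Data: []byte("package main")},
		"a/x.go":   {Data: []byte("package a")},
		"a/b/y.go": {Data: []byte("package b")},
	}
}

func main() {
	files, err := collectLimited(sampleFS(), 10, 1)
	fmt.Println(files, err)
	_, err = collectLimited(sampleFS(), 1, 5)
	fmt.Println(err)
}
```

A per-file size cap could be added in the same callback via `d.Info()`, which would cover the memory concern as well.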


Testing

✅ Strong Test Suite

  • 15+ test functions covering main scenarios
  • Good use of table-driven tests (format testing)
  • Proper test isolation with t.TempDir()
  • Error condition testing
  • Edge case coverage (empty directories, invalid paths)

📝 Suggested Additional Tests

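  1. Concurrency test:
// Requires the os, path/filepath, and sync imports in the test file.
func TestExtractor_Concurrent(t *testing.T) {
    // Each test needs its own directory with at least one file to extract.
    tmpDir := t.TempDir()
    if err := os.WriteFile(filepath.Join(tmpDir, "main.go"), []byte("package main\n"), 0o644); err != nil {
        t.Fatal(err)
    }
    extractor := NewExtractor(WithFormat(FormatPTX))
    var wg sync.WaitGroup
    for i := 0; i < 10; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            _, err := extractor.Extract(tmpDir)
            if err != nil {
                t.Error(err)
            }
        }()
    }
    wg.Wait()
}
  2. Custom formatter test:
// Go does not allow method declarations inside a function body,
// so the formatter type is declared at package level.
type testFormatter struct{}

func (f *testFormatter) Format(output *ProjectOutput) (string, error) {
    return "custom", nil
}

func TestRegisterFormatter(t *testing.T) {
    tmpDir := t.TempDir()
    if err := os.WriteFile(filepath.Join(tmpDir, "main.go"), []byte("package main\n"), 0o644); err != nil {
        t.Fatal(err)
    }
    RegisterFormatter("test", &testFormatter{})
    result, err := Extract(tmpDir, WithFormat("test"))
    if err != nil {
        t.Fatal(err)
    }
    if result.FormattedOutput != "custom" {
        t.Error("Custom formatter not used")
    }
}
  3. Large file/directory stress test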

Documentation Review

✅ Excellent Documentation

  1. Package Documentation: Outstanding doc.go with comprehensive examples
  2. README Updates: Clear "Using as a Library" section
  3. Examples Directory: Practical, runnable examples
  4. Inline Comments: Good godoc comments on all exported items

📝 Minor Suggestions

  1. Add a "Common Pitfalls" section to docs about:

    • Extractor reusability and mutation behavior
    • Token budget overhead (not exact)
    • Performance considerations for large codebases
  2. Consider adding API stability badges/notes to README:

    ⚠️ **API Stability**: This is v0.7.0-alpha. The API may change before v1.0.0.
    

Recommendations Summary

Must Fix Before Merge (Critical)

  1. ✅ Fix or document concurrency safety for Extractor builder methods
  2. ✅ Add mutex protection to customFormatters global map

Should Fix Before Merge (High Priority)

  1. Add input validation for extensions, excludes, and token budget
  2. Improve error messages to include directory paths

Consider for Future (Medium Priority)

  1. Add tests for concurrency and custom formatters
  2. Document security expectations for path handling
  3. Add resource limits documentation

Polish (Low Priority)

  1. Remove unused helper functions
  2. Optimize string concatenation in WithRelevance
  3. Add API stability documentation

Conclusion

This is a high-quality implementation that demonstrates solid engineering practices. The API design is clean, intuitive, and follows Go idioms well. The main concerns are around concurrency safety and input validation, which should be addressed before merging.

Recommendation: ✅ Approve with requested changes

The critical issues are straightforward to fix and won't require architectural changes. Once the concurrency and validation issues are addressed, this will be an excellent foundation for the library.

Great work on Phase 1! The design decisions (functional options, builder pattern, extensible formatters) will serve the project well as it grows.


Detailed Change Requests

  1. Add mutex to customFormatters (format.go)
  2. Fix or document Extractor builder method mutation (promptext.go)
  3. Add input validation (options.go)
  4. Improve error context (promptext.go)
  5. Add concurrency and custom formatter tests (promptext_test.go)

Would you like me to provide specific code snippets for any of these fixes?

@1broseidon 1broseidon merged commit d987298 into main Nov 9, 2025
5 checks passed
@1broseidon 1broseidon deleted the claude/phase-1-core-library-api-011CUxhL2yfFq2TjcFeWc2nv branch November 9, 2025 17:59