@prosdev commented Nov 27, 2025

Summary

Implements the GitIndexer for indexing git commits into the vector store, enabling semantic search over commit history.

Changes

GitIndexer (packages/core/src/git/indexer.ts)

Core Functionality:

  • index(options) - Extract and index commits into vector store

    • Configurable commit limit (default: 1000)
    • Date range filtering (since/until)
    • Author filtering
    • Merge commit exclusion
    • Progress reporting callback
    • Batch processing for efficiency
  • search(query, options) - Semantic search over commit messages

    • Returns full GitCommit objects
    • Filters by commit document type
    • Score threshold support
  • getFileHistory(path, options) - Get commits for a specific file

    • Uses git log --follow for rename tracking
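
As a rough illustration of how the filtering options above could map onto git, the sketch below turns an options object into `git log` arguments. The interface and function names (`IndexOptionsSketch`, `buildGitLogArgs`) and the option field names are assumptions made for this example, not the actual API in indexer.ts; only the git flags themselves (`--max-count`, `--since`, `--until`, `--author`, `--no-merges`, `--follow`) are standard.

```typescript
// Hypothetical option shape for illustration; field names are assumptions,
// not the real options type used by GitIndexer.
interface IndexOptionsSketch {
  limit?: number;      // maximum commits to read (the PR states a default of 1000)
  since?: string;      // lower date bound, passed through to git
  until?: string;      // upper date bound, passed through to git
  author?: string;     // author pattern
  noMerges?: boolean;  // exclude merge commits
  followPath?: string; // single file whose history should be followed across renames
}

// Translate the options into arguments for `git log`.
function buildGitLogArgs(options: IndexOptionsSketch = {}): string[] {
  const args: string[] = ["log", `--max-count=${options.limit ?? 1000}`];
  if (options.since) args.push(`--since=${options.since}`);
  if (options.until) args.push(`--until=${options.until}`);
  if (options.author) args.push(`--author=${options.author}`);
  if (options.noMerges) args.push("--no-merges");
  // --follow only works for a single path, which goes after the `--` separator.
  if (options.followPath) args.push("--follow", "--", options.followPath);
  return args;
}
```

The resulting array could be handed to a process spawner such as Node's `child_process.execFile("git", args, { cwd: repoPath })`; passing arguments as an array avoids shell quoting problems with author names and paths.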

Document Structure:
Each commit is stored with:

  • Text: Subject + body + file paths (for better semantic matching)
  • Metadata: hash, author, date, stats, issue/PR refs, full commit object
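
To make the document layout concrete, here is a minimal sketch of building one document from a commit and recovering commits from search hits. The types (`CommitSketch`, `CommitDocument`, `SearchHit`), the function names, the `commit:` id prefix, and the `Files:` label are all assumptions for illustration; only the `type: 'commit'` marker and the `_commit` field come from this PR, and the real GitCommit type carries more fields (stats, issue/PR refs).

```typescript
// Hypothetical, reduced commit shape; the real GitCommit has more fields.
interface CommitSketch {
  hash: string;
  subject: string;
  body: string;
  author: string;
  date: string; // ISO 8601 timestamp
  files: string[];
}

interface CommitDocument {
  id: string;
  text: string;
  metadata: {
    type: "commit";
    hash: string;
    author: string;
    date: string;
    _commit: CommitSketch;
  };
}

// Build the text to embed: subject, then body (if any), then touched file paths.
function toCommitDocument(commit: CommitSketch): CommitDocument {
  const parts: string[] = [commit.subject];
  const body = commit.body.trim();
  if (body.length > 0) parts.push(body);
  if (commit.files.length > 0) parts.push(`Files: ${commit.files.join(", ")}`);
  return {
    id: `commit:${commit.hash}`, // id scheme is an assumption
    text: parts.join("\n\n"),
    metadata: {
      type: "commit",
      hash: commit.hash,
      author: commit.author,
      date: commit.date,
      _commit: commit, // full object kept so results need no re-parsing
    },
  };
}

// Assumed shape of a raw vector-store hit.
interface SearchHit {
  score: number;
  metadata: { type: string; _commit?: CommitSketch };
}

// Keep only commit documents at or above the score threshold and unwrap them.
function commitsFromHits(hits: SearchHit[], scoreThreshold = 0): CommitSketch[] {
  const commits: CommitSketch[] = [];
  for (const hit of hits) {
    if (hit.metadata.type !== "commit") continue;
    if (hit.score < scoreThreshold) continue;
    if (hit.metadata._commit) commits.push(hit.metadata._commit);
  }
  return commits;
}
```

Because the file paths are part of the embedded text, a commit whose subject never mentions "auth" can still match a query about auth if it touched a path such as `src/auth/login.ts`.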

Factory Function

  • createGitIndexer(repoPath, vectorStorage) - Convenience factory

Tests

  • 17 comprehensive unit tests covering:
    • Commit extraction and indexing
    • Filtering options (limit, date, author)
    • Error handling (extraction, storage)
    • Progress reporting
    • Batch processing
    • Search functionality
    • File history retrieval
    • Document structure validation

Design Decisions

  1. Rich Text for Embedding: Include file paths in embedding text so searches like "changes to auth" find commits that touched auth files
  2. Full Commit in Metadata: Store complete GitCommit object in _commit field for retrieval without re-parsing
  3. Type Filtering: Use type: 'commit' metadata to distinguish from code documents
  4. Batched Storage: Process in configurable batches to handle large histories
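
The batched storage decision can be sketched as follows. This is an illustrative outline under assumed names (`toBatches`, `storeInBatches`) and an assumed callback signature for progress reporting; the actual batching code in indexer.ts may differ.

```typescript
// Split items into consecutive batches of at most `batchSize` elements.
function toBatches<T>(items: readonly T[], batchSize: number): T[][] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// Store batches one at a time and report progress after each batch.
// `store` stands in for whatever the vector storage exposes (assumption).
async function storeInBatches<T>(
  items: readonly T[],
  batchSize: number,
  store: (batch: T[]) => Promise<void>,
  onProgress?: (stored: number, total: number) => void,
): Promise<number> {
  let stored = 0;
  for (const batch of toBatches(items, batchSize)) {
    await store(batch); // a rejected promise aborts the run and propagates to the caller
    stored += batch.length;
    onProgress?.(stored, items.length);
  }
  return stored;
}
```

Processing batches sequentially keeps memory use bounded on large histories, since only one batch of documents is in flight at a time.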

Closes

Closes #92

Part of

Epic: Intelligent Git History (v0.4.0) #90

- Add GitIndexer for indexing commits into vector store
- Support semantic search over commit messages
- Support file-specific history retrieval
- Batch embedding and storage with progress reporting
- Include file paths in embeddings for better search
- Store full commit metadata including issue/PR refs
- Add createGitIndexer factory function
- Add 17 comprehensive unit tests

@prosdev merged commit 7576454 into main on Nov 27, 2025
1 check passed

Development

Successfully merging this pull request may close these issues.

Commit indexing in core