Description
Feature Description
GitHubRepoReader currently reads the entire repository for a given branch. While it supports excluding files or file extensions, it does not allow selectively fetching specific files based on file paths, filenames, or commit SHAs.
This enhancement proposes:
Adding support to fetch only specific files (by path, name, or SHA) instead of scanning the whole repository.
Introducing an optional mechanism (e.g., a docstore or cache check) to determine whether a file has already been read before fetching it again.
This would reduce unnecessary reads, avoid redundant processing, and significantly improve performance for large repositories or incremental updates.
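To illustrate the two proposals above, here is a minimal sketch of the selection logic. It is not the reader's existing API: `TreeEntry`, `select_entries`, and the `seen_shas` set are hypothetical names, and the set stands in for whatever docstore or cache check would be used. It assumes each file is described by its repository-relative path and its blob SHA, so a file whose content has changed has a new SHA and is picked up again.

```python
import posixpath
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class TreeEntry:
    """Hypothetical stand-in for one file listed in a repository tree."""
    path: str  # repository-relative path, e.g. "docs/index.md"
    sha: str   # blob SHA identifying the file content


def select_entries(
    entries: Iterable[TreeEntry],
    paths: Optional[Iterable[str]] = None,
    names: Optional[Iterable[str]] = None,
    shas: Optional[Iterable[str]] = None,
    seen_shas: Optional[Set[str]] = None,
) -> List[TreeEntry]:
    """Return entries matching any selector, skipping content already read.

    With no selectors, every entry is a candidate (the current behaviour),
    but entries whose SHA is in ``seen_shas`` are still skipped.
    """
    wanted_paths = set(paths or ())
    wanted_names = set(names or ())
    wanted_shas = set(shas or ())
    seen = seen_shas if seen_shas is not None else set()
    has_selector = bool(wanted_paths or wanted_names or wanted_shas)

    selected = []
    for entry in entries:
        matches = (
            not has_selector
            or entry.path in wanted_paths
            or posixpath.basename(entry.path) in wanted_names
            or entry.sha in wanted_shas
        )
        # Only fetch files that match and whose content was not read before.
        if matches and entry.sha not in seen:
            selected.append(entry)
    return selected


# Usage example with made-up entries and SHAs.
tree = [
    TreeEntry("README.md", "a1"),
    TreeEntry("src/app.py", "b2"),
    TreeEntry("docs/README.md", "c3"),
]
to_fetch = select_entries(tree, names=["README.md"], seen_shas={"a1"})
print([e.path for e in to_fetch])
```

Filtering on the tree listing before any file content is requested is what would save API calls: only the entries returned here would need to be downloaded and parsed.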
Reason
No response
Value of Feature
This enhancement enables more fine-grained and efficient integration with GitHubRepoReader.
By avoiding full repository scans and redundant file reads, it improves performance, reduces API usage, and makes the reader far more suitable for real-world workflows such as incremental indexing, selective updates, and large monorepos.