Enhancement of GitHubRepoReader #20471

@DineshThumma9

Description

Feature Description

GitHubRepoReader currently reads the entire repository for a given branch. While it supports excluding files or file extensions, it does not allow selectively fetching specific files based on file paths, filenames, or commit SHAs.

This enhancement proposes:

Adding support to fetch only specific files (by path, name, or SHA) instead of scanning the whole repository.

Introducing an optional mechanism (e.g., a docstore or cache check) to determine whether a file has already been read before fetching it again.

This would reduce unnecessary reads, avoid redundant processing, and significantly improve performance for large repositories or incremental updates.
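A minimal sketch of the proposed behavior, assuming the reader can first obtain a listing of (path, blob SHA) pairs for the branch and that a blob SHA changes whenever the file content changes. Every name here (`TreeEntry`, `select_entries`, `drop_already_read`, the `seen` mapping) is hypothetical and not part of GitHubRepoReader today; the `seen` dict stands in for whatever docstore or cache is eventually used.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from pathlib import PurePosixPath
from typing import Dict, Iterable, List, Optional, Set


@dataclass(frozen=True)
class TreeEntry:
    """One file from a repository tree listing (hypothetical shape)."""
    path: str
    sha: str


def select_entries(
    entries: Iterable[TreeEntry],
    paths: Optional[Set[str]] = None,
    name_patterns: Optional[List[str]] = None,
    shas: Optional[Set[str]] = None,
) -> List[TreeEntry]:
    """Keep entries matching any supplied criterion; keep all if none is given."""
    if not (paths or name_patterns or shas):
        return list(entries)
    selected = []
    for entry in entries:
        name = PurePosixPath(entry.path).name
        if (
            (paths and entry.path in paths)
            or (name_patterns and any(fnmatchcase(name, p) for p in name_patterns))
            or (shas and entry.sha in shas)
        ):
            selected.append(entry)
    return selected


def drop_already_read(
    entries: Iterable[TreeEntry], seen: Dict[str, str]
) -> List[TreeEntry]:
    """Skip entries whose SHA equals the one recorded at the last read."""
    return [e for e in entries if seen.get(e.path) != e.sha]


# Illustrative data only; the SHAs are placeholders, not real hashes.
tree = [
    TreeEntry("README.md", "a1"),
    TreeEntry("src/reader.py", "b2"),
    TreeEntry("src/utils.py", "c3"),
]
seen = {"src/reader.py": "b2"}  # stand-in for a docstore lookup

wanted = select_entries(tree, name_patterns=["*.py"])
to_fetch = drop_already_read(wanted, seen)
```

Here only `src/utils.py` would be fetched: `README.md` is filtered out by the name pattern, and `src/reader.py` is skipped because its recorded SHA is unchanged. Comparing SHAs rather than paths alone means a modified file is read again on the next run.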

Reason

No response

Value of Feature

This enhancement enables more fine-grained and efficient integration with GitHubRepoReader.
By avoiding full repository scans and redundant file reads, it improves performance, reduces API usage, and makes the reader far better suited to real-world workflows such as incremental indexing, selective updates, and large monorepos.

Metadata

Assignees

No one assigned

Labels

enhancement, good first issue, triage
