feat: Add GitHub Issues ingestion pipeline #8
[APPROVALNOTIFIER] This PR is NOT APPROVED.
Signed-off-by: haroon0x <haroonbmc0@gmail.com>
30afad3 to 5d0b5ff
This is a great addition. Ingesting GitHub Issues is a big step toward the Agentic RAG vision. A few follow-up suggestions:
Follow-up idea I'd like to implement: ingesting issue comments, since workarounds often appear in the comments rather than in the issue body. Happy to contribute a PR.
Signed-off-by: haroon0x <haroonbmc0@gmail.com>
Thanks for pointing these out. I have implemented them in the last commit.
Adds a new KFP component and pipeline to ingest GitHub Issues from multiple Kubeflow repositories into the RAG system.
Fixes #7, #9
Changes
New Component: download_github_issues
A component that:
- Fetches issues from multiple repositories (kubeflow/kubeflow, kubeflow/pipelines, kubeflow/kserve)
- Filters by label (kind/bug, kind/question) and state (open, closed, all)
- Matches the output format of download_github_directory for compatibility with the existing chunk_and_embed component
Motivation
Currently, the pipeline only indexes documentation from kubeflow/website. This limits the agent's ability to help with troubleshooting. By indexing GitHub Issues, the agent can also answer troubleshooting questions.
This aligns with the Agentic RAG proposal, which mentions indexing Documentation, GitHub Issues, and Platform Architecture.
Usage
The component can be connected to the existing pipeline or used in a new pipeline:
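A minimal sketch of the data flow in plain Python rather than KFP, under stated assumptions. It builds a GitHub REST API "list repository issues" URL for one repository and turns issue payloads into records a downstream chunking step could consume. The function names (build_issues_url, issues_to_records) and the record fields (path, content) are hypothetical and not taken from this PR; the real output schema of download_github_directory may differ.

```python
from urllib.parse import urlencode

GITHUB_API = "https://api.github.com"


def build_issues_url(repo, label=None, state="all", per_page=100, page=1):
    """Build a 'list repository issues' URL for one repo (hypothetical helper)."""
    if state not in ("open", "closed", "all"):
        raise ValueError(f"unsupported state: {state}")
    params = {"state": state, "per_page": per_page, "page": page}
    if label:
        # One label per request keeps the filter semantics simple.
        params["labels"] = label
    return f"{GITHUB_API}/repos/{repo}/issues?{urlencode(params)}"


def issues_to_records(repo, issues):
    """Convert issue payloads to records; field names are assumptions."""
    records = []
    for issue in issues:
        # The issues endpoint also returns pull requests,
        # which carry a "pull_request" key; skip them.
        if "pull_request" in issue:
            continue
        body = issue.get("body") or ""
        records.append({
            "path": f"{repo}/issues/{issue['number']}",
            "content": f"{issue['title']}\n\n{body}",
        })
    return records


# Made-up payloads for illustration only.
sample = [
    {"number": 1, "title": "Pod fails to start", "body": "Steps to reproduce"},
    {"number": 2, "title": "Fix typo", "body": None, "pull_request": {}},
]
url = build_issues_url("kubeflow/pipelines", label="kind/bug", state="closed")
records = issues_to_records("kubeflow/pipelines", sample)
```

In a KFP pipeline, the records from a step like this would be passed to chunk_and_embed in place of the documentation files.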
Testing
Tested locally without a Kubeflow cluster:
- kubeflow/kubeflow and kubeflow/pipelines
- chunk_and_embed input schema
Checklist