feat: Add GitHub Issues ingestion pipeline #8

Open
haroon0x wants to merge 2 commits into kubeflow:main from haroon0x:feat/issues-ingestion

haroon0x commented Jan 29, 2026

Adds a new KFP component and pipeline to ingest GitHub Issues from multiple Kubeflow repositories into the RAG system.
Fixes #7, #9

Changes

New Component: download_github_issues

A component that:

  • Fetches issues from multiple repos via the GitHub REST API
  • Accepts a comma-separated list of repos (e.g., kubeflow/kubeflow,kubeflow/pipelines,kubeflow/kserve)
  • Filters by labels (kind/bug, kind/question) and state (open, closed, all)
  • Skips pull requests (the issues API returns them alongside issues)
  • Outputs the same JSONL format as download_github_directory, for compatibility with the existing chunk_and_embed component
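
For readers who want to see the filtering and output steps concretely, here is a minimal sketch, not the component's actual code. It relies on one documented fact: items returned by the GitHub issues endpoint that are really pull requests carry a `pull_request` key. The record fields (`path`, `content`) and the helper names are assumptions made for illustration; the real schema is whatever download_github_directory emits.

```python
import json


def is_pull_request(item: dict) -> bool:
    # The GitHub issues endpoint also returns pull requests;
    # those items include a "pull_request" key.
    return "pull_request" in item


def issue_to_record(repo: str, issue: dict) -> dict:
    # Hypothetical record schema: "path" and "content" are assumed field
    # names, not confirmed against download_github_directory.
    title = issue.get("title") or ""
    body = issue.get("body") or ""  # body can be null for empty issues
    return {
        "path": f"{repo}/issues/{issue['number']}",
        "content": f"{title}\n\n{body}".strip(),
    }


def issues_to_jsonl(repo: str, items: list) -> str:
    # One JSON object per line, pull requests dropped.
    lines = [
        json.dumps(issue_to_record(repo, item))
        for item in items
        if not is_pull_request(item)
    ]
    return "\n".join(lines)
```

Keeping the conversion as a pure function over already-parsed API items makes it easy to check locally without a cluster or network access.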

Motivation

Currently, the pipeline only indexes documentation from kubeflow/website. This limits the agent's ability to help with troubleshooting. By indexing GitHub Issues, the agent can answer:

  • "Is there a known bug for error X?"
  • "What's the workaround for issue Y?"

This aligns with the Agentic RAG proposal which mentions indexing Documentation, GitHub Issues, and Platform Architecture.

Usage

The component can be connected to the existing pipeline or used in a new pipeline:

download_task = download_github_issues(
    repos="kubeflow/kubeflow,kubeflow/pipelines",
    labels="kind/bug,kind/question",
    state="all",
    max_issues_per_repo=200,
    github_token=github_token
)
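
As a rough illustration of how the string parameters above might translate into API calls, the sketch below expands the comma-separated `repos` value into one request per repo and page. The function names and the paging strategy are assumptions, not taken from the PR; the endpoint path, the `state`, `labels`, `per_page`, and `page` query parameters, and the 100-item page cap are part of the GitHub REST API.

```python
from math import ceil

API_ROOT = "https://api.github.com"
PER_PAGE = 100  # GitHub caps per_page at 100


def parse_csv(value: str) -> list:
    # "a, b,,c" -> ["a", "b", "c"]
    return [part.strip() for part in value.split(",") if part.strip()]


def plan_requests(repos: str, labels: str, state: str, max_issues_per_repo: int) -> list:
    # Hypothetical helper: returns (url, params) pairs, one per repo and page.
    if state not in {"open", "closed", "all"}:
        raise ValueError(f"unsupported state: {state!r}")
    pages = ceil(max_issues_per_repo / PER_PAGE)
    planned = []
    for repo in parse_csv(repos):
        owner, sep, name = repo.partition("/")
        if not (owner and sep and name):
            raise ValueError(f"expected owner/name, got {repo!r}")
        for page in range(1, pages + 1):
            params = {"state": state, "per_page": PER_PAGE, "page": page}
            if labels:
                # Passed through unchanged; GitHub matches issues that
                # carry every label in a comma-separated list.
                params["labels"] = labels
            planned.append((f"{API_ROOT}/repos/{owner}/{name}/issues", params))
    return planned
```

Because pull requests are returned on the same pages, fetching enough pages for 200 items can yield fewer than 200 actual issues once they are filtered out.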

Testing

Tested locally, without a Kubeflow cluster:

  • Fetched issues from kubeflow/kubeflow and kubeflow/pipelines
  • Verified output format matches existing chunk_and_embed input schema
  • Confirmed multi-repo support works

Checklist

  • New component follows existing patterns
  • Reuses existing components where possible
  • Syntax validated
  • Local testing passed
  • DCO signed

@google-oss-prow

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign franciscojavierarceo for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Signed-off-by: haroon0x <haroonbmc0@gmail.com>
haroon0x force-pushed the feat/issues-ingestion branch from 30afad3 to 5d0b5ff on January 29, 2026 18:05
google-oss-prow bot added size/M and removed size/L labels Jan 29, 2026
@Sayan4496

This is a great addition — ingesting GitHub Issues is a big step toward the Agentic RAG vision.

A few follow-up suggestions:

  1. Add issue URL (html_url) + timestamps (created_at, updated_at) in output for better citations.
  2. GitHub API filter labels=a,b uses AND; if OR is intended, separate queries may be needed.
  3. Handle GitHub rate limits (X-RateLimit-Remaining, Retry-After) for robustness.

Follow-up idea I'd like to implement: ingesting issue comments (workarounds appear in the comments more often than in the issue body). Happy to contribute a PR.
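
Suggestions 2 and 3 above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code that landed in the PR: the helper names are hypothetical, OR semantics are approximated by running one query per label and merging the results by issue `id`, and the wait time is derived from the `Retry-After`, `X-RateLimit-Remaining`, and `X-RateLimit-Reset` response headers that GitHub sends.

```python
import time


def merge_by_issue_id(result_sets: list) -> list:
    # OR across labels: each inner list is the result of one
    # single-label query; an issue with several labels is kept once.
    seen = {}
    for issues in result_sets:
        for issue in issues:
            seen.setdefault(issue["id"], issue)
    return list(seen.values())


def seconds_to_wait(headers: dict, now: float = None) -> float:
    # Hypothetical helper. Expects header names spelled exactly as below;
    # a plain dict is case-sensitive, unlike some HTTP client header maps.
    now = time.time() if now is None else now
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        # Retry-After is given in seconds.
        return max(0.0, float(retry_after))
    if headers.get("X-RateLimit-Remaining") == "0":
        # X-RateLimit-Reset is a UTC epoch timestamp in seconds.
        reset = float(headers.get("X-RateLimit-Reset", now))
        return max(0.0, reset - now)
    return 0.0
```

A caller would sleep for the returned number of seconds before retrying; passing `now` explicitly keeps the function deterministic in tests.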

Signed-off-by: haroon0x <haroonbmc0@gmail.com>
google-oss-prow bot added size/L and removed size/M labels Jan 30, 2026
@haroon0x
Author

  1. Add issue URL (html_url) + timestamps (created_at, updated_at) in output for better citations.
  2. GitHub API filter labels=a,b uses AND; if OR is intended, separate queries may be needed.
  3. Handle GitHub rate limits (X-RateLimit-Remaining, Retry-After) for robustness.

Thanks for pointing these out. I have implemented these in the last commit.

Development

Successfully merging this pull request may close these issues.

Feat: Add GitHub Issues ingestion to RAG pipeline

2 participants