Proposal: Improving chunk quality and metadata grounding in ingestion pipeline #8

@Nitinkr18

Hello maintainers,

I’ve been exploring the KnowledgeSpace ingestion and retrieval pipeline to understand how dataset descriptions and metadata are converted into searchable chunks and embeddings.

Based on this review, I drafted a short proposal focusing on small, incremental improvements to:

  • chunk quality validation,
  • metadata grounding in embeddings, and
  • safer deduplication logic.

The goal is to improve retrieval precision and grounding without changing models, infrastructure, or the overall system architecture.

Before starting implementation, I wanted to check whether this direction aligns with current priorities and whether such changes would be welcome as a contribution.

I’m happy to share a brief proposal document or adjust the scope based on feedback.

Thank you for your time.

Nitin_Krishna_KnowledgeSpace_Prior_Contribution_Proposal.pdf
