Feature request: RAG / LLM pipeline debugging tutorial using 16-problem ProblemMap #5396

@onestardao

Description

Add an official tutorial or guide that shows how to debug RAG / LLM pipelines built with Kedro using a structured 16-problem failure map (the WFGY ProblemMap). The guide would help users determine whether a failing RAG system is caused by chunking, embeddings, the vector store, retrieval, routing or post-processing, instead of only tuning the LLM prompt.

The change is documentation-only and does not require modifications to Kedro core.

Context

Kedro is increasingly used as the structural backbone for AI and RAG projects:

  • nodes for ingestion, cleaning and chunking,
  • pipelines for embeddings and vector-store updates,
  • pipelines for retrieval + LLM calls + evaluation.

When something goes wrong, the pipeline often works from Kedro’s point of view (no failing nodes), yet the RAG behaviour is poor: hallucinations, missing context, or answers that vary between runs.

Right now, there is no single Kedro guide that:

  • names the typical failure modes of a RAG pipeline end-to-end, and
  • explains where in a Kedro pipeline to add logging / tests / diagnostic nodes for each failure mode.

I maintain an MIT-licensed project called WFGY (~1.5k GitHub stars). One of its components is the WFGY 16-problem ProblemMap, which categorises common RAG / LLM pipeline failures (retriever behaviour, chunking, vector stores, routing, hallucinations, evaluation, etc.) and is already referenced by several curated lists and research projects. I would like to adapt this map specifically for Kedro.

Possible Implementation

  • A new guide under the documentation section that covers “Debugging RAG / LLM pipelines with Kedro”.

  • A simple example project with a RAG pipeline, for example:

    load_raw_docs → clean_text → chunk_docs → embed → write_to_vector_store → retrieve → call_llm → postprocess → evaluate

  • A table that maps each of the 16 failure modes to:

    • which Kedro nodes / datasets are relevant,
    • what to log or visualise (e.g. chunk statistics, retrieval coverage, distribution of similarity scores),
    • small experiments users can run (change chunking, retriever settings, evaluation dataset).
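As a rough illustration of the "what to log" column, here is a minimal sketch of two diagnostics in plain Python: a summary of similarity scores and a simple retrieval-coverage measure. The function names, the `low_threshold` default and the input shapes (query id mapped to chunk ids) are assumptions made for this example; they are not part of Kedro or the WFGY ProblemMap.

```python
from statistics import mean, median


def similarity_summary(scores: list[float], low_threshold: float = 0.3) -> dict[str, float]:
    """Summarise retrieval similarity scores so weak matches stand out."""
    if not scores:
        return {"count": 0, "min": 0.0, "max": 0.0, "mean": 0.0,
                "median": 0.0, "share_below_threshold": 0.0}
    below = sum(1 for s in scores if s < low_threshold)
    return {
        "count": len(scores),
        "min": min(scores),
        "max": max(scores),
        "mean": mean(scores),
        "median": median(scores),
        # Share of scores under the (hypothetical) threshold.
        "share_below_threshold": below / len(scores),
    }


def retrieval_coverage(retrieved: dict[str, list[str]],
                       relevant: dict[str, set[str]]) -> float:
    """Fraction of labelled queries with at least one relevant chunk retrieved."""
    if not relevant:
        return 0.0
    hits = sum(
        1 for query_id, wanted in relevant.items()
        if wanted & set(retrieved.get(query_id, []))
    )
    return hits / len(relevant)
```

Both functions take and return plain Python objects, so they could be used as ordinary Kedro node functions after the `retrieve` step, with the resulting dictionaries saved as datasets and compared between runs.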

I am happy to open a PR that:

  • adds the tutorial page,
  • wires it into the docs navigation, and
  • includes a minimal example project if that is helpful.
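To give a sense of scale, the first stages of such an example project might look like the sketch below. It covers only `clean_text` and `chunk_docs` plus one diagnostic node; the fixed-size character chunking, the `chunk_stats` node and the dataset names (`raw_docs`, `clean_docs`, `chunks`) are placeholders chosen for illustration. Kedro is imported lazily so that the node functions can be tested without it.

```python
def clean_text(raw_docs: list[str]) -> list[str]:
    """Collapse whitespace and drop empty documents."""
    return [" ".join(doc.split()) for doc in raw_docs if doc.strip()]


def chunk_docs(clean_docs: list[str], chunk_size: int = 40) -> list[str]:
    """Split each document into fixed-size character chunks (toy strategy)."""
    return [
        doc[start:start + chunk_size]
        for doc in clean_docs
        for start in range(0, len(doc), chunk_size)
    ]


def chunk_stats(chunks: list[str]) -> dict[str, float]:
    """Diagnostic node: summarise chunk lengths to surface chunking problems."""
    lengths = [len(chunk) for chunk in chunks]
    if not lengths:
        return {"count": 0, "min": 0, "max": 0, "mean": 0.0}
    return {
        "count": len(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "mean": sum(lengths) / len(lengths),
    }


def create_pipeline():
    """Wire the functions above into a Kedro pipeline (requires Kedro)."""
    from kedro.pipeline import Pipeline, node

    return Pipeline([
        node(clean_text, inputs="raw_docs", outputs="clean_docs", name="clean_text"),
        node(chunk_docs, inputs=["clean_docs", "params:chunk_size"],
             outputs="chunks", name="chunk_docs"),
        # Side branch: diagnostics only, does not feed later stages.
        node(chunk_stats, inputs="chunks", outputs="chunk_stats", name="chunk_stats"),
    ])
```

Keeping the diagnostic as a side branch means it can be added to or removed from the pipeline without changing the data that flows to `embed` and later stages.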

Possible Alternatives

  • Keep this entirely as a community blog post or a separate example repo.
    This would work, but an official guide in Kedro docs would give new users a much clearer starting point and would standardise the vocabulary for RAG failure modes across the ecosystem.

  • Wait for more first-party RAG tooling in Kedro before adding such a guide.
    Kedro already orchestrates many RAG-like pipelines today; the proposal is to document how to debug them using patterns that many users are already rediscovering ad hoc.

Metadata

    Labels

    Community (Issue/PR opened by the open-source community)
    Issue: Feature Request (New feature or improvement to existing feature)

    Status

    Done