Feature request: RAG / LLM pipeline debugging tutorial using 16-problem ProblemMap

## Description

Add an official tutorial or guide that shows how to debug RAG / LLM pipelines built with Kedro using a structured 16-problem failure map (WFGY ProblemMap). The guide would help users determine whether a failing RAG system's problem lies in chunking, embeddings, vector stores, retrieval, routing or post-processing, instead of only tuning the LLM prompt.

The change is documentation-only and does not require modifications to Kedro core.

## Context

Kedro is increasingly used as the structural backbone for AI and RAG projects:

* nodes for ingestion, cleaning and chunking,
* pipelines for embeddings and vector-store updates,
* pipelines for retrieval + LLM calls + evaluation.

When something goes wrong, users often have a working pipeline from Kedro’s point of view (no failing nodes), but the RAG behaviour is poor: hallucinations, missing context, unstable answers between runs.

Right now, there is no single Kedro guide that:

* names the typical failure modes of a RAG pipeline end-to-end, and
* explains where in a Kedro pipeline to add logging / tests / diagnostic nodes for each failure mode.

I maintain an MIT-licensed project called **WFGY** (~1.5k GitHub stars). One of its components is the **WFGY 16-problem ProblemMap**, which categorises common RAG / LLM pipeline failures (retriever behaviour, chunking, vector stores, routing, hallucinations, evaluation, etc.) and is already referenced by several curated lists and research projects. I would like to adapt this map specifically for Kedro.

## Possible Implementation

* A new guide under the documentation section that covers “Debugging RAG / LLM pipelines with Kedro”.
* A simple example project with a RAG pipeline, for example:

  `load_raw_docs → clean_text → chunk_docs → embed → write_to_vector_store → retrieve → call_llm → postprocess → evaluate`

* A table that maps each of the 16 failure modes to:
  * which Kedro nodes / datasets are relevant,
  * what to log or visualise (e.g. chunk statistics, retrieval coverage, distribution of similarity scores),
  * small experiments users can run (change chunking, retriever settings, evaluation dataset).
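
As a rough illustration of the "what to log" column, the sketch below shows two diagnostic helpers for chunk statistics and retrieval coverage. It is not part of Kedro or WFGY: the function names `chunk_stats` and `retrieval_coverage`, and the shapes of their inputs, are hypothetical. They are plain Python functions so that they could be registered as ordinary Kedro nodes after `chunk_docs` and `retrieve`.

```python
from statistics import mean, median


def chunk_stats(chunks: list[str]) -> dict:
    """Summarise chunk lengths (in characters) for the output of a chunking step.

    Hypothetical helper: assumes the chunking node returns a list of strings.
    """
    lengths = [len(c) for c in chunks]
    if not lengths:
        return {"count": 0, "min": 0, "max": 0, "mean": 0.0, "median": 0.0, "empty": 0}
    return {
        "count": len(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "mean": mean(lengths),
        "median": median(lengths),
        # Whitespace-only chunks usually point at a cleaning or splitting problem.
        "empty": sum(1 for c in chunks if not c.strip()),
    }


def retrieval_coverage(
    retrieved: dict[str, list[str]],
    expected: dict[str, set[str]],
) -> float:
    """Fraction of evaluation queries with at least one expected chunk id retrieved.

    Hypothetical helper: assumes both arguments are keyed by query string,
    `retrieved` holding ranked chunk ids and `expected` the known-relevant ids.
    """
    if not expected:
        return 0.0
    hits = sum(
        1 for query, ids in expected.items() if ids & set(retrieved.get(query, []))
    )
    return hits / len(expected)


if __name__ == "__main__":
    print(chunk_stats(["abc", "", "hello"]))
    print(retrieval_coverage({"q1": ["a", "b"], "q2": ["c"]}, {"q1": {"b"}, "q2": {"z"}}))
```

Low coverage alongside healthy chunk statistics would suggest looking at embeddings or retriever settings rather than chunking, which is the kind of triage the table is meant to support.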

I am happy to open a PR that:
* adds the tutorial page,
* wires it into the docs navigation, and
* includes a minimal example project if that is helpful.

## Possible Alternatives

* Keep this entirely as a community blog post or a separate example repo.  
  This would work, but an official guide in Kedro docs would give new users a much clearer starting point and would standardise the vocabulary for RAG failure modes across the ecosystem.

* Wait for more first-party RAG tooling in Kedro before adding such a guide.  
  Even today, Kedro already orchestrates many RAG-like pipelines; the proposal is to document *how to debug them* using patterns that many users are already rediscovering ad hoc.
