[Feature Request] Add a structured RAG failure mode checklist (WFGY 16-problem map) for OpenSearch-based pipelines #20750

@onestardao

Is your feature request related to a problem? Please describe

OpenSearch is increasingly used as the vector backend for RAG (Retrieval-Augmented Generation) systems.

When a RAG system produces incorrect, stale, or hallucinated answers, many teams are unsure whether the root cause lies in:

  • OpenSearch indexing configuration
  • Embedding model mismatch or inconsistent vector normalization
  • Retrieval strategy (k, filters, hybrid search)
  • Application layer logic
  • Prompt orchestration

As a result, OpenSearch is sometimes blamed for issues that are actually pipeline-level configuration problems.

There is currently no structured, cross-stack diagnostic checklist in the OpenSearch documentation that helps users systematically distinguish:

  1. Vector/index configuration problems
  2. Retrieval quality problems
  3. Application or prompt logic errors
  4. Evaluation and logging blind spots

A standardized failure taxonomy would reduce confusion and improve debugging efficiency.
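As a rough illustration of how such a taxonomy could be applied, the sketch below turns the four layers listed above into a first-failing-check triage. The `Diagnostics` fields, the `Layer` enum, and the `triage` function are hypothetical names invented for this example; they are not part of OpenSearch or of the WFGY Problem Map, and the order of checks is only one plausible choice.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Layer(Enum):
    INDEX_CONFIG = "vector/index configuration"
    RETRIEVAL_QUALITY = "retrieval quality"
    APP_OR_PROMPT = "application or prompt logic"
    EVAL_BLIND_SPOT = "evaluation and logging blind spot"


@dataclass
class Diagnostics:
    # All fields are hypothetical signals a team would have to collect itself.
    query_dim: int                 # dimension of the query embedding
    index_dim: int                 # dimension declared in the index mapping
    retrieved_ids: list[str]       # document ids returned by the vector query
    relevant_ids: set[str] | None  # known-relevant ids, None if no ground truth
    answer_uses_retrieved: bool    # did the generated answer use retrieved text?


def triage(d: Diagnostics) -> Layer | None:
    """Return the first layer whose check fails, or None if all checks pass."""
    if d.query_dim != d.index_dim:
        return Layer.INDEX_CONFIG
    if d.relevant_ids is None:
        # Without ground truth, retrieval quality cannot be judged at all.
        return Layer.EVAL_BLIND_SPOT
    if not d.relevant_ids.intersection(d.retrieved_ids):
        return Layer.RETRIEVAL_QUALITY
    if not d.answer_uses_retrieved:
        return Layer.APP_OR_PROMPT
    return None
```

Checking the index layer first reflects the goal of this request: rule OpenSearch configuration in or out before looking at the rest of the pipeline.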

Describe the solution you'd like

I would like to propose adding a structured RAG failure mode checklist based on the open-source WFGY RAG 16 Problem Map (MIT licensed):

https://github.com/onestardao/WFGY
https://github.com/onestardao/WFGY/tree/main/ProblemMap/README.md

Proposal:

  1. Add a documentation page such as:
    "Debugging RAG systems built on OpenSearch"

    The page would:

    • Map common RAG failures to 16 well-defined failure categories
    • Clearly distinguish OpenSearch-level misconfiguration from pipeline-level design errors
    • Provide a structured triage flow for diagnosing issues
  2. Provide a minimal example notebook that:

    • Builds a simple OpenSearch-based RAG pipeline
    • Intentionally introduces common failure patterns (dimension mismatch, poor chunking, wrong similarity metric, improper filters, etc.)
    • Shows how each failure maps to a specific category
    • Demonstrates corrective configuration changes
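To make the notebook idea more concrete, here is a minimal sketch of two of the failure patterns named above: a dimension mismatch, and a similarity-metric mismatch that shows up when vectors are not normalized. It is plain Python with no OpenSearch client, so it only simulates the ranking behavior; the function names and the toy vectors are assumptions made for this example.

```python
import math


def check_dimension(vector, index_dim):
    """Return an error message on dimension mismatch, else None."""
    if len(vector) != index_dim:
        return f"dimension mismatch: vector has {len(vector)}, index expects {index_dim}"
    return None


def l2_normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def dot(a, b):
    """Raw inner product; sensitive to vector length."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity; compares direction only."""
    return dot(l2_normalize(a), l2_normalize(b))


def rank(query, docs, score):
    """Return doc ids sorted by descending score."""
    return sorted(docs, key=lambda doc_id: score(query, docs[doc_id]), reverse=True)


# Toy corpus: "long" points away from the query direction but has a large norm.
docs = {"aligned": [1.0, 0.0], "long": [3.0, 4.0]}
query = [1.0, 0.0]

by_dot = rank(query, docs, dot)        # inner product rewards the large norm
by_cosine = rank(query, docs, cosine)  # cosine ignores the norm

print(by_dot, by_cosine)
print(check_dimension([0.0] * 384, 768))
```

With the raw inner product the longer, off-direction vector ranks first, while cosine similarity ranks the aligned vector first. A notebook could use a contrast like this to show why an embedding model intended for cosine similarity can return unexpected neighbors when its unnormalized vectors are indexed under an inner-product metric.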

This would not introduce new runtime dependencies.
It would be documentation-level guidance to help users debug their systems more systematically.

Related component

Extensions

Describe alternatives you've considered

Currently, most debugging approaches rely on ad-hoc experimentation:

  • Changing k values
  • Re-indexing
  • Switching embedding models
  • Tweaking similarity metrics
  • Adding logs without a structured classification model

While these can work, they lack a shared vocabulary of failure types.

Other projects and publications (e.g., LlamaIndex, RAGFlow, and academic surveys) have started referencing structured failure taxonomies.

However, OpenSearch documentation does not yet include a consolidated, vendor-neutral RAG failure checklist.

Additional context

The WFGY 16 Problem Map has already been referenced or integrated in:

  • RAGFlow troubleshooting documentation
  • LlamaIndex RAG diagnostics sections
  • ToolUniverse (Harvard MIMS Lab)
  • Rankify (University of Innsbruck)
  • Multimodal RAG Survey (QCRI LLM Lab)
  • Curated lists such as Awesome LLM Apps

The goal is not to introduce a new framework, but to give OpenSearch users a structured debugging vocabulary to reduce misattribution of RAG failures.

If this direction is of interest, I would be happy to prepare a draft documentation PR aligned with OpenSearch style guidelines.

Labels: documentation, enhancement, extensions
