-
Notifications
You must be signed in to change notification settings - Fork 2.5k
Description
Is your feature request related to a problem? Please describe
OpenSearch is increasingly used as the vector backend for RAG (Retrieval-Augmented Generation) systems.
When a RAG system produces incorrect, stale, or hallucinated answers, many teams are unsure whether the root cause lies in:
- OpenSearch indexing configuration
- Embedding mismatch or normalization
- Retrieval strategy (k, filters, hybrid search)
- Application layer logic
- Prompt orchestration
As a result, OpenSearch is sometimes blamed for issues that are actually pipeline-level configuration problems.
There is currently no structured, cross-stack diagnostic checklist in the OpenSearch documentation that helps users systematically distinguish:
- Vector/index configuration problems
- Retrieval quality problems
- Application or prompt logic errors
- Evaluation and logging blind spots
A standardized failure taxonomy would reduce confusion and improve debugging efficiency.
Describe the solution you'd like
I would like to propose adding a structured RAG failure mode checklist based on the open-source WFGY RAG 16 Problem Map (MIT licensed):
https://github.com/onestardao/WFGY
https://github.com/onestardao/WFGY/tree/main/ProblemMap/README.md
Proposal:
-
Add a documentation page such as:
"Debugging RAG systems built on OpenSearch"The page would:
- Map common RAG failures to 16 well-defined failure categories
- Clearly distinguish OpenSearch-level misconfiguration from pipeline-level design errors
- Provide a structured triage flow for diagnosing issues
-
Provide a minimal example notebook that:
- Builds a simple OpenSearch-based RAG pipeline
- Intentionally introduces common failure patterns (dimension mismatch, poor chunking, wrong similarity metric, improper filters, etc.)
- Shows how each failure maps to a specific category
- Demonstrates corrective configuration changes
This would not introduce new runtime dependencies.
It would be documentation-level guidance to help users debug their systems more systematically.
Related component
Extensions
Describe alternatives you've considered
Currently, most debugging approaches rely on ad-hoc experimentation:
- Changing k values
- Re-indexing
- Switching embedding models
- Tweaking similarity metrics
- Adding logs without a structured classification model
While these can work, they lack a shared vocabulary of failure types.
Other frameworks (e.g., LlamaIndex, RAGFlow, academic surveys) have started referencing structured failure taxonomies.
However, OpenSearch documentation does not yet include a consolidated, vendor-neutral RAG failure checklist.
Additional context
The WFGY 16 Problem Map has already been referenced or integrated in:
- RAGFlow troubleshooting documentation
- LlamaIndex RAG diagnostics sections
- ToolUniverse (Harvard MIMS Lab)
- Rankify (University of Innsbruck)
- Multimodal RAG Survey (QCRI LLM Lab)
- Curated lists such as Awesome LLM Apps
The goal is not to introduce a new framework, but to give OpenSearch users a structured debugging vocabulary to reduce misattribution of RAG failures.
If this direction is of interest, I would be happy to prepare a draft documentation PR aligned with OpenSearch style guidelines.