Description
Do you need to file a feature request?
- I have searched the existing feature requests and this feature request is not already filed.
- I believe this is a legitimate feature request, not just a question or bug.
Feature Request Description
Hi RAG-Anything team, thanks for releasing this all-in-one multimodal document RAG system. It is very helpful for real-world use cases that mix text, tables, and images.
I have been working on RAG reliability and I often see the same kinds of problems in multimodal pipelines, for example:
- OCR or layout parsing errors that silently corrupt the corpus
- table structure lost during preprocessing
- misalignment between image regions and textual descriptions
- retrieval that focuses only on text while the answer actually depends on images or tables
- evaluation that only looks at text and ignores multimodal context and grounding
I would like to propose a very small, documentation-only feature:
Feature
Add a short doc page called something like multimodal_rag_failure_modes.md and link it from the README or a Troubleshooting / Best Practices section.
The page would contain:
- A list of common multimodal failure modes (OCR/layout issues, table loss, image-text misalignment, retrieval biased toward one modality).
- Simple sanity checks for each one, for example:
  - visually inspect a few processed documents
  - check index size and embeddings per modality
  - run a few probe queries that should clearly depend on images or tables
- A short checklist for bug reports:
  - which modality failed
  - what preprocessing pipeline was used
  - a minimal example document and query
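To make the "index size per modality" and "probe query" checks concrete, here is a minimal Python sketch of what the doc page could show. It does not use RAG-Anything's actual API: the chunk dictionaries, the `modality` field, and all three helper names are hypothetical stand-ins for whatever the real index exposes.

```python
from collections import Counter

# Hypothetical chunk records: each indexed chunk is assumed to be a dict
# with a "modality" key. The real index format may differ.

def modality_counts(chunks):
    """Count indexed chunks per modality; missing labels count as 'unknown'."""
    return Counter(chunk.get("modality", "unknown") for chunk in chunks)

def missing_modalities(counts, expected=("text", "table", "image")):
    """Return the expected modalities that have no chunks in the index."""
    return [m for m in expected if counts.get(m, 0) == 0]

def probe_hits_modality(retrieved, wanted):
    """True if at least one retrieved chunk has the wanted modality."""
    return any(chunk.get("modality") == wanted for chunk in retrieved)

# Toy index for illustration: the table content was lost in preprocessing.
chunks = [
    {"id": "c1", "modality": "text"},
    {"id": "c2", "modality": "text"},
    {"id": "c3", "modality": "image"},
]

counts = modality_counts(chunks)
print(dict(counts))
print(missing_modalities(counts))
```

A probe query that should depend on a table or image can then be checked by passing its retrieved chunks to `probe_hits_modality`; a result of `False` suggests retrieval is biased toward another modality or that the content never reached the index.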
Why it helps
- Many users will run RAG-Anything on noisy, real-world PDFs and mixed documents.
- A shared failure mode checklist can save time when debugging and reduce repeated “it does not work” questions.
- This is documentation only, so it is low risk and easy to review or adjust.
If you think this is useful and in scope, I am happy to draft a PR with a concise version that follows your documentation style.
Additional Context
I recently contributed a small robustness-related entry to Harvard MIMS Lab’s ToolUniverse project, so I am used to keeping these kinds of checklists focused and easy to maintain. Happy to do the same here if you think it fits the roadmap.