Labels: bug (Something isn't working)
Description
🐛 Bug Description
RAG memory usage grows uncontrollably when working with xlsx documents, which leads to a crash
🔄 Steps to Reproduce
- Go to 'Create Index'
- Click on 'Upload Files'
- Scroll down to any Excel file and click it
- The RAG server crashes after some time
✅ Expected Behavior
With GPU inference this works as expected: the server handles the allocation error and keeps running.
❌ Actual Behavior
With CPU inference, however, memory usage grows unchecked until the kernel OOM killer terminates the process.
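As a stopgap until the root cause is fixed, the server process could cap its own address space so a runaway allocation raises a catchable MemoryError inside Python instead of being SIGKILLed by the kernel. A minimal sketch (the 8 GiB cap is an illustrative value, not taken from the project; Linux-only, as `RLIMIT_AS` behavior varies by platform):

```python
import resource

def cap_memory(max_bytes: int = 8 * 1024**3) -> None:
    """Cap this process's virtual address space so runaway allocations
    raise MemoryError in Python instead of invoking the kernel OOM killer."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    resource.setrlimit(resource.RLIMIT_AS, (max_bytes, hard))

# Call once at server startup, before indexing begins.
cap_memory()
```

With this in place, the xlsx indexing path would fail with a Python exception the server can log and report, matching the graceful behavior seen on GPU.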
📸 Screenshots
🖥️ Environment Information
Desktop/Server:
- OS: Ubuntu 24.04
- Python Version: 3.12
- Node.js Version: 24.11
- Ollama Version: 0.15.6
📋 System Health Check
$ python3 system_health_check.py
RAG System Health Check
==================================================
Testing basic imports...
✅ Basic imports successful
Checking configurations...
External Models: {'embedding_model': 'Qwen/Qwen3-Embedding-0.6B', 'reranker_model': 'answerdotai/answerai-colbert-small-v1', 'vision_model': 'Qwen/Qwen-VL-Chat', 'fallback_reranker': 'BAAI/bge-reranker-base'}
Ollama Config: {'host': 'http://localhost:11434', 'generation_model': 'qwen3:8b', 'enrichment_model': 'qwen3:0.6b'}
Pipeline Configs: {'default': {'description': 'Production-ready pipeline with hybrid search, AI reranking, and verification', 'storage': {'lancedb_uri': './lancedb', 'text_table_name': 'text_pages_v3', 'image_table_name': 'image_pages_v3', 'bm25_path': './index_store/bm25', 'graph_path': './index_store/graph/knowledge_graph.gml'}, 'retrieval': {'retriever': 'multivector', 'search_type': 'hybrid', 'late_chunking': {'enabled': True, 'table_suffix': '_lc_v3'}, 'dense': {'enabled': True, 'weight': 0.7}, 'bm25': {'enabled': True, 'index_name': 'rag_bm25_index'}, 'graph': {'enabled': False, 'graph_path': './index_store/graph/knowledge_graph.gml'}}, 'embedding_model_name': 'Qwen/Qwen3-Embedding-0.6B', 'vision_model_name': 'Qwen/Qwen-VL-Chat', 'reranker': {'enabled': True, 'type': 'ai', 'strategy': 'rerankers-lib', 'model_name': 'answerdotai/answerai-colbert-small-v1', 'top_k': 10}, 'query_decomposition': {'enabled': True, 'max_sub_queries': 3, 'compose_from_sub_answers': True}, 'verification': {'enabled': True}, 'retrieval_k': 20, 'context_window_size': 0, 'semantic_cache_threshold': 0.98, 'cache_scope': 'global', 'contextual_enricher': {'enabled': True, 'window_size': 1}, 'indexing': {'embedding_batch_size': 50, 'enrichment_batch_size': 10, 'enable_progress_tracking': True}}, 'fast': {'description': 'Speed-optimized pipeline with minimal overhead', 'storage': {'lancedb_uri': './lancedb', 'text_table_name': 'text_pages_v3', 'image_table_name': 'image_pages_v3', 'bm25_path': './index_store/bm25'}, 'retrieval': {'retriever': 'multivector', 'search_type': 'vector_only', 'late_chunking': {'enabled': False}, 'dense': {'enabled': True}}, 'embedding_model_name': 'Qwen/Qwen3-Embedding-0.6B', 'reranker': {'enabled': False}, 'query_decomposition': {'enabled': False}, 'verification': {'enabled': False}, 'retrieval_k': 10, 'context_window_size': 0, 'contextual_enricher': {'enabled': False, 'window_size': 1}, 'indexing': {'embedding_batch_size': 100, 'enrichment_batch_size': 50, 
'enable_progress_tracking': False}}, 'bm25': {'enabled': True, 'index_name': 'rag_bm25_index'}, 'graph_rag': {'enabled': False}}
Embedding model: Qwen/Qwen3-Embedding-0.6B (1024 dims) - Check data compatibility!
✅ Configuration check completed
Testing database access...
/home/user/localGPT.bak/system_health_check.py:99: DeprecationWarning: table_names() is deprecated, use list_tables() instead
tables = db.table_names()
✅ LanceDB connected - 6 tables available
Available tables:
- text_pages_1da5082e-7f64-4814-8c00-06e0606e6002
- text_pages_1da5082e-7f64-4814-8c00-06e0606e6002_lc
- text_pages_ce810632-44ff-4e68-b503-05a32d309d2f
- text_pages_ce810632-44ff-4e68-b503-05a32d309d2f_lc
- text_pages_e2041646-e2e8-4125-bce6-65c2fb402336
... and 1 more
Testing agent initialization...
Initialized Verifier with Ollama model 'qwen3:8b'.
Agent initialized (GraphRAG disabled).
✅ Agent initialization successful
Testing embedding model...
Initializing HF Embedder with model 'Qwen/Qwen3-Embedding-0.6B' on device 'cpu'. (first load)
QwenEmbedder weights loaded and cached for Qwen/Qwen3-Embedding-0.6B.
Generating 1 embeddings with Qwen/Qwen3-Embedding-0.6B model...
✅ Embedding model: Qwen/Qwen3-Embedding-0.6B
✅ Vector dimension: 1024
Using 1024-dim embeddings (Qwen3 compatible) - Ensure data compatibility!
Testing sample query...
/home/user/localGPT.bak/system_health_check.py:122: DeprecationWarning: table_names() is deprecated, use list_tables() instead
tables = db.table_names()
Testing query on table: text_pages_1da5082e-7f64-4814-8c00-06e0606e6002
ROUTING DEBUG: Starting triage for query: 'what is this document about?...'
ROUTING DEBUG: Attempting overview-based routing...
ROUTING DEBUG: No document overviews available, returning None
❌ ROUTING DEBUG: Overview routing returned None, falling back to LLM triage
🤖 ROUTING DEBUG: No history, using LLM fallback triage...
🤖 ROUTING DEBUG: LLM fallback triage decided: 'rag_query'
ROUTING DEBUG: Final triage decision: 'rag_query'
Agent Triage Decision: 'rag_query'
Generating 1 embeddings with Qwen/Qwen3-Embedding-0.6B model...
✅ ROUTING DEBUG: Executing RAG_QUERY path (query_type='rag_query')
--- Query Decomposition Enabled ---
Query Decomposition Reasoning: Single information need; no context to resolve.
Original query: 'what is this document about?' (Contextual: 'what is this document about?')
Decomposed into 1 sub-queries: ['what is this document about?']
--- Only one sub-query after decomposition; using direct retrieval path ---
LanceDB connection established at: ./lancedb
--- Performing Retrieval for query: 'what is this document about?' on table 'text_pages_1da5082e-7f64-4814-8c00-06e0606e6002' ---
Generating 1 embeddings with Qwen/Qwen3-Embedding-0.6B model...
2026-02-19 03:54:27,346 | INFO | rag-system | Top 19 results:
2026-02-19 03:54:27,346 | INFO | rag-system | chunk_id score preview
2026-02-19 03:54:27,346 | INFO | rag-system | ------------------------------
2026-02-19 03:54:27,347 | INFO | rag-system | 6fbdff51-813 0.000 Context: The image shows an OCR-processor workflow,…
2026-02-19 03:54:27,347 | INFO | rag-system | 6fbdff51-813 0.000 Context: The context summary is: The image shows a bat…
2026-02-19 03:54:27,347 | INFO | rag-system | 6fbdff51-813 0.000 Context: The specific chunk discusses PyTorch documentation…
2026-02-19 03:54:27,347 | INFO | rag-system | 6fbdff51-813 0.000 Context: The OCR module's result shows successful document…
2026-02-19 03:54:27,347 | INFO | rag-system | 6fbdff51-813 0.000 Context: The image shows a document being processed using…
2026-02-19 03:54:27,347 | INFO | rag-system | 6fbdff51-813 0.000 Context: The local context highlights the practice section…
2026-02-19 03:54:27,347 | INFO | rag-system | 6fbdff51-813 0.000 Context: The context discusses how Tesseract, a model for…
2026-02-19 03:54:27,348 | INFO | rag-system | 6fbdff51-813 0.000 Context: The specific chunk highlights how the author…
2026-02-19 03:54:27,348 | INFO | rag-system | 6fbdff51-813 12881.766 Context: The image depicts a CI/CD workflow, highlighting…
2026-02-19 03:54:27,348 | INFO | rag-system | 6fbdff51-813 13214.093 Context: It is recommended to retrain the model to adapt to…
2026-02-19 03:54:27,348 | INFO | rag-system | 6fbdff51-813 14085.947 Context: The context summarizes Jenkins CI/CD code…
2026-02-19 03:54:27,348 | INFO | rag-system | 6fbdff51-813 14226.038 Context: This chunk describes how JSON data is processed…
2026-02-19 03:54:27,348 | INFO | rag-system | 6fbdff51-813 15197.564 Context: The image shows example code for inference model…
2026-02-19 03:54:27,349 | INFO | rag-system | 6fbdff51-813 15289.271 Context: Creating a model for correcting internal…
2026-02-19 03:54:27,349 | INFO | rag-system | 6fbdff51-813 16512.316 Context: The "Организационный" section outlines the…
2026-02-19 03:54:27,349 | INFO | rag-system | 6fbdff51-813 19344.285 Context: This chunk summarizes the CI/CD workflow using…
2026-02-19 03:54:27,349 | INFO | rag-system | 6fbdff51-813 24795.383 Context: The chunk discusses technologies like Jupyter Hub…
2026-02-19 03:54:27,350 | INFO | rag-system | 6fbdff51-813 26259.051 Context: Development of an OCR system using models…
2026-02-19 03:54:27,350 | INFO | rag-system | 6fbdff51-813 28496.629 Context: The chunk discusses how WMIC and PsExec were used…
Retrieved 19 documents.
Initialising Answer.AI ColBERT reranker (answerdotai/answerai-colbert-small-v1) via rerankers lib…
Loading ColBERTRanker model answerdotai/answerai-colbert-small-v1 (this message can be suppressed by setting verbose=0)
No device set
Using device cuda
No dtype set
Using dtype torch.float32
Loading model answerdotai/answerai-colbert-small-v1, this might take a while...
Linear Dim set to: 96 for downcasting
✅ AI reranker initialized successfully.
--- Reranking top 19 docs with AI model... ---
✅ Reranking completed in 0.23s. Refined to 10 docs.
==========================RAG Context================================
2026-02-19 03:54:35,112 | INFO | httpx | HTTP Request: POST http://localhost:11434/api/generate "HTTP/1.1 200 OK"
Total query processing time: 15.26s
✅ Sample query successful
Answer preview: Answer:
The document describes a university project focused on **developing an OCR (Optical Characte...
Found 10 source documents
==================================================
Health Check Complete: 6/6 checks passed
✅ System is healthy!
📝 Error Logs
Please include relevant error messages or logs:
No relevant logs, because the process was killed by the system before anything could be written.
🔧 Configuration
- Deployment method: [Docker / Direct Python]
- Models used: qwen3:0.6b
- Document types: [Excel]
🤔 Possible Solution
Raise an exception if the user tries to upload an Excel document.
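A minimal sketch of that guard. The function and exception names are hypothetical (the report does not name the real upload handler); the idea is to reject spreadsheet uploads before indexing starts:

```python
from pathlib import Path

# Extensions that currently trigger uncontrolled memory growth on CPU inference.
BLOCKED_EXTENSIONS = {".xlsx", ".xls"}

class UnsupportedDocumentError(ValueError):
    """Raised when an uploaded document type is not yet supported."""

def validate_upload(filename: str) -> None:
    """Hypothetical pre-upload hook: reject Excel files up front instead of
    letting the indexer exhaust memory and get OOM-killed."""
    if Path(filename).suffix.lower() in BLOCKED_EXTENSIONS:
        raise UnsupportedDocumentError(
            f"{filename!r}: Excel files are not supported yet; "
            "convert to CSV or PDF before uploading."
        )
```

Longer term, a streaming parser (for example openpyxl's `read_only=True` mode, assuming openpyxl is usable here) would avoid loading the whole workbook into memory and make the rejection unnecessary.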