I have got some problems with the embedding part. #32
Description
What is happening here?
INFO: Parsing C:\Users\ProyectosIA\Paper2Slides\sources\uploads\5f2e7c7c-5ee9-420b-ac59-9c650f32d985\Informe CORTE 1.pdf complete! Extracted 30 content blocks
INFO: Stored parsing result in cache: da3d01e45d6871b07205e92a6ae3a9c8
INFO:
Content Information:
INFO: * Total blocks in content_list: 30
INFO: * Content block types:
INFO: - text: 20
INFO: - image: 4
INFO: - discarded: 6
INFO: Content separation complete:
INFO: - Text content length: 7206 characters
INFO: - Multimodal items count: 10
INFO: - Multimodal type distribution: {'image': 4, 'discarded': 6}
INFO: Setting content source for context-aware multimodal processing...
INFO: Content source set with format: minerU
INFO: Content source set with format: minerU
INFO: Content source set with format: minerU
INFO: Content source set with format: minerU
INFO: Content source set for context extraction (format: minerU)
INFO: Starting text content insertion into LightRAG...
WARNING: Duplicate document detected: doc-e18c3ce4c296ad7f6a875dfacd2a1e76 (Informe CORTE 1.pdf)
INFO: Created 1 duplicate document records with track_id: insert_20260316_145544_d8fc6c5f
WARNING: No new unique documents were found.
INFO: Preserving 5 failed document entries for manual review
INFO: Reset 1 documents from PROCESSING/FAILED to PENDING status
INFO: Processing 1 document(s)
INFO: Extracting stage 1/1: Informe CORTE 1.pdf
INFO: Processing d-id: doc-e18c3ce4c296ad7f6a875dfacd2a1e76
INFO: Embedding func: 8 new workers initialized (Timeouts: Func: 30s, Worker: 60s, Health Check: 75s)
ERROR: Embedding func: Error in decorated function for task 2059954273536_242575.765: Vector count mismatch: expected 2 vectors but got 4 vectors (from embedding result).
ERROR: Traceback (most recent call last):
  File "C:\Users\ProyectosIA\Paper2Slides\paper2slides\Lib\site-packages\lightrag\lightrag.py", line 1932, in process_document
    await asyncio.gather(*first_stage_tasks)
  File "C:\Users\ProyectosIA\Paper2Slides\paper2slides\Lib\site-packages\lightrag\kg\nano_vector_db_impl.py", line 124, in upsert
    embeddings_list = await asyncio.gather(*embedding_tasks)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ProyectosIA\Paper2Slides\paper2slides\Lib\site-packages\lightrag\utils.py", line 503, in __call__
    result = await self.func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ProyectosIA\Paper2Slides\paper2slides\Lib\site-packages\lightrag\utils.py", line 1016, in wait_func
    return await future
           ^^^^^^^^^^^^
  File "C:\Users\ProyectosIA\Paper2Slides\paper2slides\Lib\site-packages\lightrag\utils.py", line 720, in worker
    result = await asyncio.wait_for(
             ^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\asyncio\tasks.py", line 479, in wait_for
    return fut.result()
           ^^^^^^^^^^^^
  File "C:\Users\ProyectosIA\Paper2Slides\paper2slides\Lib\site-packages\lightrag\utils.py", line 522, in __call__
    raise ValueError(
ValueError: Vector count mismatch: expected 2 vectors but got 4 vectors (from embedding result).
ERROR: Failed to extract document 1/1: Informe CORTE 1.pdf
INFO: Enqueued document processing pipeline stopped
INFO: Text content insertion complete
INFO: Starting multimodal content processing...
INFO: Starting to process 10 multimodal content items
INFO: 127.0.0.1:61322 - "GET /api/status/5f2e7c7c-5ee9-420b-ac59-9c650f32d985 HTTP/1.1" 200 OK
INFO: Multimodal chunk generation progress: 1/10 (10.0%)
INFO: Multimodal chunk generation progress: 2/10 (20.0%)
INFO: Multimodal chunk generation progress: 3/10 (30.0%)
INFO: Multimodal chunk generation progress: 4/10 (40.0%)
INFO: Multimodal chunk generation progress: 5/10 (50.0%)
INFO: Multimodal chunk generation progress: 6/10 (60.0%)
INFO: Multimodal chunk generation progress: 7/10 (70.0%)
INFO: Multimodal chunk generation progress: 8/10 (80.0%)
INFO: Multimodal chunk generation progress: 9/10 (90.0%)
INFO: Multimodal chunk generation progress: 10/10 (100.0%)
INFO: Generated descriptions for 10/10 multimodal items using correct processors
ERROR: Embedding func: Error in decorated function for task 2059959360320_242609.39: Vector count mismatch: expected 10 vectors but got 20 vectors (from embedding result).
ERROR: Error storing chunks to storage: Vector count mismatch: expected 10 vectors but got 20 vectors (from embedding result).
ERROR: Error in multimodal processing: Vector count mismatch: expected 10 vectors but got 20 vectors (from embedding result).
WARNING: Falling back to individual multimodal processing
INFO: Processing item 1/10: image content
ERROR: Embedding func: Error in decorated function for task 2059959360896_242617.562: Vector count mismatch: expected 1 vectors but got 2 vectors (from embedding result).
ERROR: Error processing image content: Vector count mismatch: expected 1 vectors but got 2 vectors (from embedding result).
ERROR: Error processing multimodal content: not enough values to unpack (expected 3, got 2)
INFO: Processing item 2/10: discarded content
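
Note on the numbers in the log: every mismatch is off by exactly a factor of two (expected 2, got 4; expected 10, got 20; expected 1, got 2). One possible explanation, which is an assumption and not something the log confirms, is that the embedding model returns vectors twice as long as the embedding dimension the pipeline is configured with. A check that infers the vector count by dividing the total number of returned values by the configured dimension would then count twice as many vectors. The sketch below only reproduces that arithmetic. The helper names and both dimension values are hypothetical, and it does not call LightRAG.

```python
# Illustrative arithmetic only. Nothing here is LightRAG code.
# Both dimensions are assumed values; the log does not state either one.
CONFIGURED_DIM = 1536  # hypothetical: dimension the pipeline was told to expect
ACTUAL_DIM = 3072      # hypothetical: dimension the model actually returns


def inferred_vector_count(total_elements: int, configured_dim: int) -> int:
    """Count vectors the way a size-based check might: flat size / configured dim."""
    if total_elements % configured_dim != 0:
        raise ValueError("element count is not a multiple of the configured dimension")
    return total_elements // configured_dim


def simulate(n_texts: int) -> int:
    """Vectors such a check would report for n_texts inputs under the assumed dims."""
    total_elements = n_texts * ACTUAL_DIM  # one ACTUAL_DIM-long vector per input text
    return inferred_vector_count(total_elements, CONFIGURED_DIM)


if __name__ == "__main__":
    for n_texts in (2, 10, 1):  # the three "expected" counts from the log
        print(f"expected {n_texts} vectors but got {simulate(n_texts)} vectors")
```

If this is the cause, comparing the configured embedding dimension against the length of a single vector returned by the embedding endpoint should show the same 2x ratio.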