I have got some problems with the embedding part. #32

@JCaratt

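What is happening here? Both the text insertion and the multimodal processing fail at the embedding step with a "Vector count mismatch" error: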

INFO: Parsing C:\Users\ProyectosIA\Paper2Slides\sources\uploads\5f2e7c7c-5ee9-420b-ac59-9c650f32d985\Informe CORTE 1.pdf complete! Extracted 30 content blocks
INFO: Stored parsing result in cache: da3d01e45d6871b07205e92a6ae3a9c8
INFO:
Content Information:
INFO: * Total blocks in content_list: 30
INFO: * Content block types:
INFO: - text: 20
INFO: - image: 4
INFO: - discarded: 6
INFO: Content separation complete:
INFO: - Text content length: 7206 characters
INFO: - Multimodal items count: 10
INFO: - Multimodal type distribution: {'image': 4, 'discarded': 6}
INFO: Setting content source for context-aware multimodal processing...
INFO: Content source set with format: minerU
INFO: Content source set with format: minerU
INFO: Content source set with format: minerU
INFO: Content source set with format: minerU
INFO: Content source set for context extraction (format: minerU)
INFO: Starting text content insertion into LightRAG...
WARNING: Duplicate document detected: doc-e18c3ce4c296ad7f6a875dfacd2a1e76 (Informe CORTE 1.pdf)
INFO: Created 1 duplicate document records with track_id: insert_20260316_145544_d8fc6c5f
WARNING: No new unique documents were found.
INFO: Preserving 5 failed document entries for manual review
INFO: Reset 1 documents from PROCESSING/FAILED to PENDING status
INFO: Processing 1 document(s)
INFO: Extracting stage 1/1: Informe CORTE 1.pdf
INFO: Processing d-id: doc-e18c3ce4c296ad7f6a875dfacd2a1e76
INFO: Embedding func: 8 new workers initialized (Timeouts: Func: 30s, Worker: 60s, Health Check: 75s)
ERROR: Embedding func: Error in decorated function for task 2059954273536_242575.765: Vector count mismatch: expected 2 vectors but got 4 vectors (from embedding result).
ERROR: Traceback (most recent call last):
File "C:\Users\ProyectosIA\Paper2Slides\paper2slides\Lib\site-packages\lightrag\lightrag.py", line 1932, in process_document
await asyncio.gather(*first_stage_tasks)
File "C:\Users\ProyectosIA\Paper2Slides\paper2slides\Lib\site-packages\lightrag\kg\nano_vector_db_impl.py", line 124, in upsert
embeddings_list = await asyncio.gather(*embedding_tasks)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\ProyectosIA\Paper2Slides\paper2slides\Lib\site-packages\lightrag\utils.py", line 503, in call
result = await self.func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\ProyectosIA\Paper2Slides\paper2slides\Lib\site-packages\lightrag\utils.py", line 1016, in wait_func
return await future
^^^^^^^^^^^^
File "C:\Users\ProyectosIA\Paper2Slides\paper2slides\Lib\site-packages\lightrag\utils.py", line 720, in worker
result = await asyncio.wait_for(
^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\asyncio\tasks.py", line 479, in wait_for
return fut.result()
^^^^^^^^^^^^
File "C:\Users\ProyectosIA\Paper2Slides\paper2slides\Lib\site-packages\lightrag\utils.py", line 522, in call
raise ValueError(
ValueError: Vector count mismatch: expected 2 vectors but got 4 vectors (from embedding result).

ERROR: Failed to extract document 1/1: Informe CORTE 1.pdf
INFO: Enqueued document processing pipeline stopped
INFO: Text content insertion complete
INFO: Starting multimodal content processing...
INFO: Starting to process 10 multimodal content items
INFO: 127.0.0.1:61322 - "GET /api/status/5f2e7c7c-5ee9-420b-ac59-9c650f32d985 HTTP/1.1" 200 OK
INFO: Multimodal chunk generation progress: 1/10 (10.0%)
INFO: Multimodal chunk generation progress: 2/10 (20.0%)
INFO: Multimodal chunk generation progress: 3/10 (30.0%)
INFO: Multimodal chunk generation progress: 4/10 (40.0%)
INFO: Multimodal chunk generation progress: 5/10 (50.0%)
INFO: Multimodal chunk generation progress: 6/10 (60.0%)
INFO: Multimodal chunk generation progress: 7/10 (70.0%)
INFO: Multimodal chunk generation progress: 8/10 (80.0%)
INFO: Multimodal chunk generation progress: 9/10 (90.0%)
INFO: Multimodal chunk generation progress: 10/10 (100.0%)
INFO: Generated descriptions for 10/10 multimodal items using correct processors
ERROR: Embedding func: Error in decorated function for task 2059959360320_242609.39: Vector count mismatch: expected 10 vectors but got 20 vectors (from embedding result).
ERROR: Error storing chunks to storage: Vector count mismatch: expected 10 vectors but got 20 vectors (from embedding result).
ERROR: Error in multimodal processing: Vector count mismatch: expected 10 vectors but got 20 vectors (from embedding result).
WARNING: Falling back to individual multimodal processing
INFO: Processing item 1/10: image content
ERROR: Embedding func: Error in decorated function for task 2059959360896_242617.562: Vector count mismatch: expected 1 vectors but got 2 vectors (from embedding result).
ERROR: Error processing image content: Vector count mismatch: expected 1 vectors but got 2 vectors (from embedding result).
ERROR: Error processing multimodal content: not enough values to unpack (expected 3, got 2)
INFO: Processing item 2/10: discarded content
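
One pattern in the log above: every mismatch reports exactly twice the expected count (2 vs 4, 10 vs 20, 1 vs 2). A possible explanation, offered as an assumption rather than a confirmed diagnosis, is that the vector count is inferred by dividing the total number of returned values by the configured embedding dimension, and that the embedding model actually returns vectors twice as long as the configured dimension. The sketch below reproduces that arithmetic in plain Python. It is not LightRAG's code: `fake_embed`, `infer_vector_count`, `checked_embed`, and the dimensions 1536 and 3072 are hypothetical and chosen only for illustration.

```python
def fake_embed(texts, actual_dim):
    """Hypothetical stand-in for an embedding model: one vector per input text."""
    return [[0.0] * actual_dim for _ in texts]


def infer_vector_count(vectors, declared_dim):
    """Infer the number of vectors from total values / declared dimension."""
    total_values = sum(len(vec) for vec in vectors)
    if total_values % declared_dim != 0:
        raise ValueError(
            f"{total_values} values is not a multiple of declared dim {declared_dim}"
        )
    return total_values // declared_dim


def checked_embed(texts, declared_dim, actual_dim):
    """Embed texts, then verify the inferred count matches the number of inputs."""
    vectors = fake_embed(texts, actual_dim)
    got = infer_vector_count(vectors, declared_dim)
    if got != len(texts):
        raise ValueError(
            f"Vector count mismatch: expected {len(texts)} vectors "
            f"but got {got} vectors"
        )
    return vectors


if __name__ == "__main__":
    # Declared dimension is half of what the model returns: 2 texts look like 4 vectors.
    try:
        checked_embed(["chunk a", "chunk b"], declared_dim=1536, actual_dim=3072)
    except ValueError as exc:
        print(exc)

    # Matching dimensions pass the check.
    print(len(checked_embed(["chunk a", "chunk b"], declared_dim=3072, actual_dim=3072)))
```

If this assumption holds, the thing to compare is the embedding dimension set in the configuration against the output size of the embedding model actually in use. A vector store created earlier with a different dimension might also need to be rebuilt, though that is not confirmed by the log.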
