I have got some problems with the embedding part.

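What is happening here? Both the text insertion and the multimodal processing fail with a "Vector count mismatch" error, as the log below shows.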

INFO: Parsing C:\Users\ProyectosIA\Paper2Slides\sources\uploads\5f2e7c7c-5ee9-420b-ac59-9c650f32d985\Informe CORTE 1.pdf complete! Extracted 30 content blocks
INFO: Stored parsing result in cache: da3d01e45d6871b07205e92a6ae3a9c8
INFO:
Content Information:
INFO: * Total blocks in content_list: 30
INFO: * Content block types:
INFO:   - text: 20
INFO:   - image: 4
INFO:   - discarded: 6
INFO: Content separation complete:
INFO:   - Text content length: 7206 characters
INFO:   - Multimodal items count: 10
INFO:   - Multimodal type distribution: {'image': 4, 'discarded': 6}
INFO: Setting content source for context-aware multimodal processing...
INFO: Content source set with format: minerU
INFO: Content source set with format: minerU
INFO: Content source set with format: minerU
INFO: Content source set with format: minerU
INFO: Content source set for context extraction (format: minerU)
INFO: Starting text content insertion into LightRAG...
WARNING: Duplicate document detected: doc-e18c3ce4c296ad7f6a875dfacd2a1e76 (Informe CORTE 1.pdf)
INFO: Created 1 duplicate document records with track_id: insert_20260316_145544_d8fc6c5f
WARNING: No new unique documents were found.
INFO: Preserving 5 failed document entries for manual review
INFO: Reset 1 documents from PROCESSING/FAILED to PENDING status
INFO: Processing 1 document(s)
INFO: Extracting stage 1/1: Informe CORTE 1.pdf
INFO: Processing d-id: doc-e18c3ce4c296ad7f6a875dfacd2a1e76
INFO: Embedding func: 8 new workers initialized (Timeouts: Func: 30s, Worker: 60s, Health Check: 75s)
ERROR: Embedding func: Error in decorated function for task 2059954273536_242575.765: Vector count mismatch: expected 2 vectors but got 4 vectors (from embedding result).
ERROR: Traceback (most recent call last):
  File "C:\Users\ProyectosIA\Paper2Slides\paper2slides\Lib\site-packages\lightrag\lightrag.py", line 1932, in process_document
    await asyncio.gather(*first_stage_tasks)
  File "C:\Users\ProyectosIA\Paper2Slides\paper2slides\Lib\site-packages\lightrag\kg\nano_vector_db_impl.py", line 124, in upsert
    embeddings_list = await asyncio.gather(*embedding_tasks)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ProyectosIA\Paper2Slides\paper2slides\Lib\site-packages\lightrag\utils.py", line 503, in __call__
    result = await self.func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ProyectosIA\Paper2Slides\paper2slides\Lib\site-packages\lightrag\utils.py", line 1016, in wait_func
    return await future
           ^^^^^^^^^^^^
  File "C:\Users\ProyectosIA\Paper2Slides\paper2slides\Lib\site-packages\lightrag\utils.py", line 720, in worker
    result = await asyncio.wait_for(
             ^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\asyncio\tasks.py", line 479, in wait_for
    return fut.result()
           ^^^^^^^^^^^^
  File "C:\Users\ProyectosIA\Paper2Slides\paper2slides\Lib\site-packages\lightrag\utils.py", line 522, in __call__
    raise ValueError(
ValueError: Vector count mismatch: expected 2 vectors but got 4 vectors (from embedding result).

ERROR: Failed to extract document 1/1: Informe CORTE 1.pdf
INFO: Enqueued document processing pipeline stopped
INFO: Text content insertion complete
INFO: Starting multimodal content processing...
INFO: Starting to process 10 multimodal content items
INFO:     127.0.0.1:61322 - "GET /api/status/5f2e7c7c-5ee9-420b-ac59-9c650f32d985 HTTP/1.1" 200 OK
INFO: Multimodal chunk generation progress: 1/10 (10.0%)
INFO: Multimodal chunk generation progress: 2/10 (20.0%)
INFO: Multimodal chunk generation progress: 3/10 (30.0%)
INFO: Multimodal chunk generation progress: 4/10 (40.0%)
INFO: Multimodal chunk generation progress: 5/10 (50.0%)
INFO: Multimodal chunk generation progress: 6/10 (60.0%)
INFO: Multimodal chunk generation progress: 7/10 (70.0%)
INFO: Multimodal chunk generation progress: 8/10 (80.0%)
INFO: Multimodal chunk generation progress: 9/10 (90.0%)
INFO: Multimodal chunk generation progress: 10/10 (100.0%)
INFO: Generated descriptions for 10/10 multimodal items using correct processors
ERROR: Embedding func: Error in decorated function for task 2059959360320_242609.39: Vector count mismatch: expected 10 vectors but got 20 vectors (from embedding result).
ERROR: Error storing chunks to storage: Vector count mismatch: expected 10 vectors but got 20 vectors (from embedding result).
ERROR: Error in multimodal processing: Vector count mismatch: expected 10 vectors but got 20 vectors (from embedding result).
WARNING: Falling back to individual multimodal processing
INFO: Processing item 1/10: image content
ERROR: Embedding func: Error in decorated function for task 2059959360896_242617.562: Vector count mismatch: expected 1 vectors but got 2 vectors (from embedding result).
ERROR: Error processing image content: Vector count mismatch: expected 1 vectors but got 2 vectors (from embedding result).
ERROR: Error processing multimodal content: not enough values to unpack (expected 3, got 2)
INFO: Processing item 2/10: discarded content
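
One pattern in the log: every mismatch is exactly double the expected count (2 vs 4, 10 vs 20, 1 vs 2). A possible explanation, which I have not confirmed against the LightRAG source, is that the vector count is inferred from the total number of returned elements divided by a configured embedding dimension, and that the configured dimension is half of what the model actually returns. The sketch below is only an illustration of that hypothesis: `inferred_vector_count`, `fake_embed`, and the dimensions 1536 and 3072 are made-up names and values, not LightRAG APIs or my real settings.

```python
# Hypothetical illustration, NOT LightRAG's actual code: infer how many vectors
# an embedding result holds from its total element count and a configured dimension.

def inferred_vector_count(embeddings, configured_dim):
    """Return total elements // configured_dim for a list of vectors."""
    total = sum(len(v) for v in embeddings)
    if total % configured_dim != 0:
        raise ValueError(
            f"{total} elements not divisible by dimension {configured_dim}"
        )
    return total // configured_dim


def fake_embed(texts, dim):
    """Stand-in embedding function: one zero vector of length `dim` per text."""
    return [[0.0] * dim for _ in texts]


# (expected, got) pairs copied from the log above.
observed = [(2, 4), (10, 20), (1, 2)]
ratios = {got / expected for expected, got in observed}

# Assumed dimensions, chosen only so that actual_dim == 2 * configured_dim.
configured_dim, actual_dim = 1536, 3072

for expected, got in observed:
    vectors = fake_embed(["x"] * expected, actual_dim)
    # With a configured dimension half the real one, each count doubles.
    assert inferred_vector_count(vectors, configured_dim) == got
```

If this guess is right, comparing the length of a single vector returned by the embedding model with the dimension configured for the embedding function would show the 2x gap directly.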

I have got some problems with the embedding part. #32

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

I have got some problems with the embedding part. #32

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions