[Question]: #176

@hexmSeeU

Do you need to ask a question?

  • I have searched the existing questions and discussions and this question is not already answered.
  • I believe this is a legitimate question, not just a bug or feature request.

Your Question

Hi there, I ran into a problem when running the following file-processing code:

for data_file in data_files:
    index = data_file.split("/")[-1].split(".")[0]
    working_dir = os.path.join(working_base_dir, f"rag_storage_{index}")
    os.makedirs(working_dir, exist_ok=True)

    rag_config = RAGAnythingConfig(
        working_dir=working_dir,
        parser="mineru",  # Parser selection: mineru or docling
        parse_method="auto",  # Parse method: auto, ocr, or txt
        enable_image_processing=True,
        enable_table_processing=True,
        enable_equation_processing=True,
    )


    # Initialize RAGAnything
    rag = RAGAnything(
        config=rag_config,
        llm_model_func=llm_model_func,
        vision_model_func=vision_model_func,
        embedding_func=embedding_func,
    )


    # Process a document
    TEST_DATA_PATHs = [data_file]
    for file in TEST_DATA_PATHs:
        try:
            await rag.process_document_complete(
                file_path=file,
                output_dir=output_dir,
                parse_method="auto",
                backend="vlm-vllm-engine",
                source="local",
            )
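        except Exception as exc:  # a bare except would also swallow task cancellation
            print(f"Failed to process {file}: {exc}")
            continue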

I found that the cached data from the first processed file is carried over into the cache of the second file, and so on for every later file, even though each file gets its own working_dir. In addition, the following log is printed only while processing the first file, and not for any of the subsequent files:

lightrag - INFO - [_] Process 2051 KV load full_docs with 0 records
lightrag - INFO - [_] Process 2051 KV load text_chunks with 0 records
lightrag - INFO - [_] Process 2051 KV load full_entities with 0 records
lightrag - INFO - [_] Process 2051 KV load full_relations with 0 records
lightrag - INFO - [_] Process 2051 KV load entity_chunks with 0 records
lightrag - INFO - [_] Process 2051 KV load relation_chunks with 0 records
lightrag - INFO - [_] Process 2051 KV load llm_response_cache with 0 records
lightrag - INFO - [_] Process 2051 doc status load doc_status with 0 records
lightrag - INFO - [_] Process 2051 KV load parse_cache with 16 records

How should I modify my code so that these files do not interfere with each other during processing?
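A fallback that does not depend on any library internals is to run each file in its own Python process, so that no in-memory state can survive from one file to the next. The sketch below assumes the carry-over really is in-process state, which is unverified. WORKER is a hypothetical stand-in for a script that would build RAGAnything and process a single file; run_isolated and process_files_isolated are illustrative helper names.

```python
import subprocess
import sys

# Hypothetical stand-in for the per-file work. The module-level dict plays the
# role of whatever in-memory cache the real pipeline keeps.
WORKER = """
import sys
_cache = {}
_cache[sys.argv[1]] = True
print(len(_cache))
"""


def run_isolated(code, *args):
    """Run a Python snippet in a fresh interpreter and return its stdout."""
    result = subprocess.run(
        [sys.executable, "-c", code, *args],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the worker fails
    )
    return result.stdout.strip()


def process_files_isolated(data_files):
    """Process each file in a separate process; map file -> worker output."""
    return {path: run_isolated(WORKER, path) for path in data_files}
```

Calling process_files_isolated(["a.pdf", "b.pdf"]) reports a cache size of 1 for each file, because every worker starts with an empty cache. The cost is that models and parsers are re-initialized for every file, so I would prefer a supported way to isolate instances within one process if one exists.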

Additional Context

No response

Labels: question (Further information is requested)
