Labels
question: Further information is requested
Description
Do you need to ask a question?
- I have searched the existing questions and discussions and this question is not already answered.
- I believe this is a legitimate question, not just a bug or feature request.
Your Question
Hi there, I ran into a problem when running the following file-processing code:
    for data_file in data_files:
        index = data_file.split("/")[-1].split(".")[0]
        working_dir = os.path.join(working_base_dir, f"rag_storage_{index}")
        os.makedirs(working_dir, exist_ok=True)
        rag_config = RAGAnythingConfig(
            working_dir=working_dir,
            parser="mineru",  # Parser selection: mineru or docling
            parse_method="auto",  # Parse method: auto, ocr, or txt
            enable_image_processing=True,
            enable_table_processing=True,
            enable_equation_processing=True,
        )
        # Initialize RAGAnything
        rag = RAGAnything(
            config=rag_config,
            llm_model_func=llm_model_func,
            vision_model_func=vision_model_func,
            embedding_func=embedding_func,
        )
        # Process a document
        TEST_DATA_PATHs = [data_file]
        for file in TEST_DATA_PATHs:
            try:
                await rag.process_document_complete(
                    file_path=file,
                    output_dir=output_dir,
                    parse_method="auto",
                    backend="vlm-vllm-engine",
                    source="local",
                )
            except:
                continue

I found that the cache information from the first processed file is automatically carried over into the cache of the second processed file, and so on. In addition, the following log is printed when processing the first file, but not when processing the subsequent files:
lightrag - INFO - [_] Process 2051 KV load full_docs with 0 records
lightrag - INFO - [_] Process 2051 KV load text_chunks with 0 records
lightrag - INFO - [_] Process 2051 KV load full_entities with 0 records
lightrag - INFO - [_] Process 2051 KV load full_relations with 0 records
lightrag - INFO - [_] Process 2051 KV load entity_chunks with 0 records
lightrag - INFO - [_] Process 2051 KV load relation_chunks with 0 records
lightrag - INFO - [_] Process 2051 KV load llm_response_cache with 0 records
lightrag - INFO - [_] Process 2051 doc status load doc_status with 0 records
lightrag - INFO - [_] Process 2051 KV load parse_cache with 16 records

How should I modify my code so that these files do not interfere with each other during processing?
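
One thing that may be worth ruling out on the caller side (an assumption about a possible contributing cause, not a confirmed diagnosis): the expression `data_file.split("/")[-1].split(".")[0]` keeps only the text before the first dot of the file name, so two inputs such as `report.v1.pdf` and `report.v2.pdf` would share one `working_dir`. This would not by itself explain the missing log lines. The sketch below uses only the standard library; `storage_dir_for` and `find_collisions` are hypothetical helper names, not part of RAGAnything or LightRAG.

```python
from collections import defaultdict
from pathlib import Path


def storage_dir_for(data_file: str, base_dir: str) -> Path:
    """Return a per-file storage directory derived from the file name."""
    # Path.stem drops only the last suffix: "report.v1.pdf" -> "report.v1",
    # whereas split(".")[0] on the same name would give "report".
    return Path(base_dir) / f"rag_storage_{Path(data_file).stem}"


def find_collisions(data_files: list[str], base_dir: str) -> dict[Path, list[str]]:
    """Return every storage directory claimed by more than one input file."""
    claimed = defaultdict(list)
    for data_file in data_files:
        claimed[storage_dir_for(data_file, base_dir)].append(data_file)
    return {d: files for d, files in claimed.items() if len(files) > 1}


# Hypothetical file names, used only to illustrate the check.
files = ["data/report.v1.pdf", "data/report.v2.pdf", "other/report.v1.pdf"]
collisions = find_collisions(files, "work")
for directory, sources in collisions.items():
    print(f"{directory} is shared by: {sources}")
```

Running a check like this before the processing loop would show whether any two files map to the same storage directory. Here the two `report.v1.pdf` files in different folders still collide, so a fully unique key would also need part of the parent path or a hash.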
Additional Context
No response