Issue about the global PROMPT_INSTRUCTIONS and RESULT_SCHEMA variables in atlas_rag/kg_construction/triple_extraction.py #36

@dQw4w

I encountered a KeyError: 'entity_relation_dict' when running KnowledgeGraphExtractor with default settings after previously running it with custom instructions and a custom schema in the same session. It appears that the instructions and schema configuration rely on global variables that are not reset between runs, so the keys the default code path expects no longer match the keys in the data being processed.

Steps to Reproduce:

  1. Create a KnowledgeGraphExtractor instance with custom instructions and a custom schema
  2. Without restarting the Python kernel, create another KnowledgeGraphExtractor instance with default settings
  3. Call run_extraction() and convert_json_to_csv() on the instance created with default settings
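
To illustrate the kind of state leak I suspect, here is a minimal, self-contained sketch. It does not use the real atlas_rag code: `ToyExtractor`, `DEFAULT_SCHEMA`, and the `custom_key` schema are hypothetical stand-ins, and the assumption that a custom schema overwrites a module-level variable is my guess at the mechanism, not something I have confirmed in the source.

```python
# Toy model of the suspected pattern; NOT the actual atlas_rag implementation.
DEFAULT_SCHEMA = {"entity_relation_dict": []}

# Module-level state shared by every extractor created in this session.
RESULT_SCHEMA = dict(DEFAULT_SCHEMA)


class ToyExtractor:
    def __init__(self, schema=None):
        global RESULT_SCHEMA
        # Assumption: a custom schema overwrites the module-level variable,
        # and a later default instance never restores it.
        if schema is not None:
            RESULT_SCHEMA = schema

    def run_extraction(self):
        # Output keys follow whatever the global currently holds.
        return {key: [] for key in RESULT_SCHEMA}

    def convert_json_to_csv(self, record):
        # The default conversion path expects the default key.
        return record["entity_relation_dict"]


# Step 1: an instance with a custom schema.
ToyExtractor(schema={"custom_key": []})

# Steps 2 and 3: a "default" instance in the same session.
default_extractor = ToyExtractor()
leaked_error = None
try:
    default_extractor.convert_json_to_csv(default_extractor.run_extraction())
except KeyError as err:
    leaked_error = err

print("KeyError:", leaked_error)
```

In this toy version the second instance still sees the first instance's schema, so the lookup for 'entity_relation_dict' fails even though no custom configuration was passed to it.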

Suggested Fix:
Move PROMPT_INSTRUCTIONS and RESULT_SCHEMA from the global module scope into the KnowledgeGraphExtractor class, and pass them as arguments when creating DatasetProcessor and CustomDataLoader instances. This would ensure that every new instance of the extractor starts with a clean, isolated configuration, preventing state contamination between runs.
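
A rough sketch of what I mean, again with hypothetical stand-in classes rather than the real ones: `IsolatedExtractor` and `ToyLoader` are invented names, the default values are placeholders, and the constructor parameters are assumptions about how the configuration could be passed down.

```python
import copy

# Hypothetical defaults, treated as read-only constants (placeholder values).
DEFAULT_PROMPT_INSTRUCTIONS = "default instructions (placeholder)"
DEFAULT_RESULT_SCHEMA = {"entity_relation_dict": []}


class ToyLoader:
    """Stand-in for a helper such as CustomDataLoader or DatasetProcessor."""

    def __init__(self, instructions, schema):
        # The helper receives its configuration explicitly instead of
        # reading module-level variables.
        self.instructions = instructions
        self.schema = schema


class IsolatedExtractor:
    def __init__(self, instructions=None, schema=None):
        # Each instance owns its own copy of the configuration, so nothing
        # set here can leak into another instance.
        self.instructions = (
            instructions if instructions is not None else DEFAULT_PROMPT_INSTRUCTIONS
        )
        self.schema = copy.deepcopy(
            schema if schema is not None else DEFAULT_RESULT_SCHEMA
        )
        self.loader = ToyLoader(self.instructions, self.schema)

    def run_extraction(self):
        # Output keys come from this instance's schema only.
        return {key: [] for key in self.loader.schema}


custom = IsolatedExtractor(schema={"custom_key": []})
default = IsolatedExtractor()
print(list(custom.run_extraction()))
print(list(default.run_extraction()))
```

With this layout the order in which instances are created no longer matters: the default instance keeps the default keys regardless of what was configured earlier in the session.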
