I encountered a KeyError: 'entity_relation_dict' when running KnowledgeGraphExtractor with default settings after previously running it with custom instructions and a custom schema in the same session. It appears that the instructions and schema configuration rely on global variables that are not reset between runs, so the second run inherits the custom configuration and the keys it produces no longer match the keys the default processing path expects.
Steps to Reproduce:
- Create a KnowledgeGraphExtractor instance with custom instructions and schema
- Without restarting the Python kernel, create another KnowledgeGraphExtractor instance with default settings
- Run run_extraction() and convert_json_to_csv() on the KnowledgeGraphExtractor instance with default settings
Suggested Fix:
Move PROMPT_INSTRUCTIONS and RESULT_SCHEMA from the global module scope into the KnowledgeGraphExtractor class, and pass them as arguments when creating DatasetProcessor and CustomDataLoader instances. This would ensure that every new instance of the extractor starts with a clean, isolated configuration, preventing state contamination between runs.
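A minimal sketch of the proposed structure, assuming the configuration can be passed through constructors. The Toy* classes, their signatures, the DEFAULT_* constants, and the placeholder values are hypothetical and only illustrate the idea; they are not the library's real API.

```python
# Hypothetical defaults; treated as read-only and copied per instance.
DEFAULT_PROMPT_INSTRUCTIONS = "default instructions"  # placeholder text
DEFAULT_RESULT_SCHEMA = {"entity_relation_dict": list}


class ToyDatasetProcessor:
    """Stand-in for DatasetProcessor; receives its configuration explicitly."""

    def __init__(self, instructions, schema):
        self.instructions = instructions
        self.schema = schema


class ToyKnowledgeGraphExtractor:
    """Stand-in for KnowledgeGraphExtractor with per-instance configuration."""

    def __init__(self, instructions=None, schema=None):
        # Fall back to the defaults without ever mutating them.
        self.instructions = (
            instructions if instructions is not None else DEFAULT_PROMPT_INSTRUCTIONS
        )
        # Copy so that changes on one instance cannot leak into another.
        self.schema = dict(schema if schema is not None else DEFAULT_RESULT_SCHEMA)
        # Pass the configuration down instead of reading module globals.
        self.processor = ToyDatasetProcessor(self.instructions, self.schema)

    def run_extraction(self):
        # Output keys come from this instance's own schema.
        return {key: [] for key in self.processor.schema}


custom = ToyKnowledgeGraphExtractor(schema={"custom_triples": list})
default = ToyKnowledgeGraphExtractor()
```

With this layout, the default instance created after the custom one still produces the 'entity_relation_dict' key, since no state is shared between the two.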