Add metadata to entities, relationship, covariate and report #810

hyiip · 2024-08-02T19:25:56Z


Using a CSV, I know I can add extra columns to documents that serve as metadata, which I can then refer to when tracking citations. But is there a way to add metadata to entities, relationships, covariates and reports?

My use case is to add an "access level" to each document. If this metadata could carry over to entities, relationships and so on, I could exclude any context for which the user does not have a sufficient access level. I see that I can mix context during local search or global search using context_builder, so that may be a starting point for implementing an "access level" check, or more generally a metadata check, in the context_builder. The question then becomes: how do we add metadata to entities, relationships, covariates and reports?

Answered by natoverse


natoverse (Maintainer) · 2024-08-06T23:28:07Z

We do this quite frequently. Most commonly it is because we have chunked a document but still want the document title, etc. on each chunk so that any LLM summarizations have that context. Our approach is to pre-chunk the content with a script before running GraphRAG and write each chunk out to its own text file. If you do this, make sure the token count of each final chunk document stays below the GRAPH_CHUNK_SIZE setting (default 1200). Then, when you run GraphRAG, it will find that all documents are already chunked, so your chunks will remain intact. Also note that when GraphRAG does its own chunking it includes a token overlap to ensure good coverage, so you might want to account for that in your script.
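
For illustration, a pre-chunking script along these lines might look like the sketch below. It is not the maintainers' actual script. The function names, the `title:` header format, the 100-token overlap, and the output file naming are all hypothetical, and `split_tokens` is a whitespace stand-in for a real tokenizer, so counts are only approximate and you may want to leave some headroom under the limit.

```python
from pathlib import Path
from typing import List

MAX_TOKENS = 1200  # chunk-size default quoted in the answer above
OVERLAP = 100      # assumed overlap between consecutive chunks; adjust to taste


def split_tokens(text: str) -> List[str]:
    """Stand-in tokenizer (whitespace words); swap in your model's tokenizer."""
    return text.split()


def chunk_with_header(title: str, body: str,
                      max_tokens: int = MAX_TOKENS,
                      overlap: int = OVERLAP) -> List[str]:
    """Split body into overlapping chunks, each prefixed with the title."""
    header = f"title: {title}\n\n"  # hypothetical header format
    # The header counts against the limit, so reserve room for it.
    budget = max_tokens - len(split_tokens(header))
    if budget <= overlap:
        raise ValueError("header and overlap leave no room for content")
    tokens = split_tokens(body)
    step = budget - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        window = tokens[start:start + budget]
        chunks.append(header + " ".join(window))
        if start + budget >= len(tokens):
            break  # the last window already reached the end of the body
    return chunks


def write_chunks(chunks: List[str], out_dir: str, stem: str) -> List[Path]:
    """Write each chunk to its own text file, e.g. <stem>_0000.txt."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, chunk in enumerate(chunks):
        path = out / f"{stem}_{i:04d}.txt"
        path.write_text(chunk, encoding="utf-8")
        paths.append(path)
    return paths
```

The same header could carry other per-document metadata, such as an access level, as long as the header plus content stays within the token limit.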
