With several graph databases recently adding support for vector indexing, I would like to discuss extending the LangChain Graph class to make use of vector indices.
Intro
A graph usually contains multiple types of nodes and edges. Consider, for example, a graph with two labels, Person and City, and two relationship types, KNOWS and LIVES_IN.
In addition, each graph entity can be associated with a set of attributes that does not necessarily conform to a schema.
For example, one Person node can have the attributes {First_Name, Last_Name, Age},
while another Person node might have {First_Name, Gender, Height}.
When creating an index, the user needs to provide both the entity label (or relationship type) and the name of the attribute to index, e.g. index the First_Name attribute of all Person nodes in the graph.
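To make the label-plus-attribute requirement concrete, here is a minimal in-memory sketch. The Node and IndexSpec classes and the indexed_nodes helper are hypothetical, not part of LangChain or of any graph database API. In this model a node is covered by an index only if it carries the index's label and actually has the indexed attribute.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    attributes: dict = field(default_factory=dict)  # schema-less: any keys allowed

@dataclass(frozen=True)
class IndexSpec:
    label: str      # node label (or relationship type)
    attribute: str  # attribute name to index

def indexed_nodes(nodes, spec):
    """Nodes an index covers in this sketch: matching label AND attribute present."""
    return [n for n in nodes if n.label == spec.label and spec.attribute in n.attributes]

nodes = [
    Node("Person", {"First_Name": "Ada", "Last_Name": "Lovelace", "Age": 36}),
    Node("Person", {"First_Name": "Alan", "Gender": "M", "Height": 1.78}),
    Node("Person", {"Last_Name": "Hopper"}),  # no First_Name, so not covered
    Node("City", {"Name": "London"}),
]
spec = IndexSpec(label="Person", attribute="First_Name")
print([n.attributes["First_Name"] for n in indexed_nodes(nodes, spec)])  # prints ['Ada', 'Alan']
```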
The following are some thoughts on extending LangChain's Graph capabilities with the graph database's built-in vector index.
Index creation
When a graph is built from unstructured data, we cannot know in advance which labels, relationship types and attributes will form it, so it is unclear which vector indices should be created.
One option is to use the GraphDocument object, which holds the original page_content, to create an additional Document node for each source document and connect it to every node extracted from that document.
A vector index can then be created over the page_content attribute of every Document node.
When a user asks a question, we can use vector search to locate the K Document nodes semantically closest to it, then follow their relationships to reach all nodes mentioned in (extracted from) those documents.
On the other hand, when a graph is created manually, we assume the user has prior knowledge of the data and can therefore decide which entities should be indexed.
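The retrieval flow above can be sketched in plain Python. Everything here is an assumption for illustration: the embeddings are hand-made three-dimensional vectors rather than model output, the MENTIONS relationship name is inferred from the text, and top_k_documents and mentioned_nodes are hypothetical helpers standing in for the database's vector index lookup and a one-hop traversal.

```python
import math

# Hypothetical Document nodes with hand-made embeddings of their page_content
documents = {
    "doc1": {"page_content": "Alice lives in Paris.", "embedding": [0.9, 0.1, 0.0]},
    "doc2": {"page_content": "Bob knows Alice.", "embedding": [0.7, 0.3, 0.1]},
    "doc3": {"page_content": "Rome is in Italy.", "embedding": [0.1, 0.9, 0.1]},
}
# Hypothetical MENTIONS edges: Document id -> nodes extracted from that document
mentions = {
    "doc1": ["Alice", "Paris"],
    "doc2": ["Bob", "Alice"],
    "doc3": ["Rome"],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k_documents(query_embedding, k):
    """Brute-force stand-in for a vector index lookup over page_content."""
    ranked = sorted(documents, key=lambda d: cosine(query_embedding, documents[d]["embedding"]), reverse=True)
    return ranked[:k]

def mentioned_nodes(doc_ids):
    """Follow MENTIONS edges from the given documents, keeping first-seen order."""
    result = []
    for doc_id in doc_ids:
        for node in mentions[doc_id]:
            if node not in result:
                result.append(node)
    return result

hits = top_k_documents([1.0, 0.0, 0.0], k=2)
print(hits)                   # prints ['doc1', 'doc2']
print(mentioned_nodes(hits))  # prints ['Alice', 'Paris', 'Bob']
```

A real deployment would replace the brute-force ranking with the database's own vector index; the traversal step stays the same.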
Index discovery
To find out which indices exist in a graph, it seems reasonable to include index information in the graph schema; the refresh_schema function can be extended to list the available indices.
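One possible shape for index-aware schema text is sketched below. The VectorIndex fields and the render_schema output format are hypothetical and do not reflect what refresh_schema currently returns; the sample labels, relationships and the 384 dimension are made up. The point is only that listing indices next to labels and relationships tells the LLM prompt which attributes can be searched semantically.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VectorIndex:
    entity_type: str  # "NODE" or "RELATIONSHIP"
    label: str        # node label or relationship type
    attribute: str    # the indexed attribute
    dimension: int    # embedding dimension

def render_schema(node_props, relationships, indices):
    """Render a hypothetical schema string that also lists vector indices."""
    lines = ["Node properties:"]
    for label, props in sorted(node_props.items()):
        lines.append(f"  {label} {{{', '.join(props)}}}")
    lines.append("Relationships:")
    for src, rel, dst in relationships:
        lines.append(f"  (:{src})-[:{rel}]->(:{dst})")
    lines.append("Vector indices:")
    for idx in indices:
        lines.append(f"  {idx.entity_type} {idx.label}.{idx.attribute} (dim={idx.dimension})")
    return "\n".join(lines)

schema = render_schema(
    {"Person": ["First_Name", "Age"], "Document": ["page_content"]},
    [("Person", "LIVES_IN", "City"), ("Document", "MENTIONS", "Person")],
    [VectorIndex("NODE", "Document", "page_content", 384)],  # 384 is an arbitrary example dimension
)
print(schema)
```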
Index query
So far, when analyzing graph data with a LangChain QA chain, the user's question is translated into a Cypher query by an LLM in the hope that:
1. A valid Cypher query is generated
2. The query conforms to the graph schema
3. The query extracts the necessary data from the graph
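Condition 2 can be partially checked before the query is run. The sketch below is a hypothetical regex-based check, not a real Cypher parser and not a LangChain API, that reports node labels and relationship types used in a generated query but absent from the schema. It only recognises the simple (var:Label) and [var:TYPE] forms.

```python
import re

# Simple patterns for "(var:Label" and "[var:TYPE"; multi-label or quoted names are not handled
LABEL_RE = re.compile(r"\(\s*\w*\s*:\s*(\w+)")
REL_RE = re.compile(r"\[\s*\w*\s*:\s*(\w+)")

def unknown_schema_elements(query, labels, rel_types):
    """Return (labels, relationship types) used in query but missing from the schema."""
    used_labels = set(LABEL_RE.findall(query))
    used_rels = set(REL_RE.findall(query))
    return used_labels - labels, used_rels - rel_types

schema_labels = {"King"}
schema_rels = {"successor"}
good = "MATCH (:King{name:'Henry the 6th'})-[:successor]->(successor:King) RETURN successor"
bad = "MATCH (k:Monarch)-[:SUCCEEDED_BY]->(s) RETURN s"
print(unknown_schema_elements(good, schema_labels, schema_rels))  # prints (set(), set())
print(unknown_schema_elements(bad, schema_labels, schema_rels))   # prints ({'Monarch'}, {'SUCCEEDED_BY'})
```

A non-empty result is a cheap signal to regenerate the query before sending it to the database.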
For some questions this process is sufficient, e.g. "Which tasks are currently pending for over a week?"
MATCH (t:Task) WHERE t.create_date < now() - 7*24*60*60 RETURN t
But for others a single graph query isn't sufficient, and we can benefit from wide-context extraction, which starts with a vector search and continues with a 1- or 2-hop traversal.
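The traversal step can be sketched as a breadth-first search bounded by a hop count. The seeds stand for the nodes returned by the vector search; the adjacency dictionary, node names and the neighborhood helper are hypothetical, and edges are treated as undirected so that both incoming and outgoing relationships are followed.

```python
from collections import deque

def neighborhood(adjacency, seeds, max_hops):
    """Return every node reachable within max_hops edges of any seed (seeds included)."""
    depth = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if depth[node] == max_hops:
            continue  # do not expand past the hop limit
        for nxt in adjacency.get(node, []):
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    return set(depth)

# Hypothetical undirected graph built from KNOWS and LIVES_IN edges
adjacency = {
    "Alice": ["Bob"],
    "Bob": ["Alice", "Carol", "Paris"],
    "Carol": ["Bob", "Rome"],
    "Paris": ["Bob"],
    "Rome": ["Carol"],
}
seeds = ["Alice"]  # pretend these came back from the vector search
print(sorted(neighborhood(adjacency, seeds, 1)))  # prints ['Alice', 'Bob']
print(sorted(neighborhood(adjacency, seeds, 2)))  # prints ['Alice', 'Bob', 'Carol', 'Paris']
```

Bounding the hop count keeps the extracted context small: going from 1 to 2 hops here pulls in Carol and Paris but still stops short of Rome.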
Consider the question: Who succeeded King Henry the 6th?
Indeed, this question can be answered by a simple graph query:
MATCH (:King {name: 'Henry the 6th'})-[:successor]->(successor:King) RETURN successor
But this query can easily fail if Henry the 6th is represented by the node:
(:King{name:'Henry vi'})
This is exactly where vector search can help, as "Henry the 6th" and "Henry vi" are semantically close.
If we combine the results of the original query (which returned no data) with the neighborhood of the K nodes returned by a vector search over the original question, we have a better chance of answering the user's question correctly.
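A minimal sketch of this fallback, with every name hypothetical: exact_query stands in for the LLM-generated Cypher query (an exact match on the name property), the two-dimensional embeddings are invented rather than produced by a model, and combined_context merges the query rows with the K vector hits and their one-hop neighbors.

```python
import math

# Hypothetical toy graph: node name -> hand-made embedding (not model output)
embeddings = {
    "Henry vi": [0.9, 0.1],
    "Edward iv": [0.6, 0.4],
    "London": [0.1, 0.9],
}
# (:King)-[:successor]->(:King) edges, stored as source -> target
successor = {"Henry vi": "Edward iv"}

def exact_query(name):
    """Stand-in for the generated Cypher query: matches the name exactly."""
    return [successor[name]] if name in successor else []

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def vector_search(query_vec, k):
    """Return the k node names whose embeddings are closest to query_vec."""
    return sorted(embeddings, key=lambda n: cosine(query_vec, embeddings[n]), reverse=True)[:k]

def neighbors(name):
    """One-hop neighbors along successor edges, in either direction."""
    outgoing = [successor[name]] if name in successor else []
    incoming = [src for src, dst in successor.items() if dst == name]
    return outgoing + incoming

def combined_context(name, question_vec, k):
    """Merge exact-query rows with the vector hits and their neighborhoods."""
    context = list(exact_query(name))
    for hit in vector_search(question_vec, k):
        for node in [hit] + neighbors(hit):
            if node not in context:
                context.append(node)
    return context

print(exact_query("Henry the 6th"))                      # prints []
print(combined_context("Henry the 6th", [1.0, 0.0], 1))  # prints ['Henry vi', 'Edward iv']
```

In a real chain the question embedding would come from the same model used to build the index; here [1.0, 0.0] is simply chosen to lie close to the "Henry vi" vector.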