Fuze Graphs and Vectors #11851

swilly22 · 2023-10-16T08:18:29Z

Now that several graph databases support vector indexing, I would like to discuss extending the LangChain Graph class to make use of vector indices.

Intro

Usually a graph contains multiple types of nodes and edges. Consider a graph with two node labels, Person and City, and two relationship types, KNOWS and LIVES_IN:

(:Person)-[:LIVES_IN]->(:City)
(:Person)-[:KNOWS]->(:Person)

In addition, each graph entity can be associated with a set of attributes, which don't necessarily conform to a schema.
For example, a Person node can have the following attributes:
{First_Name, Last_Name, Age}

While another Person node might be associated with:
{First_Name, Gender, Height}

When creating an index, the user needs to provide both the entity Label / RelationshipType and the name of the attribute they wish to index, e.g. index the First_Name attribute of all Person nodes in the graph.
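To make the (Label, attribute) pairing concrete, here is a minimal in-memory sketch in Python. It is not tied to any graph database: the `nodes` list and the `build_index` helper are hypothetical and exist only for illustration. It shows that an index on Person.First_Name covers both Person nodes even though their attribute sets differ, while an index on Person.Age covers only the node that actually has an Age.

```python
from collections import defaultdict

# Two Person nodes with different attribute sets, plus one City node.
nodes = [
    {"id": 1, "label": "Person", "props": {"First_Name": "Alice", "Last_Name": "Smith", "Age": 34}},
    {"id": 2, "label": "Person", "props": {"First_Name": "Bob", "Gender": "M", "Height": 180}},
    {"id": 3, "label": "City", "props": {"Name": "Paris"}},
]


def build_index(nodes, label, attribute):
    """Map each value of `attribute` to the ids of `label` nodes that carry it.

    Nodes with a different label, or without the attribute, are skipped.
    """
    index = defaultdict(list)
    for node in nodes:
        if node["label"] == label and attribute in node["props"]:
            index[node["props"][attribute]].append(node["id"])
    return dict(index)


first_name_index = build_index(nodes, "Person", "First_Name")
age_index = build_index(nodes, "Person", "Age")
```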

The following are some thoughts on extending the LangChain Graph capabilities with the graph's built-in vector index.

Index creation

When a graph is created from unstructured data, we have no idea which labels, relationship types and attributes will form the graph, so it is unclear which vector indices should be created.

One option is to use the GraphDocument object, which contains the original page_content, to create additional Document nodes in the graph, e.g.

(doc:Document {page_content:$page_content})-[:MENTIONS]->(node)

Here node is a node extracted from the Document. A vector index can then be created over the page_content attribute of every Document node.

When a user asks a question, we can use vector search to locate the K semantically closest Document nodes and then follow their MENTIONS edges to reach all the nodes extracted from those documents.
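The retrieval flow described above can be sketched in plain Python. Everything here is an assumption made for illustration: `embed` is a toy bag-of-words stand-in for a real embedding model, and the `documents` and `mentions` dictionaries stand in for Document nodes and their MENTIONS edges. A real implementation would delegate both the similarity search and the traversal to the graph database.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Document nodes (page_content) and their MENTIONS edges to extracted nodes.
documents = {
    "doc1": "Alice lives in Paris and works as an engineer",
    "doc2": "Bob knows Alice from university",
    "doc3": "Berlin is the capital of Germany",
}
mentions = {
    "doc1": ["Person:Alice", "City:Paris"],
    "doc2": ["Person:Bob", "Person:Alice"],
    "doc3": ["City:Berlin"],
}

# The "vector index": one vector per Document node, built from page_content.
doc_vectors = {doc_id: embed(text) for doc_id, text in documents.items()}


def mentioned_nodes(question, k=2):
    """Find the k closest Document nodes, then collect the nodes they mention."""
    q = embed(question)
    ranked = sorted(doc_vectors, key=lambda d: cosine(q, doc_vectors[d]), reverse=True)
    found = []
    for doc_id in ranked[:k]:
        for node in mentions[doc_id]:
            if node not in found:  # keep first occurrence, preserve order
                found.append(node)
    return found


result = mentioned_nodes("Where does Alice live in Paris", k=1)
```

With k=1 only the closest document is used; raising k widens the context at the cost of pulling in less relevant nodes.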

On the other hand, when a graph is created manually, we assume the user has prior knowledge of the data and can therefore decide which entities should be indexed.

Index discovery

To understand which indices exist in a graph it seems reasonable to incorporate index information as part of the graph schema; the refresh_schema function can be extended to include a list of indices.
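As a rough illustration of what a schema with index information might look like, the sketch below renders a schema description that ends with a list of indices. The `format_schema` helper, its arguments and the output layout are all hypothetical. This is not how refresh_schema is currently implemented, only one possible shape for the extended output.

```python
def format_schema(node_props, relationships, indices):
    """Render a schema description that also lists the available indices."""
    lines = ["Node properties:"]
    for label, props in node_props.items():
        lines.append(f"  {label}: {', '.join(props)}")
    lines.append("Relationships:")
    for src, rel, dst in relationships:
        lines.append(f"  (:{src})-[:{rel}]->(:{dst})")
    lines.append("Indices:")
    for idx in indices:
        lines.append(f"  {idx['type']} index on :{idx['label']}({idx['attribute']})")
    return "\n".join(lines)


schema = format_schema(
    node_props={"Person": ["First_Name", "Last_Name", "Age"], "Document": ["page_content"]},
    relationships=[("Person", "LIVES_IN", "City"), ("Document", "MENTIONS", "Person")],
    indices=[{"type": "VECTOR", "label": "Document", "attribute": "page_content"}],
)
print(schema)
```

Including the indices in the schema text would let both the chain and the LLM see which attributes can be searched by vector similarity.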

Index query

So far, when analyzing graph data using the LangChain QA chain, a user question is translated into a Cypher query using an LLM in the hope that:

A valid Cypher query is generated
The query conforms to the graph schema
The query will extract the necessary data from the graph

For some questions the above process is sufficient, e.g. Which tasks have been pending for over a week? (here now() stands for the current time and create_date is assumed to be a timestamp in seconds)

MATCH (t:Task) WHERE t.create_date < now() - 7*24*60*60 RETURN t

But for others a single graph query isn't sufficient. These questions can benefit from wider context extraction, which starts with a vector search and continues with a 1- or 2-hop traversal.

Consider the question: Who succeeded king Henry the 6th?
Indeed this question can be answered by a simple graph query:

MATCH (:King {name:'Henry the 6th'})-[:successor]->(successor:King) RETURN successor

But this query can easily fail if Henry the 6th is represented by the node:

(:King {name:'Henry vi'})

This is exactly where vector search can help, as "Henry the 6th" and "Henry vi" are semantically close.
If we combine the results of the original query (which returned no data) with the neighborhood of the K nodes returned by a vector search over the original question, we have a better chance of answering the user's question correctly.
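A minimal sketch of this combined retrieval, using an in-memory stand-in for the graph. All names here (`kings`, `successor`, `exact_query`, `vector_neighborhood`, `retrieve`) are hypothetical, and `embed` is a toy bag-of-words function rather than a real embedding model. It matches "Henry the 6th" to "Henry vi" only through the shared token "henry", whereas a real model would also be expected to relate "6th" and "vi".

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Tiny in-memory graph: King nodes identified by name, with successor edges.
kings = ["Henry vi", "Edward iv", "Richard iii"]
successor = {"Henry vi": "Edward iv"}


def exact_query(name):
    """Mimics MATCH (:King {name:$name})-[:successor]->(s:King) RETURN s."""
    return [successor[name]] if name in successor else []


def vector_neighborhood(question, k=1):
    """Return the k kings closest to the question with their outgoing successor edge."""
    q = embed(question)
    ranked = sorted(kings, key=lambda n: cosine(q, embed(n)), reverse=True)
    return [(name, "successor", successor.get(name)) for name in ranked[:k]]


def retrieve(question, name_in_question):
    """Combine the exact-match query rows with the vector-search neighborhood."""
    return {
        "query_rows": exact_query(name_in_question),
        "vector_context": vector_neighborhood(question),
    }


context = retrieve("Who succeeded king Henry the 6th?", "Henry the 6th")
```

The exact-match query returns nothing because no node is named 'Henry the 6th', but the vector context still surfaces the Henry vi node and its successor, which can then be handed to the LLM as additional context.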
