From 97e3deb5b1a2d9bd9965e1a9186d085662ecda19 Mon Sep 17 00:00:00 2001 From: Mendon Kissling <59585235+mendonk@users.noreply.github.com> Date: Mon, 12 Aug 2024 13:19:58 -0400 Subject: [PATCH 1/7] fix-prereqs-link --- docs/modules/examples/pages/hotels-app.adoc | 2 +- docs/modules/examples/pages/langchain-unstructured-astra.adoc | 4 ++-- docs/modules/examples/pages/llama-astra.adoc | 4 ++-- docs/modules/examples/pages/llama-parse-astra.adoc | 4 ++-- docs/modules/examples/pages/mmr.adoc | 2 +- docs/modules/examples/pages/nvidia_embeddings.adoc | 2 +- examples/notebooks/FLARE.ipynb | 2 +- examples/notebooks/advancedRAG.ipynb | 2 +- examples/notebooks/langchain_evaluation.ipynb | 2 +- examples/notebooks/langchain_multimodal_gemini.ipynb | 2 +- examples/notebooks/llama-astra.ipynb | 2 +- examples/notebooks/nemo_guardrails.ipynb | 2 +- examples/notebooks/nvidia.ipynb | 2 +- examples/notebooks/quickstart.ipynb | 2 +- 14 files changed, 17 insertions(+), 17 deletions(-) diff --git a/docs/modules/examples/pages/hotels-app.adoc b/docs/modules/examples/pages/hotels-app.adoc index 03f56d772..c06869f3c 100644 --- a/docs/modules/examples/pages/hotels-app.adoc +++ b/docs/modules/examples/pages/hotels-app.adoc @@ -51,7 +51,7 @@ pip install -r requirements.txt npm --version ---- -See the https://docs.datastax.com/en/ragstack/docs/prerequisites.html[Prerequisites] page for more details on finding these values. +See the https://docs.datastax.com/en/ragstack/examples/prerequisites.html[Prerequisites] page for more details on finding these values. 
== Load the data diff --git a/docs/modules/examples/pages/langchain-unstructured-astra.adoc b/docs/modules/examples/pages/langchain-unstructured-astra.adoc index cf98f0f1c..7b83aa4dc 100644 --- a/docs/modules/examples/pages/langchain-unstructured-astra.adoc +++ b/docs/modules/examples/pages/langchain-unstructured-astra.adoc @@ -32,7 +32,7 @@ Install the following dependencies: ---- pip install ragstack-ai ---- -See the https://docs.datastax.com/en/ragstack/docs/prerequisites.html[Prerequisites] page for more details. +See the https://docs.datastax.com/en/ragstack/examples/prerequisites.html[Prerequisites] page for more details. == Set up your environment @@ -48,7 +48,7 @@ OPENAI_API_KEY=sk-... If you're using Google Colab, you'll be prompted for these values in the Colab environment. -See the https://docs.datastax.com/en/ragstack/docs/prerequisites.html[Prerequisites] page for more details. +See the https://docs.datastax.com/en/ragstack/examples/prerequisites.html[Prerequisites] page for more details. == Create RAG pipeline diff --git a/docs/modules/examples/pages/llama-astra.adoc b/docs/modules/examples/pages/llama-astra.adoc index 432c8456f..2ebe45369 100644 --- a/docs/modules/examples/pages/llama-astra.adoc +++ b/docs/modules/examples/pages/llama-astra.adoc @@ -22,7 +22,7 @@ Install the following dependencies: ---- pip install ragstack-ai python-dotenv ---- -See the https://docs.datastax.com/en/ragstack/docs/prerequisites.html[Prerequisites] page for more details. +See the https://docs.datastax.com/en/ragstack/examples/prerequisites.html[Prerequisites] page for more details. == Set up your local environment @@ -36,7 +36,7 @@ OPENAI_API_KEY=sk-... If you're using Google Colab, you'll be prompted for these values in the Colab environment. -See the https://docs.datastax.com/en/ragstack/docs/prerequisites.html[Prerequisites] page for more details. +See the https://docs.datastax.com/en/ragstack/examples/prerequisites.html[Prerequisites] page for more details. 
== Create a RAG pipeline with LlamaIndex diff --git a/docs/modules/examples/pages/llama-parse-astra.adoc b/docs/modules/examples/pages/llama-parse-astra.adoc index 3fc6a50c6..2bbf631b7 100644 --- a/docs/modules/examples/pages/llama-parse-astra.adoc +++ b/docs/modules/examples/pages/llama-parse-astra.adoc @@ -25,7 +25,7 @@ Install the following dependencies: ---- pip install ragstack-ai ---- -See the https://docs.datastax.com/en/ragstack/docs/prerequisites.html[Prerequisites] page for more details. +See the https://docs.datastax.com/en/ragstack/examples/prerequisites.html[Prerequisites] page for more details. == Set up your local environment @@ -40,7 +40,7 @@ OPENAI_API_KEY=sk-... If you're using Google Colab, you'll be prompted for these values in the Colab environment. -See the https://docs.datastax.com/en/ragstack/docs/prerequisites.html[Prerequisites] page for more details. +See the https://docs.datastax.com/en/ragstack/examples/prerequisites.html[Prerequisites] page for more details. == Create RAG pipeline diff --git a/docs/modules/examples/pages/mmr.adoc b/docs/modules/examples/pages/mmr.adoc index 60e8a76e5..3670f4607 100644 --- a/docs/modules/examples/pages/mmr.adoc +++ b/docs/modules/examples/pages/mmr.adoc @@ -38,7 +38,7 @@ OPENAI_API_KEY=sk-... pip install -qU ragstack-ai python-dotenv ---- + -See the https://docs.datastax.com/en/ragstack/docs/prerequisites.html[Prerequisites] page for more details. +See the https://docs.datastax.com/en/ragstack/examples/prerequisites.html[Prerequisites] page for more details. == Create embedding model and vector store diff --git a/docs/modules/examples/pages/nvidia_embeddings.adoc b/docs/modules/examples/pages/nvidia_embeddings.adoc index f9c713183..8de1705b1 100644 --- a/docs/modules/examples/pages/nvidia_embeddings.adoc +++ b/docs/modules/examples/pages/nvidia_embeddings.adoc @@ -40,7 +40,7 @@ pipeline. + `+datasets+` is used to import a sample dataset. 
+ -See the https://docs.datastax.com/en/ragstack/docs/prerequisites.html[Prerequisites] page for more details. +See the https://docs.datastax.com/en/ragstack/examples/prerequisites.html[Prerequisites] page for more details. == Configure {db-serverless} and Nvidia NGC credentials diff --git a/examples/notebooks/FLARE.ipynb b/examples/notebooks/FLARE.ipynb index cc2e3ca2e..6866eb294 100644 --- a/examples/notebooks/FLARE.ipynb +++ b/examples/notebooks/FLARE.ipynb @@ -39,7 +39,7 @@ "* Get your Astra DB Endpoint: \n", " * `https://-.apps.astra.datastax.com`\n", "\n", - "See the [Prerequisites](https://docs.datastax.com/en/ragstack/docs/prerequisites.html) page for more details." + "See the [Prerequisites](https://docs.datastax.com/en/ragstack/examples/prerequisites.html) page for more details." ] }, { diff --git a/examples/notebooks/advancedRAG.ipynb b/examples/notebooks/advancedRAG.ipynb index 82f0ce3cb..fc21b6486 100644 --- a/examples/notebooks/advancedRAG.ipynb +++ b/examples/notebooks/advancedRAG.ipynb @@ -40,7 +40,7 @@ "* Within your database, create an [Astra DB Access Token](https://docs.datastax.com/en/astra-serverless/docs/manage/org/manage-tokens.html) with Database Administrator permissions.\n", "* Get your Astra DB Endpoint:\n", " * `https://-.apps.astra.datastax.com`\n", - "* See the [Prerequisites](https://docs.datastax.com/en/ragstack/docs/prerequisites.html) page for more details." + "* See the [Prerequisites](https://docs.datastax.com/en/ragstack/examples/prerequisites.html) page for more details." 
] }, { diff --git a/examples/notebooks/langchain_evaluation.ipynb b/examples/notebooks/langchain_evaluation.ipynb index 7325bf451..e18656a37 100644 --- a/examples/notebooks/langchain_evaluation.ipynb +++ b/examples/notebooks/langchain_evaluation.ipynb @@ -33,7 +33,7 @@ " * `https://-.apps.astra.datastax.com`\n", "* A [LangSmith account](https://docs.smith.langchain.com/)\n", "\n", - "See the [Prerequisites](https://docs.datastax.com/en/ragstack/docs/prerequisites.html) page for more details." + "See the [Prerequisites](https://docs.datastax.com/en/ragstack/examples/prerequisites.html) page for more details." ] }, { diff --git a/examples/notebooks/langchain_multimodal_gemini.ipynb b/examples/notebooks/langchain_multimodal_gemini.ipynb index 963e51102..fde089ad0 100644 --- a/examples/notebooks/langchain_multimodal_gemini.ipynb +++ b/examples/notebooks/langchain_multimodal_gemini.ipynb @@ -47,7 +47,7 @@ " * `https://-.apps.astra.datastax.com`\n", "\n", "\n", - "See the [Prerequisites](https://docs.datastax.com/en/ragstack/docs/prerequisites.html) page for more details." + "See the [Prerequisites](https://docs.datastax.com/en/ragstack/examples/prerequisites.html) page for more details." 
] }, { diff --git a/examples/notebooks/llama-astra.ipynb b/examples/notebooks/llama-astra.ipynb index aa1679dc9..459c87c91 100644 --- a/examples/notebooks/llama-astra.ipynb +++ b/examples/notebooks/llama-astra.ipynb @@ -40,7 +40,7 @@ "Get your Astra DB Endpoint:\n", "https://-.apps.astra.datastax.com\n", "\n", - "See the [Prerequisites](https://docs.datastax.com/en/ragstack/docs/prerequisites.html) page for more details.\n", + "See the [Prerequisites](https://docs.datastax.com/en/ragstack/examples/prerequisites.html) page for more details.\n", "\n", "## Setup" ] diff --git a/examples/notebooks/nemo_guardrails.ipynb b/examples/notebooks/nemo_guardrails.ipynb index f3f0fca00..b9ff2a129 100644 --- a/examples/notebooks/nemo_guardrails.ipynb +++ b/examples/notebooks/nemo_guardrails.ipynb @@ -37,7 +37,7 @@ "* Get your Astra DB Endpoint: \n", " * `https://-.apps.astra.datastax.com`\n", "\n", - "See the [Prerequisites](https://docs.datastax.com/en/ragstack/docs/prerequisites.html) page for more details." + "See the [Prerequisites](https://docs.datastax.com/en/ragstack/examples/prerequisites.html) page for more details." ] }, { diff --git a/examples/notebooks/nvidia.ipynb b/examples/notebooks/nvidia.ipynb index b22695620..9cbf47a40 100644 --- a/examples/notebooks/nvidia.ipynb +++ b/examples/notebooks/nvidia.ipynb @@ -32,7 +32,7 @@ " * Once signed in, navigate to `Catalog > AI Foundation Models > (Model)`\n", " * In the model page, select the `API` tab, then `Generate Key`\n", "\n", - "See the [Prerequisites](https://docs.datastax.com/en/ragstack/docs/prerequisites.html) page for more details." + "See the [Prerequisites](https://docs.datastax.com/en/ragstack/examples/prerequisites.html) page for more details." 
] }, { diff --git a/examples/notebooks/quickstart.ipynb b/examples/notebooks/quickstart.ipynb index 23f4d34c9..73f8463e8 100644 --- a/examples/notebooks/quickstart.ipynb +++ b/examples/notebooks/quickstart.ipynb @@ -32,7 +32,7 @@ "* Get your Astra DB Endpoint: \n", " * `https://-.apps.astra.datastax.com`\n", "\n", - "See the [Prerequisites](https://docs.datastax.com/en/ragstack/docs/prerequisites.html) page for more details." + "See the [Prerequisites](https://docs.datastax.com/en/ragstack/examples/prerequisites.html) page for more details." ] }, { From f4663b96b581fdc9c7227d96259f30fecd125484 Mon Sep 17 00:00:00 2001 From: Brian Godsey Date: Mon, 3 Feb 2025 17:39:34 -0600 Subject: [PATCH 2/7] Making changes to the main graph RAG page. --- docs/modules/knowledge-graph/pages/index.adoc | 64 +++++-------------- 1 file changed, 15 insertions(+), 49 deletions(-) diff --git a/docs/modules/knowledge-graph/pages/index.adoc b/docs/modules/knowledge-graph/pages/index.adoc index 991b33d60..f380e81ec 100644 --- a/docs/modules/knowledge-graph/pages/index.adoc +++ b/docs/modules/knowledge-graph/pages/index.adoc @@ -1,19 +1,25 @@ = Introduction to Graph-Based Knowledge Extraction and Traversal -RAGStack offers two libraries supporting knowledge graph extraction and traversal, `ragstack-ai-knowledge-graph` and `ragstack-ai-knowledge-store`. +[IMPORTANT] +==== +The RAGStack knowledge graph libraries +`ragstack-ai-knowledge-graph` and `ragstack-ai-knowledge-store` +are no longer under development, and have been superseded by the +https://github.com/datastax/graph-rag[Graph RAG project]. -A knowledge graph represents information as **nodes**. Nodes are connected by **edges** indicating relationships between them. Each edge includes the source (for example, "Marie Curie" the person), the target ("Nobel Prize" the award) and a type, indicating how the source relates to the target (for example, “won”). 
+Please visit +https://github.com/datastax/graph-rag[Graph RAG project] +for the latest tools and techniques for working with +knowledge graphs and graph RAG. -A graph database isn't required to use the knowledge graph libraries - RAGStack uses Astra DB or Apache Cassandra to store and retrieve graphs. +If you have further questions, please contact +https://support.datastax.com/[DataStax Support]. -The `ragstack-ai-knowledge-graph` library offers **entity-centric** knowledge graph extraction and traversal. It extracts a knowledge graph from unstructured information and creates nodes from **entities**, or concepts (for example, "Seattle"). +==== -The `ragstack-ai-knowledge-store` library offers **content-centric** knowledge graph extraction and traversal. It extracts a knowledge graph from unstructured information and creates nodes from **content** (for example, a specific document about Seattle). -[IMPORTANT] -==== -This feature is currently under development and has not been fully tested. It is not supported for use in production environments. Please use this feature in testing and development environments only. -==== +A knowledge graph represents information as **nodes**. Nodes are connected by **edges** indicating relationships between them. Each edge includes the source (for example, "Marie Curie" the person), the target ("Nobel Prize" the award) and a type, indicating how the source relates to the target (for example, “won”). + == What's the difference between knowledge graphs and vector similarity search? @@ -32,46 +38,6 @@ The article's "see more information" is an example of an edge in a knowledge gra These edges also increase the diversity of results. Within the same tech support system, if you retrieve 100 chunks that are highly similar to the question, you have retrieved 100 chunks that are also highly similar to themselves. Following edges to linked information increases diversity. 
-== The `ragstack-ai-knowledge-graph` library - -The `ragstack-ai-knowledge-graph` library contains functions for the extraction and traversal of **entity-centric** knowledge graphs. - -To install the package, run: - -[source,bash] ----- -pip install ragstack-ai-knowledge-graph ----- - -To install the library as an extra with the RAGStack Langchain package, run: - -[source,bash] ----- -pip install "ragstack-ai-langchain[knowledge-graph]" ----- - -For more information, see xref:knowledge-graph.adoc[]. - -== The `ragstack-ai-knowledge-store` library - -The `ragstack-ai-knowledge-store` library contains functions for creating a **content-centric** vector-and-graph store. This store combines the benefits of vector stores with the context and relationships of a related edges. - -To install the package, run: - -[source,bash] ----- -pip install ragstack-ai-knowledge-store ----- - -To install the library as an extra with the RAGStack Langchain package, run: - -[source,bash] ----- -pip install "ragstack-ai-langchain[knowledge-store]" ----- - -For more information, see xref:knowledge-store.adoc[]. - From 977cb241a3a72523fe079558b6909ccef30bdd43 Mon Sep 17 00:00:00 2001 From: Brian Godsey Date: Mon, 3 Feb 2025 17:56:22 -0600 Subject: [PATCH 3/7] Moving usable content to the main grpah RAG page. --- docs/modules/knowledge-graph/pages/index.adoc | 37 ++++++- .../pages/knowledge-graph.adoc | 103 ------------------ .../pages/knowledge-store.adoc | 52 --------- 3 files changed, 35 insertions(+), 157 deletions(-) diff --git a/docs/modules/knowledge-graph/pages/index.adoc b/docs/modules/knowledge-graph/pages/index.adoc index f380e81ec..bae52f807 100644 --- a/docs/modules/knowledge-graph/pages/index.adoc +++ b/docs/modules/knowledge-graph/pages/index.adoc @@ -8,7 +8,7 @@ are no longer under development, and have been superseded by the https://github.com/datastax/graph-rag[Graph RAG project]. 
Please visit -https://github.com/datastax/graph-rag[Graph RAG project] +https://github.com/datastax/graph-rag[Graph RAG project on GitHub] for the latest tools and techniques for working with knowledge graphs and graph RAG. @@ -34,11 +34,44 @@ From a developer's perspective, a knowledge graph is built into a RAG pipeline s For example: consider a tech support system, where you find an article that is similar to your question, and it says. "If you have trouble with step 4, see this article for more information". Even if "more information" is not similar to your original question, it likely provides more information. -The article's "see more information" is an example of an edge in a knowledge graph. The edge connects the initial article to additional information, indicating that the two are related. This relationship would not be captured in a similarity search. +The article's HTML links can be examples of edges in a knowledge graph. These edges connect the initial article to additional information, indicating that they are related. This relationship would not be captured in a vector similarity search. These edges also increase the diversity of results. Within the same tech support system, if you retrieve 100 chunks that are highly similar to the question, you have retrieved 100 chunks that are also highly similar to themselves. Following edges to linked information increases diversity. +== How is Knowledge Graph RAG different from RAG? + +Short answer: it isn't. Knowledge graphs are a method of doing RAG, but with a different representation of the information. + +RAG with similarity search creates a vector representation of information based on chunks of text. The query is compared to these vectors, and the most similar chunks are returned as context for the answer. + +Knowledge graph RAG extracts a knowledge graph from information, and stores the graph representation in a vector or graph knowledge store. 
+ +Instead of a similarity search query, the graph store is **traversed** to extract a sub-graph of the knowledge graph's edges and properties. For example, a query for "Marie Curie" returns a sub-graph of nodes representing her relationships, accomplishments, and other relevant information - the context. + +You're telling the graph store to "start with this node, and show me the relationships to a depth of 2 nodes outwards." + + +== What's the difference between entity-centric and content-centric knowledge graphs? + +**Entity-centric knowledge graphs** capture edge relationships between entities. +A knowledge graph is extracted with an LLM from unstructured information, and its entities and their edge relationships are stored in a vector or graph store. + +However, extracting this entity-centric knowledge graph from unstructured information is difficult, time-consuming, and error-prone. A user has to guide the LLM on the kinds of nodes and relationships to be extracted with a schema, and if the knowledge schema changes, the graph has to be processed again. The context advantages of entity-centric knowledge graphs are great, but the cost to build and maintain them is much higher than just chunking and embedding content to a vector store. + +**Content-centric knowledge graphs** offer a compromise between the ease and scalability of vector similarity search, and the context and relationships of entity-centric knowledge graphs. + +The content-centric approach starts with nodes that represent content (a specific document about Seattle), instead of concepts or entities (a node representing Seattle). A node may represent a table, an image, or a section of a document. Since the node represents the original content, the nodes are exactly what is stored when using vector search. + +Unstructured content is loaded, chunked, and written to a vector store. +Each chunk can be run through a variety of analyses to identify links. 
For example, links in the content may turn into `links_to` edges, and keywords may be extracted from the chunk to link up with other chunks on the same topic. + +To add edges, each chunk may be annotated with URLs that its content represents, or each chunk may be associated with keywords. + +Retrieval is where the benefits of vector search and content-centric traversal come together. +The query's initial starting points in the knowledge graph are identified based on vector similarity to the question, and then additional chunks are selected by following edges from that node. Including nodes that are related both by embedding distance (similarity) and graph distance (related) leads to a more diverse set of chunks with deeper context and fewer hallucinations. + + diff --git a/docs/modules/knowledge-graph/pages/knowledge-graph.adoc b/docs/modules/knowledge-graph/pages/knowledge-graph.adoc index e678fffe9..33d3cef15 100644 --- a/docs/modules/knowledge-graph/pages/knowledge-graph.adoc +++ b/docs/modules/knowledge-graph/pages/knowledge-graph.adoc @@ -1,31 +1,5 @@ = Knowledge Graph RAG -Knowledge Graph is a RAGStack library that provides graph-based representation and retrieval of information. It is designed to store and retrieve information in a way that is more efficient and accurate than vector-based similarity search over Document chunks. - -See the xref:examples:knowledge-graph.adoc[Knowledge graph example code] to get started using Knowledge Graph RAG. - -[IMPORTANT] -==== -This feature is currently under development and has not been fully tested. It is not supported for use in production environments. Please use this feature in testing and development environments only. -==== - -== The `ragstack-ai-knowledge-graph` library - -The `ragstack-ai-knowledge-graph` library contains functions for the extraction and traversal of **entity-centric** knowledge graphs. 
- -To install the package, run: - -[source,bash] ----- -pip install ragstack-ai-knowledge-graph ----- - -To install the library as an extra with the RAGStack Langchain package, run: - -[source,bash] ----- -pip install "ragstack-ai-langchain[knowledge-graph]" ----- == How is Knowledge Graph different from RAG? @@ -38,80 +12,3 @@ Knowledge graph RAG extracts a knowledge graph from information, and stores the Instead of a similarity search query, the graph store is **traversed** to extract a sub-graph of the knowledge graph's edges and properties. For example, a query for "Marie Curie" returns a sub-graph of nodes representing her relationships, accomplishments, and other relevant information - the context. You're telling the graph store to "start with this node, and show me the relationships to a depth of 2 nodes outwards." - -Here is how the xref:examples:knowledge-graph.adoc#query-graph-store[Knowledge graph example code] uses the Knowledge Graph library to extract a sub-graph around Marie Curie: - -[source,python] ----- -from ragstack_knowledge_graph.traverse import Node - -graph_store.as_runnable(steps=2).invoke(Node("Marie Curie", "Person")) ----- - -Result: - -[source,plain] ----- -{Marie Curie(Person) -> Chemist(Profession): HAS_PROFESSION, - Marie Curie(Person) -> French(Nationality): HAS_NATIONALITY, - Marie Curie(Person) -> Nobel Prize(Award): WON, - Marie Curie(Person) -> Physicist(Profession): HAS_PROFESSION, - Marie Curie(Person) -> Pierre Curie(Person): MARRIED_TO, - Marie Curie(Person) -> Polish(Nationality): HAS_NATIONALITY, - Marie Curie(Person) -> Professor(Profession): HAS_PROFESSION, - Marie Curie(Person) -> Radioactivity(Scientific concept): RESEARCHED, - Marie Curie(Person) -> Radioactivity(Scientific field): RESEARCHED_IN, - Marie Curie(Person) -> University Of Paris(Organization): WORKED_AT, - Pierre Curie(Person) -> Nobel Prize(Award): WON} ----- - -As with RAG, this sub-graph context is then dropped into the prompt to generate answers. 
- -[source,python] ----- -ANSWER_PROMPT = ( - "The original question is given below." - "This question has been used to retrieve information from a knowledge graph." - "The matching triples are shown below." - "Use the information in the triples to answer the original question.\n\n" - "Original Question: {question}\n\n" - "Knowledge Graph Triples:\n{context}\n\n" - "Response:" -) - -chain = ( - { "question": RunnablePassthrough() } - # extract_entities is provided by the Cassandra knowledge graph library - # and extracts entitise as shown above. - | RunnablePassthrough.assign(entities = extract_entities(llm)) - | RunnablePassthrough.assign( - # graph_store.as_runnable() is provided by the CassandraGraphStore - # and takes one or more entities and retrieves the relevant sub-graph(s). - triples = itemgetter("entities") | graph_store.as_runnable()) - | RunnablePassthrough.assign( - context = itemgetter("triples") | RunnableLambda(_combine_relations)) - | ChatPromptTemplate.from_messages([ANSWER_PROMPT]) - | llm -) ----- - -Result: - -[source,bash] ----- -Nodes: [Node(id='Marie Curie', type='Person'), Node(id='Polish', type='Nationality'), Node(id='French', type='Nationality'), Node(id='Physicist', type='Profession'), Node(id='Chemist', type='Profession'), Node(id='Radioactivity', type='Scientific concept'), Node(id='Nobel Prize', type='Award'), Node(id='Pierre Curie', type='Person'), Node(id='University Of Paris', type='Institution'), Node(id='Professor', type='Profession')] -Relationships: [Relationship(source=Node(id='Marie Curie', type='Person'), target=Node(id='Polish', type='Nationality'), type='HAS_NATIONALITY'), Relationship(source=Node(id='Marie Curie', type='Person'), target=Node(id='French', type='Nationality'), type='HAS_NATIONALITY'), Relationship(source=Node(id='Marie Curie', type='Person'), target=Node(id='Physicist', type='Profession'), type='IS_A'), Relationship(source=Node(id='Marie Curie', type='Person'), target=Node(id='Chemist', type='Profession'), 
type='IS_A'), Relationship(source=Node(id='Marie Curie', type='Person'), target=Node(id='Radioactivity', type='Scientific concept'), type='RESEARCHED'), Relationship(source=Node(id='Marie Curie', type='Person'), target=Node(id='Nobel Prize', type='Award'), type='WON'), Relationship(source=Node(id='Pierre Curie', type='Person'), target=Node(id='Nobel Prize', type='Award'), type='WON'), Relationship(source=Node(id='Marie Curie', type='Person'), target=Node(id='Pierre Curie', type='Person'), type='MARRIED_TO'), Relationship(source=Node(id='Marie Curie', type='Person'), target=Node(id='University Of Paris', type='Institution'), type='WORKED_AT'), Relationship(source=Node(id='Marie Curie', type='Person'), target=Node(id='Professor', type='Profession'), type='IS_A')] -Chain Response: content='Marie Curie was a physicist, chemist, and professor. She was of French and Polish nationality. She was married to Pierre Curie and both of them won the Nobel Prize. She worked at the University of Paris and researched radioactivity.' response_metadata={'token_usage': {'completion_tokens': 50, 'prompt_tokens': 308, 'total_tokens': 358}, 'model_name': 'gpt-4', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None} id='run-79178e44-64a0-4077-8b90-f21fd004f745-0' ----- - -== Knowledge Graph, RAGStack, and Astra DB - -Knowledge graph extracts graphs from documents using the LLMGraphTransformer library from Langchain, stores the graphs in a Cassandra database, and traverses the graph to extract sub-graphs for answering questions with a https://github.com/datastax/ragstack-ai/blob/main/libs/knowledge-graph/ragstack_knowledge_graph/traverse.py[custom function]. - -A graph database or query language isn't required to use the knowledge graph library. - -Retrieving the sub-knowledge graph around a few nodes is a simple graph traversal, while graph DBs are designed for much more complex queries searching for paths with specific sequences of properties. 
Sub-knowledge graph traversal is often only to a depth of 2 or 3, since nodes which are farther removed become irrelevant to the question pretty quickly. This can be expressed as a few rounds of simple queries (one for each step) or a SQL join. - -Eliminating the need for a separate graph database makes it easier to use knowledge graphs. -Using Astra DB or Cassandra simplifies transactional writes to both the graph and other data stored in the same place, and likely scales better. -Finally, using RAGStack ensures Langchain components like LLMGraphTransformer remain stable. \ No newline at end of file diff --git a/docs/modules/knowledge-graph/pages/knowledge-store.adoc b/docs/modules/knowledge-graph/pages/knowledge-store.adoc index db777286f..1a1dadebb 100644 --- a/docs/modules/knowledge-graph/pages/knowledge-store.adoc +++ b/docs/modules/knowledge-graph/pages/knowledge-store.adoc @@ -1,54 +1,2 @@ = {graph-store} -{graph-store} is a hybrid vector-and-graph store that combines the benefits of vector stores with the context and relationships of related edges between chunks. - -See the xref:examples:knowledge-store.adoc[{graph-store} example code] to get started with {graph-store}. - -[IMPORTANT] -==== -This feature is currently under development and has not been fully tested. It is not supported for use in production environments. Please use this feature in testing and development environments only. -==== - -== The `ragstack-ai-knowledge-store` library - -The `ragstack-ai-knowledge-store` library contains functions for creating a hybrid vector-and-graph knowledge store. This store combines the benefits of vector stores with the context and relationships of a related edges. 
- -To install the package, run: - -[source,bash] ----- -pip install ragstack-ai-knowledge-store ----- - -To install the library as an extra with the RAGStack Langchain package, run: - -[source,bash] ----- -pip install "ragstack-ai-langchain[knowledge-store]" ----- - -== What's the difference between entity-centric and content-centric knowledge graphs? - -**Entity-centric knowledge graphs** (like xref:knowledge-graph.adoc[]) capture edge relationships between entities. -A knowledge graph is extracted with an LLM from unstructured information, and its entities and their edge relationships are stored in a vector or graph store. - -However, extracting this entity-centric knowledge graph from unstructured information is difficult, time-consuming, and error-prone. A user has to guide the LLM on the kinds of nodes and relationships to be extracted with a schema, and if the knowledge schema changes, the graph has to be processed again. The context advantages of entity-centric knowledge graphs are great, but the cost to build and maintain them is much higher than just chunking and embedding content to a vector store. - -**Content-centric knowledge graphs** (like xref:knowledge-store.adoc[]) offer a compromise between the ease and scalability of vector similarity search, and the context and relationships of entity-centric knowledge graphs. - -The content-centric approach starts with nodes that represent content (a specific document about Seattle), instead of concepts or entities (a node representing Seattle). A node may represent a table, an image, or a section of a document. Since the node represents the original content, the nodes are exactly what is stored when using vector search. - -Unstructured content is loaded, chunked, and written to a vector store. -Each chunk can be run through a variety of analyses to identify links. 
For example, links in the content may turn into `links_to edges`, and keywords may be extracted from the chunk to link up with other chunks on the same topic. - -To add edges, each chunk may be annotated with URLs that its content represents, or each chunk may be associated with keywords. - -Retrieval is where the benefits of vector search and content-centric traversal come together. -The query's initial starting points in the knowledge graph are identified based on vector similarity to the question, and then additional chunks are selected by following edges from that node. Including nodes that are related both by embedding distance (similarity) and graph distance (related) leads to a more diverse set of chunks with deeper context and less hallucinations. - -For a step-by-step example, see the xref:examples:knowledge-store.adoc[{graph-store} example code]. - - - - - From f05c1a726d5aa87f740134d0310851e4e9a29eb5 Mon Sep 17 00:00:00 2001 From: Brian Godsey Date: Mon, 3 Feb 2025 17:57:27 -0600 Subject: [PATCH 4/7] Deleting the two graph sub-pages after moving some content to the main graph page. --- .../knowledge-graph/pages/knowledge-graph.adoc | 14 -------------- .../knowledge-graph/pages/knowledge-store.adoc | 2 -- 2 files changed, 16 deletions(-) delete mode 100644 docs/modules/knowledge-graph/pages/knowledge-graph.adoc delete mode 100644 docs/modules/knowledge-graph/pages/knowledge-store.adoc diff --git a/docs/modules/knowledge-graph/pages/knowledge-graph.adoc b/docs/modules/knowledge-graph/pages/knowledge-graph.adoc deleted file mode 100644 index 33d3cef15..000000000 --- a/docs/modules/knowledge-graph/pages/knowledge-graph.adoc +++ /dev/null @@ -1,14 +0,0 @@ -= Knowledge Graph RAG - - -== How is Knowledge Graph different from RAG? - -Short answer: it isn't. Knowledge graphs are a method of doing RAG, but with a different representation of the information. - -RAG with similarity search creates a vector representation of information based on chunks of text. 
The query is compared to the question, and the most similar chunks are returned as the answer. - -Knowledge graph RAG extracts a knowledge graph from information, and stores the graph representation in a vector or graph knowledge store. - -Instead of a similarity search query, the graph store is **traversed** to extract a sub-graph of the knowledge graph's edges and properties. For example, a query for "Marie Curie" returns a sub-graph of nodes representing her relationships, accomplishments, and other relevant information - the context. - -You're telling the graph store to "start with this node, and show me the relationships to a depth of 2 nodes outwards." diff --git a/docs/modules/knowledge-graph/pages/knowledge-store.adoc b/docs/modules/knowledge-graph/pages/knowledge-store.adoc deleted file mode 100644 index 1a1dadebb..000000000 --- a/docs/modules/knowledge-graph/pages/knowledge-store.adoc +++ /dev/null @@ -1,2 +0,0 @@ -= {graph-store} - From 6c0f70efdbf4da5a1360db7429ff3fa66dd12da9 Mon Sep 17 00:00:00 2001 From: Mendon Kissling <59585235+mendonk@users.noreply.github.com> Date: Tue, 4 Feb 2025 09:16:26 -0500 Subject: [PATCH 5/7] remove-pages-from-nav --- docs/modules/ROOT/nav.adoc | 2 -- docs/modules/knowledge-graph/pages/index.adoc | 11 +++-------- 2 files changed, 3 insertions(+), 10 deletions(-) diff --git a/docs/modules/ROOT/nav.adoc b/docs/modules/ROOT/nav.adoc index de7f86ccb..1c2db9413 100644 --- a/docs/modules/ROOT/nav.adoc +++ b/docs/modules/ROOT/nav.adoc @@ -26,8 +26,6 @@ .Graph Libraries * xref:knowledge-graph:index.adoc[] -* xref:knowledge-graph:knowledge-graph.adoc[] -* xref:knowledge-graph:knowledge-store.adoc[] .Introduction to RAG * xref:intro-to-rag:index.adoc[] diff --git a/docs/modules/knowledge-graph/pages/index.adoc b/docs/modules/knowledge-graph/pages/index.adoc index bae52f807..59962e5eb 100644 --- a/docs/modules/knowledge-graph/pages/index.adoc +++ b/docs/modules/knowledge-graph/pages/index.adoc @@ -7,17 +7,12 @@ The RAGStack knowledge 
graph libraries are no longer under development, and have been superseded by the https://github.com/datastax/graph-rag[Graph RAG project]. -Please visit -https://github.com/datastax/graph-rag[Graph RAG project on GitHub] -for the latest tools and techniques for working with -knowledge graphs and graph RAG. - -If you have further questions, please contact -https://support.datastax.com/[DataStax Support]. +Please visit the https://github.com/datastax/graph-rag[Graph RAG project on GitHub] +for the latest tools and techniques for working with knowledge graphs and graph RAG. +If you have further questions, please contact https://support.datastax.com/[DataStax Support]. ==== - A knowledge graph represents information as **nodes**. Nodes are connected by **edges** indicating relationships between them. Each edge includes the source (for example, "Marie Curie" the person), the target ("Nobel Prize" the award) and a type, indicating how the source relates to the target (for example, “won”). From 5ee8c79c1ba045d8d0fa6c5d08a1f72df67384f2 Mon Sep 17 00:00:00 2001 From: Mendon Kissling <59585235+mendonk@users.noreply.github.com> Date: Tue, 4 Feb 2025 09:29:48 -0500 Subject: [PATCH 6/7] style-guide-cleanup-note --- docs/modules/knowledge-graph/pages/index.adoc | 10 +++------- 1 file changed, 3 insertions(+), 7 deletions(-) diff --git a/docs/modules/knowledge-graph/pages/index.adoc b/docs/modules/knowledge-graph/pages/index.adoc index 59962e5eb..a8d3fba2c 100644 --- a/docs/modules/knowledge-graph/pages/index.adoc +++ b/docs/modules/knowledge-graph/pages/index.adoc @@ -2,15 +2,11 @@ [IMPORTANT] ==== -The RAGStack knowledge graph libraries -`ragstack-ai-knowledge-graph` and `ragstack-ai-knowledge-store` -are no longer under development, and have been superseded by the -https://github.com/datastax/graph-rag[Graph RAG project]. +The `ragstack-ai-knowledge-graph` and `ragstack-ai-knowledge-store` libraries are no longer under development. 
-Please visit the https://github.com/datastax/graph-rag[Graph RAG project on GitHub] -for the latest tools and techniques for working with knowledge graphs and graph RAG. +Instead, you can find the latest tools and techniques for working with knowledge graphs and graph RAG in the https://github.com/datastax/graph-rag[Graph RAG project]. -If you have further questions, please contact https://support.datastax.com/[DataStax Support]. +If you have further questions, contact https://support.datastax.com/[DataStax Support]. ==== A knowledge graph represents information as **nodes**. Nodes are connected by **edges** indicating relationships between them. Each edge includes the source (for example, "Marie Curie" the person), the target ("Nobel Prize" the award) and a type, indicating how the source relates to the target (for example, “won”). From 2b08aef7dcf99b527276090802a6ff6c0d4ff9b5 Mon Sep 17 00:00:00 2001 From: Mendon Kissling <59585235+mendonk@users.noreply.github.com> Date: Tue, 4 Feb 2025 09:39:58 -0500 Subject: [PATCH 7/7] use-page-alias-instead-of-redirect --- docs/modules/knowledge-graph/pages/index.adoc | 1 + 1 file changed, 1 insertion(+) diff --git a/docs/modules/knowledge-graph/pages/index.adoc b/docs/modules/knowledge-graph/pages/index.adoc index a8d3fba2c..7d9602b07 100644 --- a/docs/modules/knowledge-graph/pages/index.adoc +++ b/docs/modules/knowledge-graph/pages/index.adoc @@ -1,4 +1,5 @@ = Introduction to Graph-Based Knowledge Extraction and Traversal +:page-aliases: knowledge-graph:knowledge-graph.adoc, knowledge-graph:knowledge-store.adoc [IMPORTANT] ====