Commit 7316747

Authored by prakriti-solankey, praveshkumar1988, kartikpersistent, vasanthasaikalluri, and aashipandya
Staging (#361)
* Remove unused library and commented code * Issue fixed * 224 color mismatch in graph viz model (#225) * count changes * added legend count * bloom url changes * lint changes * removal of console --------- Co-authored-by: kartikpersistent <[email protected]> * Modified retrieval query (#226) * Manage file status (#227) * manage status of processing file * Remove progress bar from Generate Graph Document button * 224 color mismatch in graph viz model (#225) * count changes * added legend count * bloom url changes * lint changes * removal of console --------- Co-authored-by: kartikpersistent <[email protected]> * Modified retrieval query (#226) * Convert KNN score value string to Float --------- Co-authored-by: Prakriti Solankey <[email protected]> Co-authored-by: kartikpersistent <[email protected]> Co-authored-by: vasanthasaikalluri <[email protected]> * Chatbot optimization (#230) * Optimised and cleaned Chatbot Integration * modified chat integration functions * bug changes (#231) * batch queries and relationship count correction (#232) * batch queries and relationship count correction * status should not be processing * 'url_changes' (#235) * Color mismatch in graph viz model (#233) * count changes * added legend count * bloom url changes * lint changes * removal of console * 'colour' * 'color' --------- Co-authored-by: kartikpersistent <[email protected]> * lint fixes * Create schema endpoint to get labels and relationtypes * source link fixes * Handle exception when youtube Api unable to fetch transcript youtube_transcript_api._errors.TranscriptsDisabled * configured backend status based the ENV Variable (#246) * configured backend status based the ENV Variable * removed the connection status check in PROD enviournment * Requirement split gcs and s3 icons on the page (#247) * separated S3 and GCS * resolved the conflicts * Update error message in response * dev env * Chatbot optimization (#250) * Optimised and cleaned Chatbot Integration * modified chat integration functions * Modified max_tokens and min_score * Modified prompt and added error message * Modified Prompt and error message * 245 bug chatbot UI (#252) * fixed chatbot aspect ratio/width issue * fixed chat bot ui issue * 'hoverchanges' (#254) * added settings panel for relationship type and node label selection (#234) * added settings panel for relationship type and node label selection * added checkbox for fetching existing scehma * integrated /schema api * added dependency in the useCallback * usercredentials payload fix * Accept param in Extract API to filter graph to allowedNode and allowedRealationship * CHange param type in extract * Issue fixed * integrated extract api * updated string as list for allowednodes and allowedrelations * removed button on settings * format fixes * Added baseEntityLabel as True --------- Co-authored-by: Pravesh Kumar <[email protected]> Co-authored-by: aashipandya <[email protected]> * Handle File status for long time (#256) * format fixes * fixed failed status bug * Fixed list.split issue in allowed nodes * Issue fixed * Updated check of empty allowed nodes and allowed relations list (#258) * added settings panel for relationship type and node label selection * added checkbox for fetching existing scehma * integrated /schema api * added dependency in the useCallback * usercredentials payload fix * Accept param in Extract API to filter graph to allowedNode and allowedRealationship * CHange param type in extract * Issue fixed * integrated extract api * updated string as list for 
allowednodes and allowedrelations * check for empty list of nodes and relations --------- Co-authored-by: kartikpersistent <[email protected]> Co-authored-by: Pravesh Kumar <[email protected]> * Removed wrong commit * Updated condition for allowed nodes relations (#265) * added settings panel for relationship type and node label selection * added checkbox for fetching existing scehma * integrated /schema api * added dependency in the useCallback * usercredentials payload fix * Accept param in Extract API to filter graph to allowedNode and allowedRealationship * CHange param type in extract * Issue fixed * integrated extract api * updated string as list for allowednodes and allowedrelations * check for empty list of nodes and relations * condition updated * removed frontend changes --------- Co-authored-by: kartikpersistent <[email protected]> Co-authored-by: Pravesh Kumar <[email protected]> * changed the checkbox to button (#266) * Adding link to Aura on the connection modal (#263) * Remove node id title changes (#264) * Remove the id and type changes from the nodes as that makes them incompatible with the relationships * common function for saving nodes and relations to graph --------- Co-authored-by: aashipandya <[email protected]> * fixed the legend container height issue (#267) * added supported files description (#268) * fixed legends gap issue * format fixes * parameter should be none not str (#269) * Chatbot latency optimization (#270) * Added graph Object and Modified Retrieval query * Added Database parameter to API * Modified Database parameter * added connect in place of submit ,added connect to neo4j aura in place of connect to neo4j (#271) * added connect in place of submit added connect to neo4j aura inplace of connect to neo4j * added open graph with bloom * removed the Aura as it can connect with any neo4j db * label colour fix (#273) * removed default Person and Works AT for allowed nodes and relationship types * changed the Wikipedia input label * removed unused constants * wikipedia whitespaces fix * wikipedia url and youtube white spaces error (#280) * urgent fix (#281) * Info in the chat response (#282) * Added graph Object and Modified Retrieval query * Added Database parameter to API * Modified Database parameter * Added info parameter to output * reestablished the sse on page refresh to sync the processing status (#285) * UI bugs/features (#284) * disabled the use existing schema on no node labels * added docs Icon * decreased the alert window in the success scenario * added trim for inputs for white space handling in the youtube wikipedia gcs * Time estimation alert for large files (#287) * reestablished the sse on page refresh to sync the processing status * added the time estimation message for large files * showing alert only once * delete api for removing documents (#290) * Show connection uri (#291) * added Connection URI * UI updated * removed duplicate useEffect * Backend queries (#257) * created backend queries for graph * Modified username parameter * Added GET request * Modified exceptions * 'frontendHandling' * removed session id parameter * doc_limit * 'type_changes' * 'nameChanges' * graph viz ui * legend renamed * renamed * removed import * removed duplicate useEffect --------- Co-authored-by: kartikpersistent <[email protected]> Co-authored-by: Prakriti Solankey <[email protected]> Co-authored-by: Prakriti Solankey <[email protected]> * Create Local-to-global-genAI_GraphRAG_V1 (#292) created summary of some papers and designed the complete flow _v1 
* Delete list of documents from db (#293) * delete api for removing documents * Added list of documents for deletion * Update exception to track Json_payload * Delete api (#296) * delete api for removing documents * Added list of documents for deletion * added delete functionality --------- Co-authored-by: aashipandya <[email protected]> * Delete api (#298) * delete api for removing documents * Added list of documents for deletion * added delete functionality * changed the message and disabled the delete files if there is no selected files * format fixes --------- Co-authored-by: aashipandya <[email protected]> * removed duplicate variables * css change * upgraded the nvl package * removed duplicate delete button * closing the event source on failed condition * nvl issue 261 - private package (#299) * Fix issue #261 #261 * Fix issue #261 #261 --------- Co-authored-by: kartikpersistent <[email protected]> * Delete with entities switch (#300) * added delete entities switch * added the hover message on checkboxes * changed query for deletion of files * changed the font size the confimation message --------- Co-authored-by: aashipandya <[email protected]> * docker changes * disabled the checkbox when File status is uploading or processing * Added Cloud logging library for strucred logs * replaced switch with checkbox * removed unused imports * spell mistake * removed the cancel button on delete popup modal * bug_fix_labels_mismatch_count * deletion scenarios * fixed / trailing bug in s3 bucket url * Create Local_to_global poc v1.1 (#303) V1.1 the extension of local to global V1,where the each element are analyzed and described in deteail.At the end conclusion is made based on the analysis in the paper.Other features and optimisation to improve the robustness of the sytem is under investigation. 
* Switch frontend port in docker-compose to 8080 to match with the frontend Dockerfile (#305) * Add in Each api google log struct * Implemented polling for status update (#309) * Implemented polling for status update * status updation for large files * added example env in the frontend * updated the readme with frontend env info * readme changes * readme updates * setting up failed status * Chatbot info icon (#297) * Added Info to the chat response * UI changes * Modified chat response * added entities to response info * modified entities in response info * Modified entities response count in info * clearhistory * chatbot * typeCheck * state management * chatbot-ui-overflow * css_changes --------- Co-authored-by: vasanthasaikalluri <[email protected]> * ellipsis * dockerfile * Failed status update fix (#315) * removed Failed status update on failure of servers side event * Update .gitignore * url spell fix * Msenechal/issue295 (#314) * Removed triton package from requirements.txt * Fixed Google Cloud logging + some docker ENV overwritten * Removed ENV print logs * delete local file in case processing failed (#316) * table-css * added placement for tooltip * DEV to STAGING (#324) * Remove unused library and commented code * Issue fixed * 224 color mismatch in graph viz model (#225) * count changes * added legend count * bloom url changes * lint changes * removal of console --------- Co-authored-by: kartikpersistent <[email protected]> * Modified retrieval query (#226) * Manage file status (#227) * manage status of processing file * Remove progress bar from Generate Graph Document button * 224 color mismatch in graph viz model (#225) * count changes * added legend count * bloom url changes * lint changes * removal of console --------- Co-authored-by: kartikpersistent <[email protected]> * Modified retrieval query (#226) * Convert KNN score value string to Float --------- Co-authored-by: Prakriti Solankey <[email protected]> Co-authored-by: kartikpersistent <[email protected]> Co-authored-by: vasanthasaikalluri <[email protected]> * Chatbot optimization (#230) * Optimised and cleaned Chatbot Integration * modified chat integration functions * bug changes (#231) * batch queries and relationship count correction (#232) * batch queries and relationship count correction * status should not be processing * 'url_changes' (#235) * Color mismatch in graph viz model (#233) * count changes * added legend count * bloom url changes * lint changes * removal of console * 'colour' * 'color' --------- Co-authored-by: kartikpersistent <[email protected]> * lint fixes * Create schema endpoint to get labels and relationtypes * source link fixes * Handle exception when youtube Api unable to fetch transcript youtube_transcript_api._errors.TranscriptsDisabled * configured backend status based the ENV Variable (#246) * configured backend status based the ENV Variable * removed the connection status check in PROD enviournment * Requirement split gcs and s3 icons on the page (#247) * separated S3 and GCS * resolved the conflicts * Update error message in response * dev env * Chatbot optimization (#250) * Optimised and cleaned Chatbot Integration * modified chat integration functions * Modified max_tokens and min_score * Modified prompt and added error message * Modified Prompt and error message * 245 bug chatbot UI (#252) * fixed chatbot aspect ratio/width issue * fixed chat bot ui issue * 'hoverchanges' (#254) * added settings panel for relationship type and node label selection (#234) * added settings panel for 
relationship type and node label selection * added checkbox for fetching existing scehma * integrated /schema api * added dependency in the useCallback * usercredentials payload fix * Accept param in Extract API to filter graph to allowedNode and allowedRealationship * CHange param type in extract * Issue fixed * integrated extract api * updated string as list for allowednodes and allowedrelations * removed button on settings * format fixes * Added baseEntityLabel as True --------- Co-authored-by: Pravesh Kumar <[email protected]> Co-authored-by: aashipandya <[email protected]> * Handle File status for long time (#256) * format fixes * fixed failed status bug * Fixed list.split issue in allowed nodes * Issue fixed * Updated check of empty allowed nodes and allowed relations list (#258) * added settings panel for relationship type and node label selection * added checkbox for fetching existing scehma * integrated /schema api * added dependency in the useCallback * usercredentials payload fix * Accept param in Extract API to filter graph to allowedNode and allowedRealationship * CHange param type in extract * Issue fixed * integrated extract api * updated string as list for allowednodes and allowedrelations * check for empty list of nodes and relations --------- Co-authored-by: kartikpersistent <[email protected]> Co-authored-by: Pravesh Kumar <[email protected]> * Removed wrong commit * Updated condition for allowed nodes relations (#265) * added settings panel for relationship type and node label selection * added checkbox for fetching existing scehma * integrated /schema api * added dependency in the useCallback * usercredentials payload fix * Accept param in Extract API to filter graph to allowedNode and allowedRealationship * CHange param type in extract * Issue fixed * integrated extract api * updated string as list for allowednodes and allowedrelations * check for empty list of nodes and relations * condition updated * removed frontend changes --------- Co-authored-by: kartikpersistent <[email protected]> Co-authored-by: Pravesh Kumar <[email protected]> * changed the checkbox to button (#266) * Adding link to Aura on the connection modal (#263) * Remove node id title changes (#264) * Remove the id and type changes from the nodes as that makes them incompatible with the relationships * common function for saving nodes and relations to graph --------- Co-authored-by: aashipandya <[email protected]> * fixed the legend container height issue (#267) * added supported files description (#268) * fixed legends gap issue * format fixes * parameter should be none not str (#269) * Chatbot latency optimization (#270) * Added graph Object and Modified Retrieval query * Added Database parameter to API * Modified Database parameter * added connect in place of submit ,added connect to neo4j aura in place of connect to neo4j (#271) * added connect in place of submit added connect to neo4j aura inplace of connect to neo4j * added open graph with bloom * removed the Aura as it can connect with any neo4j db * label colour fix (#273) * removed default Person and Works AT for allowed nodes and relationship types * changed the Wikipedia input label * removed unused constants * wikipedia whitespaces fix * wikipedia url and youtube white spaces error (#280) * urgent fix (#281) * Info in the chat response (#282) * Added graph Object and Modified Retrieval query * Added Database parameter to API * Modified Database parameter * Added info parameter to output * reestablished the sse on page refresh to sync the 
processing status (#285) * UI bugs/features (#284) * disabled the use existing schema on no node labels * added docs Icon * decreased the alert window in the success scenario * added trim for inputs for white space handling in the youtube wikipedia gcs * Time estimation alert for large files (#287) * reestablished the sse on page refresh to sync the processing status * added the time estimation message for large files * showing alert only once * delete api for removing documents (#290) * Show connection uri (#291) * added Connection URI * UI updated * removed duplicate useEffect * Backend queries (#257) * created backend queries for graph * Modified username parameter * Added GET request * Modified exceptions * 'frontendHandling' * removed session id parameter * doc_limit * 'type_changes' * 'nameChanges' * graph viz ui * legend renamed * renamed * removed import * removed duplicate useEffect --------- Co-authored-by: kartikpersistent <[email protected]> Co-authored-by: Prakriti Solankey <[email protected]> Co-authored-by: Prakriti Solankey <[email protected]> * Delete list of documents from db (#293) * delete api for removing documents * Added list of documents for deletion * Update exception to track Json_payload * Delete api (#296) * delete api for removing documents * Added list of documents for deletion * added delete functionality --------- Co-authored-by: aashipandya <[email protected]> * Delete api (#298) * delete api for removing documents * Added list of documents for deletion * added delete functionality * changed the message and disabled the delete files if there is no selected files * format fixes --------- Co-authored-by: aashipandya <[email protected]> * removed duplicate variables * css change * upgraded the nvl package * removed duplicate delete button * closing the event source on failed condition * nvl issue 261 - private package (#299) * Fix issue #261 #261 * Fix issue #261 #261 --------- Co-authored-by: kartikpersistent <[email protected]> * Delete with entities switch (#300) * added delete entities switch * added the hover message on checkboxes * changed query for deletion of files * changed the font size the confimation message --------- Co-authored-by: aashipandya <[email protected]> * docker changes * disabled the checkbox when File status is uploading or processing * Added Cloud logging library for strucred logs * replaced switch with checkbox * removed unused imports * spell mistake * removed the cancel button on delete popup modal * bug_fix_labels_mismatch_count * deletion scenarios * fixed / trailing bug in s3 bucket url * Switch frontend port in docker-compose to 8080 to match with the frontend Dockerfile (#305) * Add in Each api google log struct * Implemented polling for status update (#309) * Implemented polling for status update * status updation for large files * added example env in the frontend * updated the readme with frontend env info * readme changes * readme updates * setting up failed status * Chatbot info icon (#297) * Added Info to the chat response * UI changes * Modified chat response * added entities to response info * modified entities in response info * Modified entities response count in info * clearhistory * chatbot * typeCheck * state management * chatbot-ui-overflow * css_changes --------- Co-authored-by: vasanthasaikalluri <[email protected]> * ellipsis * dockerfile * Failed status update fix (#315) * removed Failed status update on failure of servers side event * Update .gitignore * url spell fix * Msenechal/issue295 (#314) * Removed 
triton package from requirements.txt * Fixed Google Cloud logging + some docker ENV overwritten * Removed ENV print logs * delete local file in case processing failed (#316) * table-css * added placement for tooltip --------- Co-authored-by: Pravesh Kumar <[email protected]> Co-authored-by: kartikpersistent <[email protected]> Co-authored-by: vasanthasaikalluri <[email protected]> Co-authored-by: aashipandya <[email protected]> Co-authored-by: Morgan Senechal <[email protected]> Co-authored-by: Michael Hunger <[email protected]> * s3 url fix * s3 url fix * Added gpt 4o model and fix for gcs bucket (#329) * Added gpt 4o model and fix for gcs bucket * OpenAI GPT 4o model label changes --------- Co-authored-by: kartikpersistent <[email protected]> * added drop zone icon * Dev (#330) * s3 url fix * Added gpt 4o model and fix for gcs bucket (#329) * Added gpt 4o model and fix for gcs bucket * OpenAI GPT 4o model label changes --------- Co-authored-by: kartikpersistent <[email protected]> * added drop zone icon --------- Co-authored-by: aashipandya <[email protected]> * removed cloud icon from button * removed cloud icon from button * exponential backoff implementation * Create RAPTOR_RECURSIVE ABSTRACTIVE PROCESSING v1 Tree based DB approach * Update RAPTOR_RECURSIVE ABSTRACTIVE PROCESSING v1 (#334) Tree based DB * Drag the legends panel similar to workspace (#335) * resize-legends * css change * lint * Create Data_Analysis (#339) This is created for neaw experint on data processing and analysis * added driver config (#341) * added driver config * remove score.py * Debug config in graph (#342) * added driver config * remove score.py * added user agent in env * Modified chatbot for increased performance and chat history issues (#340) * 321 documents selection for processing and graph visualization (#343) * added multi select extraction * added support for multiple documents * integrated api * format fixes * conditional rendering of limit input * Graph Query : Added Doc Limit parameter * showing the selected files length * handled generating graph using multiselect * added the count to respective buttons * removed doclimit * removed the doc limit * fixed inspectedname issue * format fixes * added toottip message on show graph message --------- Co-authored-by: vasanthasaikalluri <[email protected]> * Add files via upload * Page number of pdf (#347) * added page number to chunk node * page_number for only local file upload * connect fix (#349) * Remove neo4j.debug watch and added , refresh_schema=False, sanitize=True * Update graph chunk processed (#358) * update graph after fixed number of chunk processed * update node_count based on no of chunks processed * Update graph after spefic number of chunks * removed the large file check * added missing dependency --------- Co-authored-by: kartikpersistent <[email protected]> * Support for url parameters (#357) * added the helper method * integrated the URL search params without password * integrated password for url params * added password * removed unused code * format fixes * format fixes and removed console logs * DEV to STAGING (#360) * s3 url fix * Added gpt 4o model and fix for gcs bucket (#329) * Added gpt 4o model and fix for gcs bucket * OpenAI GPT 4o model label changes --------- Co-authored-by: kartikpersistent <[email protected]> * added drop zone icon * removed cloud icon from button * exponential backoff implementation * Drag the legends panel similar to workspace (#335) * resize-legends * css change * lint * added driver config (#341) * 
added driver config * remove score.py * Debug config in graph (#342) * added driver config * remove score.py * added user agent in env * Modified chatbot for increased performance and chat history issues (#340) * 321 documents selection for processing and graph visualization (#343) * added multi select extraction * added support for multiple documents * integrated api * format fixes * conditional rendering of limit input * Graph Query : Added Doc Limit parameter * showing the selected files length * handled generating graph using multiselect * added the count to respective buttons * removed doclimit * removed the doc limit * fixed inspectedname issue * format fixes * added toottip message on show graph message --------- Co-authored-by: vasanthasaikalluri <[email protected]> * Page number of pdf (#347) * added page number to chunk node * page_number for only local file upload * connect fix (#349) * Remove neo4j.debug watch and added , refresh_schema=False, sanitize=True * Update graph chunk processed (#358) * update graph after fixed number of chunk processed * update node_count based on no of chunks processed * Update graph after spefic number of chunks * removed the large file check * added missing dependency --------- Co-authored-by: kartikpersistent <[email protected]> * Support for url parameters (#357) * added the helper method * integrated the URL search params without password * integrated password for url params * added password * removed unused code * format fixes * format fixes and removed console logs --------- Co-authored-by: kartikpersistent <[email protected]> Co-authored-by: aashipandya <[email protected]> Co-authored-by: Prakriti Solankey <[email protected]> Co-authored-by: vasanthasaikalluri <[email protected]> * vite version upgradtion * Closing neo4j connection and common llm initialization (#364) * Closing neo4j connection and common llm initialization * updated version of neo4j * "," included graph view issue fixed * Logging merged file path * removed the dotty animation * logging removed merges * Added uvicorn wrokers in Docker file and Issue fixed for delete file. 
* Persisting the node label ,rel label values from locastorage (#363) * Persisting the node label ,rel label values from locastorage * format fixes * restricted the alert foe only large files * added filesize * Added gunicorn in docker * Gcs auth login (#310) * changes for gcloud auth login backend using client json * passing project id as parameter to gcs bucket * gcloud auth apis * creted source node for gcs bucket files * added google auth * added google auth login * commented token request temporary * node backend for refresh token * ignore changes * access token from frontend * Integrated the google auth login flow * clearing the project id after success or failure * added project id in scan response * added project for the extract api * added error messages * bucket name check * check for bucket exist * message fixes * showing the alert messages in snackbar * added client id in example env --------- Co-authored-by: kartikpersistent <[email protected]> * url params fix * time fix alert in seconds * Wikipedia source to accept all valid urls (#371) * docker file changes * Graph view from info model (#369) * info modal * info modal * Added chunk entities and modified chat response * added page numbers * api changes * changes for backend type * linting * added contsants.py * Modified sources in chatbot * Modified sources in chatbot * format changes * Graph view from chat info * Format changes * icon tooltip * li css changes * type , format fixes * node changes * refactoring code --------- Co-authored-by: vasanthasaikalluri <[email protected]> Co-authored-by: kartikpersistent <[email protected]> * file name in chunk property (#377) * Cancelled processing job (#374) * User can cancel the running job process * Change the status as Cancelled * Add processed chunk in source node to processing progress bar on UI * added button for cancelling processing job * Disabled state updation * status processed_chunk progress * extra comma fix * stopping the sse on cancelled status * processing progress * yarn lint fix and format fixes * progress bar UI fixes * Fixed issue of status when user immediately cancelled the job --------- Co-authored-by: Pravesh Kumar <[email protected]> * Wikipedia to accept all ids and multiple languages (#376) * Wikipedia source to accept all valid urls * wikipedia url to support multiple languages * integrated wiki langauge param for extract api --------- Co-authored-by: kartikpersistent <[email protected]> * changed the Disabled check for view grap and db url fixed * format and lint fixes * added loading state for Use existing schema * Add timeout in docker for gunicorn workers * Add cancel icon to info popup (#384) * Info Modal Changes * css changes * removed document status --------- Co-authored-by: Pravesh Kumar <[email protected]> Co-authored-by: kartikpersistent <[email protected]> Co-authored-by: vasanthasaikalluri <[email protected]> Co-authored-by: aashipandya <[email protected]> Co-authored-by: Morgan Senechal <[email protected]> Co-authored-by: Michael Hunger <[email protected]> Co-authored-by: ManjuPatel1 <[email protected]>
1 parent 6fa7f11 commit 7316747

74 files changed: +2679 / -705 lines

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -165,3 +165,5 @@ google-cloud-cli-469.0.0-linux-x86_64.tar.gz
 /data/llm-experiments-387609-c73d512ca3b1.json
 /backend/src/merged_files
 /backend/src/chunks
+/backend/merged_files
+google-cloud-cli-476.0.0-linux-x86_64.tar.gz
Lines changed: 70 additions & 0 deletions
@@ -0,0 +1,70 @@
Graph DB Connectors and GenAI Integrations POC_v1

"This is version v1, where the main content is taken from Paper 1 below and some other content is added from other papers and blogs."

Paper 1: From Local to Global: A Graph RAG Approach to Query-Focused Summarization

"A Python-based implementation of both global and local Graph RAG approaches is forthcoming at https://aka.ms/graphrag."

The Graph RAG approach uses the natural modularity of graphs to partition data for global summarization. It uses an LLM to build a graph-based text index in two stages:
1. Derive an entity knowledge graph from the source documents.
2. Pre-generate community summaries for all groups of closely related entities.

It can answer questions such as "What are the main themes in the dataset?", which is an inherently query-focused summarization (QFS) task. The Graph RAG approach improves question answering over private text corpora and scales with both the generality of user questions and the quantity of source text to be indexed. Graph RAG leads to substantial improvements in both the comprehensiveness and diversity of generated answers.

Community descriptions provide complete coverage of the underlying graph index and the input documents it represents. Query-focused summarization of an entire corpus is then made possible using a map-reduce approach: first using each community summary to answer the query independently and in parallel, then summarizing all relevant partial answers into a final global answer.

Figure 1: Graph RAG pipeline using an LLM-derived graph index of source document text

I. Data Ingestion:
1. Documents/chunks/text preprocessing:
To reduce document size and improve latency, use text summarization for heavy documents or multi-document inputs with the steps below:
Step 1: LLM (use a specific LLM embedding to summarize documents).
Step 2: Knowledge graph to reduce size, with entities, relationships, and their properties as subgraphs.
Note: The above steps can be followed bidirectionally.

2. Create the vector DB/embedding/indexing with an LLM embedding.

II. Vector Embedding/Indexing Storage
Generate a KG from the embeddings and store it in a graph DB, or store the embeddings in FAISS/Pinecone, to improve latency and accuracy.
or
Both methods can be combined (KG + vector embedding) and stored in the DB to handle both structured and unstructured data.

Generate four community levels (C0, C1, C2, C3) of Graph RAG summaries from the embeddings/KG of the document or multi-document corpus using a text-summarization map-reduce approach.
C0: Uses root-level community summaries (fewest in number) to answer user queries.
C1: Uses high-level community summaries to answer queries. These are sub-communities of C0, if present, otherwise C0 communities projected down.
C2: Uses intermediate-level community summaries to answer queries. These are sub-communities of C1, if present, otherwise C1 communities projected down.
C3: Uses low-level community summaries (greatest in number) to answer queries. These are sub-communities of C2, if present, otherwise C2 communities projected down.

Figure 2.1: Communities' Summary        Figure 2.2: Communities Graph

Figure 3: Summarized Community Graph

III. Chat Response/Architecture:
Approaches: multi-hop RAG, memory-based response, head-to-head measures.
Head-to-head measures that can be used as performance metrics with an LLM evaluator are as follows:
• Comprehensiveness: How much detail does the answer provide to cover all aspects and details of the question?
• Diversity: How varied and rich is the answer in providing different perspectives and insights on the question?

For a given community level (Figs. 2.1, 2.2 & 3), the global answer to any user query is generated as follows (a minimal sketch follows the list):

• Prepare community summaries. Community summaries are randomly shuffled and divided into chunks of pre-specified token size. This ensures relevant information is distributed across chunks, rather than concentrated (and potentially lost) in a single context window.

• Map community answers. Generate intermediate answers in parallel, one for each chunk. The LLM is also asked to generate a score between 0 and 100 indicating how helpful the generated answer is in answering the target question. Answers with score 0 are filtered out.

• Reduce to global answer. Intermediate community answers are sorted in descending order of helpfulness score and iteratively added into a new context window until the token limit is reached. This final context is used to generate the global answer returned to the user.
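A minimal Python sketch of this prepare/map/reduce flow, assuming a generic llm(prompt) callable and a whitespace word count as a stand-in for a real tokenizer; the prompt wording and token budgets are illustrative assumptions, not the paper's implementation:

import random
import re

def global_answer(question, community_summaries, llm, chunk_tokens=4000, context_tokens=8000):
    # Map-reduce query-focused summarization over community summaries (sketch).
    n_tokens = lambda text: len(text.split())  # crude stand-in for a real tokenizer

    # Prepare: shuffle summaries and pack them into chunks of a pre-specified token size.
    summaries = list(community_summaries)
    random.shuffle(summaries)
    chunks, current, used = [], [], 0
    for summary in summaries:
        if current and used + n_tokens(summary) > chunk_tokens:
            chunks.append("\n".join(current))
            current, used = [], 0
        current.append(summary)
        used += n_tokens(summary)
    if current:
        chunks.append("\n".join(current))

    # Map: one intermediate answer per chunk, each rated 0-100 for helpfulness.
    partials = []
    for chunk in chunks:
        reply = llm("Context:\n" + chunk + "\n\nQuestion: " + question
                    + "\nAnswer, then rate the answer's helpfulness 0-100 as 'SCORE: <n>'.")
        match = re.search(r"SCORE:\s*(\d+)", reply)
        score = int(match.group(1)) if match else 0
        if score > 0:  # answers scored 0 are filtered out
            partials.append((score, re.sub(r"SCORE:\s*\d+", "", reply).strip()))

    # Reduce: most helpful answers first, filling a new context window up to the token limit.
    context, used = [], 0
    for score, answer in sorted(partials, reverse=True):
        if used + n_tokens(answer) > context_tokens:
            break
        context.append(answer)
        used += n_tokens(answer)
    return llm("Combine these partial answers into one global answer to: " + question
               + "\n\n" + "\n---\n".join(context))

In practice the helpfulness score would come from a structured output format and the token counts from a real tokenizer; the shape of the flow is what matters here.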
Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
From Local to Global V1.1

"Paper 1: From Local to Global: A Graph RAG Approach to Query-Focused Summarization"

1. Graph RAG Approach & Pipeline (Figure 1_v1):
1. Source document → text chunks (token size: 600-2400).
2. Text chunks → element instances:
• To tailor extraction to the document's domain for in-context learning, use a multipart LLM prompt to identify and extract instances of graph nodes and edges, including source and target, from each chunk of the source document.
i. Text chunk → multipart LLM prompt (tailored to the document's domain for in-context learning) → find all entities (name, type, description) and relationships (including source and target) → identify instances for nodes and edges.
ii. Abstractive summary / generate tuples: ((subject/object entities, name, type, description of entities/relationships/claims), (all entities, name, type, description of entities/relationships/claims)).
Note: A secondary extraction prompt is also supported for any additional covariates associated with the extracted node instances; it extracts claims linked to detected entities, including the subject, object, type, description, source text span, and start and end dates. To make sure no entities are missed, the multistage LLM prompt asks whether many entities were missed (YES/NO) and, if so, gleans the remainder, as sketched below.
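A minimal sketch of that multi-stage extraction loop, assuming a generic llm(prompt) callable; the prompt wording, the tuple format, and the max_gleanings limit are illustrative assumptions rather than the paper's exact prompts:

def extract_elements(chunk, llm, max_gleanings=2):
    # Extract entity and relationship instances from one text chunk, with gleaning rounds.
    prompt = ("Identify all entities in the text as (name, type, description) tuples and all "
              "relationships as (source, target, description) tuples.\n\nText:\n" + chunk)
    extractions = [llm(prompt)]

    # Gleaning: ask whether many entities were missed; if YES, request the remainder.
    # (A real implementation would pass the prior exchange back to the model as history.)
    for _ in range(max_gleanings):
        check = llm("Were MANY entities missed in the last extraction? Answer YES or NO.")
        if not check.strip().upper().startswith("YES"):
            break
        extractions.append(llm("Add the entities and relationships that were missed, "
                               "using the same tuple format."))
    return "\n".join(extractions)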
3. Element instances → element summaries:
• Element instances / abstractive summaries (tuples) → LLM to summarize → semantic instance-level element summary.
• Converting all such instance-level summaries into single blocks of descriptive text for each graph element (i.e., entity node, relationship edge, and claim covariate) requires a further round of LLM summarization over matching groups of instances, i.e.:
Instance-level summary → LLM vector embedding → KNN/cosine-similarity search to find homogeneous summary clusters → LLM to summarize the homogeneous clusters → single-block summary of similar instances (the elements'/homogeneous clusters' summary).
Note: A potential concern at this stage is that the LLM may not consistently extract references to the same entity in the same text format, resulting in duplicate entity elements and thus duplicate nodes in the entity graph. However, since all closely related "communities" of entities will be detected and summarized in the following step, and given that LLMs can recognize the common entity behind multiple name variations, there should be sufficient connectivity from all variations to a shared set of closely related entities.

4. Element summaries → graph communities:
i. Indexed element summaries of homogeneous clusters → neo4j → homogeneous weighted undirected graph.
ii. Homogeneous weighted undirected graph → graph community detection algorithm (hierarchical community structure) → partition the graph into communities of nodes.
Note: This recovers the hierarchical community structure of large-scale graphs efficiently. Each level of this hierarchy provides a community partition that covers the nodes of the graph in a mutually exclusive, collectively exhaustive way, enabling divide-and-conquer global summarization (a rough sketch of this step follows).
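The sketch below partitions a weighted undirected entity graph with the Leiden algorithm using the python-igraph and leidenalg packages; the library choice and the flat, single-level partition are assumptions made for illustration (the paper uses a hierarchical Leiden, and in this POC the graph itself would live in neo4j):

import igraph as ig
import leidenalg

def detect_communities(weighted_edges):
    # weighted_edges: iterable of (source_entity, target_entity, weight) tuples.
    edges = list(weighted_edges)
    g = ig.Graph.TupleList([(s, t) for s, t, _ in edges], directed=False)
    g.es["weight"] = [w for _, _, w in edges]
    partition = leidenalg.find_partition(g, leidenalg.ModularityVertexPartition, weights="weight")
    # Map each entity name to the id of the community it was assigned to.
    return {g.vs[v]["name"]: cid for cid, members in enumerate(partition) for v in members}

Each community's member entities and edges then feed the summarization step described next.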
5. Graph communities → community summaries:
• Graph communities → Leiden hierarchy method → community summaries (summaries over the globally summarized graph).
Graph-based communities are used to generate the community summaries. These summaries are independently useful for understanding the global structure and semantics of the dataset, and may themselves be used to make sense of a corpus in the absence of a question. For example, a user may scan through community summaries at one level looking for general themes of interest, then follow links to the reports at the lower level that provide more details for each of the subtopics.
• Leaf-level communities. The element summaries of a leaf-level community (nodes, edges, covariates) are prioritized and then iteratively added to the LLM context window until the token limit is reached. The prioritization is as follows: for each community edge, in decreasing order of combined source and target node degree (i.e., overall prominence), add descriptions of the source node, target node, linked covariates, and the edge itself (see the sketch after this list).
• Higher-level communities. If all element summaries fit within the token limit of the context window, proceed as for leaf-level communities and summarize all element summaries within the community. Otherwise, rank sub-communities in decreasing order of element summary tokens and iteratively substitute sub-community summaries (shorter) for their associated element summaries (longer) until the content fits within the context window.
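A small sketch of that leaf-level prioritization, with covariates omitted for brevity; the data shapes (a degree map and a description map) and the whitespace token count are assumptions made for illustration:

def leaf_community_context(community_edges, node_degree, description, token_limit=8000):
    # community_edges: (source, target) pairs inside one leaf-level community.
    # node_degree:     node name -> degree in the full entity graph.
    # description:     element (node name or edge pair) -> element summary text.
    n_tokens = lambda text: len(text.split())  # stand-in for a real tokenizer
    context, used, seen = [], 0, set()

    # Most prominent edges first: decreasing combined source + target node degree.
    ranked = sorted(community_edges,
                    key=lambda e: node_degree[e[0]] + node_degree[e[1]], reverse=True)
    for source, target in ranked:
        for element in (source, target, (source, target)):  # node, node, then the edge itself
            text = description.get(element, "")
            if not text or element in seen:
                continue
            if used + n_tokens(text) > token_limit:          # context window is full
                return "\n".join(context)
            context.append(text)
            used += n_tokens(text)
            seen.add(element)
    return "\n".join(context)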
6. Community summaries → community answers → global answers:
a. For a given community level, the global answer to any user query is generated as follows:
Divide the randomly shuffled community summaries into chunks (prepare community summaries) → generate answers from each chunk in parallel (map community answers) → reduce to a global answer.
• Prepare community summaries: Community summaries are randomly shuffled and divided into chunks of pre-specified token size. This ensures relevant information is distributed across chunks, rather than concentrated (and potentially lost) in a single context window.
• Map community answers: Generate intermediate answers in parallel, one for each chunk. The LLM is also asked to generate a score between 0 and 100 indicating how helpful the generated answer is in answering the target question. Answers with score 0 are filtered out.
• Reduce to global answer: Intermediate community answers are sorted in descending order of helpfulness score and iteratively added into a new context window until the token limit is reached. This final context is used to generate the global answer returned to the user.
• Communities Comparison
Six conditions are compared, including Graph RAG using four levels of graph communities (C0, C1, C2, C3), a text summarization method applying the same map-reduce approach directly to source texts (TS), and a naive "semantic search" RAG approach (SS):
a) C0: Uses root-level community summaries (fewest in number) to answer user queries.
b) C1: Uses high-level community summaries to answer queries. These are sub-communities of C0, if present, otherwise C0 communities projected down.
c) C2: Uses intermediate-level community summaries to answer queries. These are sub-communities of C1, if present, otherwise C1 communities projected down.
d) C3: Uses low-level community summaries (greatest in number) to answer queries. These are sub-communities of C2, if present, otherwise C2 communities projected down.
e) TS: The same map-reduce pipeline, except that source texts (rather than community summaries) are shuffled and chunked for the map-reduce summarization stages.
f) SS: An implementation of naive RAG in which text chunks are retrieved and added to the available context window until the specified token limit is reached.

The size of the context window and the prompts used for answer generation are the same across all six conditions (except for minor modifications to reference styles to match the types of context).

Conclusion: Trade-offs of building a graph index. The graph-index approach achieves the best head-to-head results against the other methods, but in many cases the graph-free approach to global summarization of source texts performed competitively. The real-world decision about whether to invest in building a graph index depends on multiple factors, including the compute budget, the expected number of lifetime queries per dataset, and the value obtained from other aspects of the graph index (including the generic community summaries and the use of other graph-related RAG approaches).
Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
RAPTOR: RECURSIVE ABSTRACTIVE PROCESSING FOR TREE-ORGANIZED RETRIEVAL v1:

Source code: The source code for RAPTOR will be publicly available at https://github.com/parthsarthi03/raptor.

Step 1. Document → chunks of 100 tokens (to preserve contextual and semantic coherence, if a sentence would exceed the 100-token boundary the whole sentence moves to the next chunk rather than being cut mid-sentence) → clustered → summarized (GPT-3.5-turbo) → re-embedded (SBERT).
Leaf nodes hold two values: the chunk and its SBERT embedding.
Step 2. Repeat Step 1 until further clustering becomes infeasible, resulting in a structured, multi-layered tree representation of the original documents (a condensed sketch of this build loop follows Step 3).
Note: Scalability of both build time and token expenditure is shown below.

Step 3. For querying within the tree, two distinct strategies are used: tree traversal and collapsed tree. The tree traversal method traverses the tree layer by layer, pruning and selecting the most relevant nodes at each level. The collapsed tree method evaluates nodes collectively across all layers to find the most relevant ones.
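A condensed sketch of the build loop, assuming the sentence-transformers package for SBERT embeddings, a summarize(texts) LLM callable, and a cluster(embeddings) callable that returns groups of node indices (the paper's UMAP + GMM clustering is sketched under "Clustering Algorithm" below); the model name and the upstream 100-token chunking are assumptions:

from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in SBERT model

def build_raptor_tree(chunks, summarize, cluster, max_layers=5):
    # Layer 0: leaf nodes hold the raw 100-token chunks and their SBERT embeddings.
    layer = [{"text": c, "embedding": e} for c, e in zip(chunks, embedder.encode(chunks))]
    tree = [layer]
    for _ in range(max_layers):
        groups = cluster([node["embedding"] for node in layer])
        if len(groups) <= 1:  # further clustering is infeasible: stop
            break
        summaries = [summarize([layer[i]["text"] for i in group]) for group in groups]
        layer = [{"text": s, "embedding": e, "children": list(g)}
                 for s, e, g in zip(summaries, embedder.encode(summaries), groups)]
        tree.append(layer)  # each new layer holds summaries of clusters of the layer below
    return tree             # list of layers, leaves first and root layer last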
Clustering Algorithm:
GMM: It offers both flexibility and a probabilistic framework, where nodes can belong to multiple clusters without requiring a fixed number of clusters. This flexibility is essential because individual text segments often contain information relevant to various topics, thereby warranting their inclusion in multiple summaries.
The high dimensionality of vector embeddings presents a challenge for traditional GMMs, as distance metrics may behave poorly when used to measure similarity in high-dimensional spaces. To mitigate this, we employ Uniform Manifold Approximation and Projection (UMAP), a manifold learning technique for dimensionality reduction. The number-of-nearest-neighbors parameter, n_neighbors, in UMAP determines the balance between the preservation of local and global structure. Our algorithm varies n_neighbors to create a hierarchical clustering structure: it first identifies global clusters and then performs local clustering within these global clusters. This two-step clustering process captures a broad spectrum of relationships among the text data, from broad themes to specific details.
Should a local cluster's combined context ever exceed the summarization model's token threshold, our algorithm recursively applies clustering within the cluster, ensuring that the context remains within the token threshold.
In GMM, the number of parameters k is a function of the dimensionality of the input vectors and the number of clusters.
With the optimal number of clusters determined by BIC, the Expectation-Maximization algorithm is then used to estimate the GMM parameters, namely the means, covariances, and mixture weights. While the Gaussian assumption in GMMs may not perfectly align with the nature of text data, which often exhibits a sparse and skewed distribution, our empirical observations suggest that it offers an effective model for our purpose. We run an ablation comparing GMM clustering with summarizing contiguous chunks and provide details.
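A minimal single-level sketch of this clustering step, assuming the umap-learn and scikit-learn packages; the global/local two-step, the recursion on oversized clusters, and the specific parameter values are omitted or assumed for brevity:

import numpy as np
import umap
from sklearn.mixture import GaussianMixture

def gmm_cluster(embeddings, max_clusters=50, threshold=0.1, random_state=0):
    # Reduce dimensionality with UMAP before fitting the GMM.
    X = umap.UMAP(n_neighbors=10, n_components=10, metric="cosine",
                  random_state=random_state).fit_transform(np.asarray(embeddings))

    # Pick the number of clusters by minimizing the Bayesian Information Criterion (BIC).
    candidates = range(1, min(max_clusters, len(X)) + 1)
    bics = [GaussianMixture(n_components=k, random_state=random_state).fit(X).bic(X)
            for k in candidates]
    best_k = list(candidates)[int(np.argmin(bics))]

    # Soft assignment: a node joins every cluster whose posterior probability exceeds the
    # threshold, so a text segment can appear in more than one summary.
    gmm = GaussianMixture(n_components=best_k, random_state=random_state).fit(X)
    probabilities = gmm.predict_proba(X)
    return [list(np.where(probabilities[:, k] > threshold)[0]) for k in range(best_k)]

The returned index groups plug directly into the cluster callable of the build sketch above.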
Querying:

Tree traversal method: This method first selects the top-k most relevant root nodes based on their cosine similarity to the query embedding. The children of these selected nodes are considered at the next layer, and the top-k nodes are selected from this pool, again based on their cosine similarity to the query vector. This process is repeated until we reach the leaf nodes. Finally, the text from all selected nodes is concatenated to form the retrieved context (a short sketch follows the numbered steps).
1. Start at the root layer of the RAPTOR tree. Compute the cosine similarity between the query embedding and the embeddings of all nodes present at this initial layer.
2. Choose the top-k nodes based on the highest cosine similarity scores, forming the set S1.
3. Proceed to the child nodes of the elements in set S1. Compute the cosine similarity between the query vector and the vector embeddings of these child nodes.
4. Select the top-k child nodes with the highest cosine similarity scores to the query, forming the set S2.
5. Continue this process recursively for d layers, producing sets S1, S2, ..., Sd.
6. Concatenate sets S1 through Sd to assemble the relevant context to the query.
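The steps above, expressed as a short sketch over the tree produced by build_raptor_tree; the top-k value, the numpy-based cosine similarity, and the child bookkeeping are illustrative assumptions:

import numpy as np

def cosine(a, b):
    a, b = np.asarray(a), np.asarray(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def tree_traversal(tree, query_embedding, top_k=5):
    # Walk from the root layer (last in the list) down to the leaves, keeping top-k per layer.
    selected_text = []
    candidates = list(range(len(tree[-1])))               # every node in the root layer
    for depth in range(len(tree) - 1, -1, -1):
        layer = tree[depth]
        ranked = sorted(candidates,
                        key=lambda i: cosine(layer[i]["embedding"], query_embedding),
                        reverse=True)[:top_k]              # the set S_d for this layer
        selected_text.extend(layer[i]["text"] for i in ranked)
        if depth > 0:                                      # children are indices into the layer below
            candidates = sorted({c for i in ranked for c in layer[i].get("children", [])})
    return "\n".join(selected_text)                        # S1..Sd concatenated as the context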
Collapsed tree method: It searches for relevant information by considering all nodes in the tree simultaneously (a matching sketch follows these steps).
1. First, collapse the entire RAPTOR tree into a single layer. This new set of nodes, denoted as C, contains nodes from every layer of the original tree.
2. Next, calculate the cosine similarity between the query embedding and the embeddings of all nodes present in the collapsed set C.
3. Finally, pick the top-k nodes that have the highest cosine similarity scores with the query. Keep adding nodes to the result set until you reach a predefined maximum number of tokens.
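A matching sketch of the collapsed-tree strategy, again over the build_raptor_tree output; the 2000-token default mirrors the setting reported below, and the whitespace token count is a stand-in for a real tokenizer:

import numpy as np

def collapsed_tree(tree, query_embedding, max_tokens=2000):
    # Collapse all layers into one pool and keep the most similar nodes up to a token budget.
    q = np.asarray(query_embedding)
    sim = lambda v: float(np.asarray(v) @ q / (np.linalg.norm(v) * np.linalg.norm(q) + 1e-12))
    n_tokens = lambda text: len(text.split())              # stand-in for a real tokenizer

    pool = [node for layer in tree for node in layer]       # the collapsed set C
    pool.sort(key=lambda node: sim(node["embedding"]), reverse=True)
    context, used = [], 0
    for node in pool:
        cost = n_tokens(node["text"])
        if used + cost > max_tokens:                         # stop at the token budget
            break
        context.append(node["text"])
        used += cost
    return "\n".join(context)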
Figure 3: Comparison of querying methods. Results on 20 stories from the QASPER dataset using tree traversal with different top-k values, and collapsed tree with different context lengths. Collapsed tree with 2000 tokens produces the best results, so this querying strategy is used for the main results.
CONCLUSION
RAPTOR is a novel tree-based retrieval system that augments the parametric knowledge of large language models with contextual information at various levels of abstraction. By employing recursive clustering and summarization techniques, RAPTOR creates a hierarchical tree structure that is capable of synthesizing information across various sections of the retrieval corpora. During the query phase, RAPTOR leverages this tree structure for more effective retrieval. RAPTOR not only outperforms traditional retrieval methods but also sets new performance benchmarks on several question-answering tasks.

POC_Documents/V1/figure.2,3.jpg (83.2 KB)

POC_Documents/V1/figure.4.jpg (133 KB)

POC_Experiments/Data_Analysis

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@

backend/Dockerfile

Lines changed: 2 additions & 1 deletion
@@ -10,5 +10,6 @@ RUN apt-get update \
 && export LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu:$LD_LIBRARY_PATH \
 && pip install --no-cache-dir --upgrade -r /code/requirements.txt
 
-CMD ["uvicorn", "score:app", "--host", "0.0.0.0", "--port", "8000"]
+# CMD ["uvicorn", "score:app", "--host", "0.0.0.0", "--port", "8000","--workers", "4"]
+CMD ["gunicorn", "score:app","--workers","4","--worker-class","uvicorn.workers.UvicornWorker", "--bind", "0.0.0.0:8000", "--timeout", "300"]

backend/example.env

Lines changed: 3 additions & 1 deletion
@@ -19,4 +19,6 @@ NUMBER_OF_CHUNKS_TO_COMBINE = ""
 GEMINI_ENABLED = True|False
 # Enable Google Cloud logs (default is True)
 GCP_LOG_METRICS_ENABLED = True|False
-NEO4J_USER_AGENT = ""
+UPDATE_GRAPH_CHUNKS_PROCESSED = 20
+NEO4J_USER_AGENT = ""
+UPDATE_GRAPH_CHUNKS_PROCESSED = 20

backend/requirements.txt

Lines changed: 4 additions & 1 deletion
@@ -38,6 +38,7 @@ frozenlist==1.4.1
 fsspec==2024.2.0
 google-api-core==2.18.0
 google-auth==2.29.0
+google_auth_oauthlib
 google-cloud-aiplatform
 google-cloud-bigquery==3.19.0
 google-cloud-core==2.4.1
@@ -87,7 +88,7 @@ matplotlib==3.7.2
 mpmath==1.3.0
 multidict==6.0.5
 mypy-extensions==1.0.0
-neo4j==5.18.0
+neo4j==5.20.0
 networkx==3.2.1
 nltk==3.8.1
 numpy==1.26.4
@@ -139,6 +140,7 @@ sniffio==1.3.1
 soupsieve==2.5
 SQLAlchemy==2.0.28
 starlette==0.36.3
+starlette-session
 sympy==1.12
 tabulate==0.9.0
 tenacity==8.2.3
@@ -158,6 +160,7 @@ unstructured-inference
 unstructured.pytesseract
 urllib3
 uvicorn
+gunicorn
 wikipedia==1.4.0
 wrapt==1.16.0
 yarl==1.9.4
