|
1 | | -# Knowledge Graph Builder App |
2 | | - |
3 | | -Creating knowledge graphs from unstructured data |
4 | | - |
5 | | - |
6 | | -# LLM Graph Builder |
7 | 1 |
|
| 2 | +# Knowledge Graph Builder |
8 | 3 |  |
9 | 4 |  |
10 | 5 |  |
11 | 6 |
|
12 | | -## Overview |
13 | | -This application is designed to turn Unstructured data (pdfs,docs,txt,youtube video,web pages,etc.) into a knowledge graph stored in Neo4j. It utilizes the power of Large language models (OpenAI,Gemini,etc.) to extract nodes, relationships and their properties from the text and create a structured knowledge graph using Langchain framework. |
| 7 | +Transform unstructured data (PDFs, DOCs, TXT, YouTube videos, web pages, etc.) into a structured Knowledge Graph stored in Neo4j using the power of Large Language Models (LLMs) and the LangChain framework. |
| 8 | + |
| 9 | +This application allows you to upload files from various sources (local machine, GCS, S3 bucket, or web sources), choose your preferred LLM model, and generate a Knowledge Graph. |
14 | 10 |
|
15 | | -Upload your files from local machine, GCS or S3 bucket or from web sources, choose your LLM model and generate knowledge graph. |
| 11 | +--- |
16 | 12 |
|
17 | 13 | ## Key Features |
18 | | -- **Knowledge Graph Creation**: Transform unstructured data into structured knowledge graphs using LLMs. |
19 | | -- **Providing Schema**: Provide your own custom schema or use existing schema in settings to generate graph. |
20 | | -- **View Graph**: View graph for a particular source or multiple sources at a time in Bloom. |
21 | | -- **Chat with Data**: Interact with your data in a Neo4j database through conversational queries, also retrieve metadata about the source of response to your queries.For a dedicated chat interface, access the standalone chat application at: [Chat-Only](https://dev-frontend-dcavk67s4a-uc.a.run.app/chat-only). This link provides a focused chat experience for querying your data. |
22 | 14 |
|
23 | | -## Getting started |
| 15 | +### **Knowledge Graph Creation** |
| 16 | +- Seamlessly transform unstructured data into structured Knowledge Graphs using advanced LLMs. |
| 17 | +- Extract nodes, relationships, and their properties to create structured graphs. |
24 | 18 |
|
25 | | -:warning: You will need to have a Neo4j Database V5.15 or later with [APOC installed](https://neo4j.com/docs/apoc/current/installation/) to use this Knowledge Graph Builder. |
26 | | -You can use any [Neo4j Aura database](https://neo4j.com/aura/) (including the free database) |
27 | | -If you are using Neo4j Desktop, you will not be able to use the docker-compose but will have to follow the [separate deployment of backend and frontend section](#running-backend-and-frontend-separately-dev-environment). :warning: |
| 19 | +### **Schema Support** |
| 20 | +- Use a custom schema or existing schemas configured in the settings to generate graphs. |
28 | 21 |
|
| 22 | +### **Graph Visualization** |
| 23 | +- View graphs for specific or multiple data sources simultaneously in **Neo4j Bloom**. |
29 | 24 |
|
30 | | -## Deployment |
31 | | -### Local deployment |
32 | | -#### Running through docker-compose |
33 | | -By default only OpenAI and Diffbot are enabled since Gemini requires extra GCP configurations. |
34 | | -According to enviornment we are configuring the models which is indicated by VITE_LLM_MODELS_PROD variable we can configure model based on our need. |
| 25 | +### **Chat with Data** |
| 26 | +- Interact with your data in the Neo4j database through conversational queries. |
| 27 | +- Retrieve metadata about the source of responses to your queries. |
| 28 | +- For a dedicated chat interface, use the standalone chat application via the **[/chat-only](/chat-only)** route.
35 | 29 |
|
36 | | -EX: |
37 | | -```env |
38 | | -VITE_LLM_MODELS_PROD="openai_gpt_4o,openai_gpt_4o_mini,diffbot,gemini_1.5_flash" |
| 30 | +### **LLMs Supported** |
| 31 | +1. OpenAI |
| 32 | +2. Gemini |
| 33 | +3. Diffbot |
| 34 | +4. Azure OpenAI (dev deployed version)
| 35 | +5. Anthropic (dev deployed version)
| 36 | +6. Fireworks (dev deployed version)
| 37 | +7. Groq (dev deployed version)
| 38 | +8. Amazon Bedrock (dev deployed version)
| 39 | +9. Ollama (dev deployed version)
| 40 | +10. Deepseek (dev deployed version)
| 41 | +11. Other OpenAI-compatible base-URL models (dev deployed version)
| 42 | + |
| 43 | +--- |
| 44 | + |
| 45 | +## Getting Started |
| 46 | + |
| 47 | +### **Prerequisites** |
| 48 | +- Neo4j Database **5.23 or later** with [APOC](https://neo4j.com/docs/apoc/current/installation/) installed.
| 49 | + - **Neo4j Aura** databases (including the free tier) are supported. |
| 50 | + - If using **Neo4j Desktop**, you will need to deploy the backend and frontend separately (docker-compose is not supported). |
| 51 | + |
| 52 | +--- |
| 53 | + |
| 54 | +## Deployment Options |
| 55 | + |
| 56 | +### **Local Deployment** |
| 57 | + |
| 58 | +#### Using Docker-Compose |
| 59 | +Run the application using the default `docker-compose` configuration. |
| 60 | + |
| 61 | +1. **Supported LLM Models**: |
| 62 | + - By default, only OpenAI and Diffbot are enabled. Gemini requires additional GCP configurations. |
| 63 | + - Use the `VITE_LLM_MODELS_PROD` variable to configure the models you need. Example: |
| 64 | + ```bash |
| 65 | + VITE_LLM_MODELS_PROD="openai_gpt_4o,openai_gpt_4o_mini,diffbot,gemini_1.5_flash" |
| 66 | + ``` |
| 67 | + |
| 68 | +2. **Input Sources**: |
| 69 | + - By default, the following sources are enabled: `local`, `YouTube`, `Wikipedia`, `AWS S3`, and `web`. |
| 70 | + - To add Google Cloud Storage (GCS) integration, include `gcs` and your Google client ID: |
| 71 | + ```bash |
| 72 | + VITE_REACT_APP_SOURCES="local,youtube,wiki,s3,gcs,web" |
| 73 | + VITE_GOOGLE_CLIENT_ID="your-google-client-id" |
| 74 | + ``` |
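Taken together, a minimal root `.env` for the docker-compose setup might look like this. This is a sketch: the key names come from the examples in this README, and the values are placeholders you must replace with your own.

```env
# Models exposed in the UI (OpenAI + Diffbot are the defaults)
VITE_LLM_MODELS_PROD="openai_gpt_4o,openai_gpt_4o_mini,diffbot"
# Input sources shown in the UI
VITE_REACT_APP_SOURCES="local,youtube,wiki,s3,web"
# API keys for the enabled providers
OPENAI_API_KEY="your-openai-api-key"
DIFFBOT_API_KEY="your-diffbot-api-key"
```

With the `.env` in place, `docker-compose up --build` builds and starts the services.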
| 75 | + |
| 76 | +#### Chat Modes |
| 77 | +Configure chat modes using the `VITE_CHAT_MODES` variable: |
| 78 | +- By default, all modes are enabled: `vector`, `graph_vector`, `graph`, `fulltext`, `graph_vector_fulltext`, `entity_vector`, and `global_vector`. |
| 79 | +- To enable only specific modes, update the variable. For example:
| 80 | + ```bash |
| 81 | + VITE_CHAT_MODES="vector,graph" |
| 82 | + ``` |
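The selection behavior described above (an empty or unset value enables every mode, otherwise only the listed modes are enabled) can be sketched in a few lines; the mode names are taken from the list above:

```python
# All chat modes supported by the app (from the list above).
ALL_MODES = [
    "vector", "graph_vector", "graph", "fulltext",
    "graph_vector_fulltext", "entity_vector", "global_vector",
]

def enabled_chat_modes(vite_chat_modes: str) -> list[str]:
    """Return the chat modes enabled by VITE_CHAT_MODES.

    An empty value enables every mode; otherwise only the
    comma-separated modes listed in the variable are enabled.
    """
    modes = [m.strip() for m in vite_chat_modes.split(",") if m.strip()]
    return modes or ALL_MODES

print(enabled_chat_modes(""))              # falls back to all seven modes
print(enabled_chat_modes("vector,graph"))  # ['vector', 'graph']
```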
| 83 | + |
| 84 | +--- |
| 85 | + |
| 86 | +### **Running Backend and Frontend Separately** |
| 87 | + |
| 88 | +For development, you can run the backend and frontend independently. |
| 89 | + |
| 90 | +#### **Frontend Setup** |
| 91 | +1. Create the `.env` file in the `frontend` folder by copying `frontend/example.env`. |
| 92 | +2. Update environment variables as needed. |
| 93 | +3. Run: |
| 94 | + ```bash |
| 95 | + cd frontend |
| 96 | + yarn |
| 97 | + yarn run dev |
| 98 | + ``` |
| 99 | + |
| 100 | +#### **Backend Setup** |
| 101 | +1. Create the `.env` file in the `backend` folder by copying `backend/example.env`. |
| 102 | +2. Preconfigure user credentials in the `.env` file to bypass the login dialog: |
| 103 | + ```bash |
| 104 | + NEO4J_URI=<your-neo4j-uri> |
| 105 | + NEO4J_USERNAME=<your-username> |
| 106 | + NEO4J_PASSWORD=<your-password> |
| 107 | + NEO4J_DATABASE=<your-database-name> |
| 108 | + ``` |
| 109 | +3. Run: |
| 110 | + ```bash |
| 111 | + cd backend |
| 112 | + python -m venv envName |
| 113 | + source envName/bin/activate |
| 114 | + pip install -r requirements.txt |
| 115 | + uvicorn score:app --reload |
| 116 | + ``` |
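Before starting uvicorn, a quick sanity check that `backend/.env` parses cleanly can be done with the standard library alone. This is a sketch, not part of the project; the file written below is a throwaway stand-in for your real `.env`:

```python
import os
import tempfile

def parse_env(path: str) -> dict:
    """Parse simple KEY=VALUE lines from a .env file, skipping blanks and comments."""
    values = {}
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip()
    return values

# Demonstrate with a throwaway file shaped like backend/.env.
with tempfile.NamedTemporaryFile("w", suffix=".env", delete=False) as fh:
    fh.write("NEO4J_URI=neo4j+s://example.databases.neo4j.io\n")
    fh.write("NEO4J_USERNAME=neo4j\n")
    path = fh.name

env = parse_env(path)
print(env["NEO4J_USERNAME"])  # neo4j
os.unlink(path)
```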
| 117 | + |
| 118 | +--- |
| 119 | + |
| 120 | +### **Cloud Deployment** |
| 121 | + |
| 122 | +Deploy the application on **Google Cloud Platform** using the following commands: |
| 123 | + |
| 124 | +#### **Frontend Deployment** |
| 125 | +```bash |
| 126 | +gcloud run deploy dev-frontend \ |
| 127 | + --source . \ |
| 128 | + --region us-central1 \ |
| 129 | + --allow-unauthenticated |
39 | 130 | ``` |
40 | 131 |
|
41 | | -#### Additional configs |
42 | | - |
43 | | -By default, the input sources will be: Local files, Youtube, Wikipedia ,AWS S3 and Webpages. As this default config is applied: |
44 | | -```env |
45 | | -VITE_REACT_APP_SOURCES="local,youtube,wiki,s3,web" |
| 132 | +#### **Backend Deployment** |
| 133 | +```bash |
| 134 | +gcloud run deploy dev-backend \ |
| 135 | + --set-env-vars "OPENAI_API_KEY=<your-openai-api-key>" \ |
| 136 | + --set-env-vars "DIFFBOT_API_KEY=<your-diffbot-api-key>" \ |
| 137 | + --set-env-vars "NEO4J_URI=<your-neo4j-uri>" \ |
| 138 | + --set-env-vars "NEO4J_USERNAME=<your-username>" \ |
| 139 | + --set-env-vars "NEO4J_PASSWORD=<your-password>" \ |
| 140 | + --source . \ |
| 141 | + --region us-central1 \ |
| 142 | + --allow-unauthenticated |
46 | 143 | ``` |
47 | 144 |
|
48 | | -If however you want the Google GCS integration, add `gcs` and your Google client ID: |
49 | | -```env |
50 | | -VITE_REACT_APP_SOURCES="local,youtube,wiki,s3,gcs,web" |
51 | | -VITE_GOOGLE_CLIENT_ID="xxxx" |
| 145 | +--- |
| 146 | +## For Local LLMs (Ollama)
| 147 | +1. Pull the Ollama Docker image:
| 148 | +```bash |
| 149 | +docker pull ollama/ollama |
52 | 150 | ``` |
53 | | - |
54 | | -You can of course combine all (local, youtube, wikipedia, s3 and gcs) or remove any you don't want/need. |
55 | | - |
56 | | -### Chat Modes |
57 | | - |
58 | | -By default,all of the chat modes will be available: vector, graph_vector, graph, fulltext, graph_vector_fulltext , entity_vector and global_vector. |
59 | | - |
60 | | -If none of the mode is mentioned in the chat modes variable all modes will be available: |
| 151 | +2. Run the Ollama Docker image:
| 152 | +```bash |
| 153 | +docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama |
| 154 | +``` |
| 155 | +3. Run an LLM model, e.g. llama3:
| 156 | +```bash |
| 157 | +docker exec -it ollama ollama run llama3 |
| 158 | +``` |
| 159 | +4. Configure the model environment variable in docker-compose:
61 | 160 | ```env |
62 | | -VITE_CHAT_MODES="" |
| 161 | +LLM_MODEL_CONFIG_ollama_<model_name>
| 162 | +# Example:
| 163 | +LLM_MODEL_CONFIG_ollama_llama3=${LLM_MODEL_CONFIG_ollama_llama3-llama3,http://host.docker.internal:11434}
63 | 165 | ``` |
64 | | - |
65 | | -If however you want to specify the only vector mode or only graph mode you can do that by specifying the mode in the env: |
| 166 | +5. Configure the backend API URL:
66 | 167 | ```env |
67 | | -VITE_CHAT_MODES="vector,graph" |
68 | | -VITE_CHAT_MODES="vector,graph" |
| 168 | +VITE_BACKEND_API_URL=${VITE_BACKEND_API_URL-backendurl} |
69 | 169 | ``` |
| 170 | +6. Open the application in the browser and select the Ollama model for extraction.
| 171 | +7. Enjoy graph building.
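For illustration, the `LLM_MODEL_CONFIG_ollama_<model_name>` value configured in step 4 is a comma-separated pair of model name and base URL; splitting it can be sketched as follows (the helper name is hypothetical, not part of the project):

```python
def parse_ollama_config(value: str) -> tuple[str, str]:
    """Split an LLM_MODEL_CONFIG_ollama_* value into (model_name, base_url).

    The format shown in the compose example is "model,base_url".
    """
    model, _, base_url = value.partition(",")
    return model.strip(), base_url.strip()

model, url = parse_ollama_config("llama3,http://host.docker.internal:11434")
print(model, url)  # llama3 http://host.docker.internal:11434
```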
| 172 | +--- |
| 173 | + |
| 174 | +## Additional Configuration |
70 | 175 |
|
71 | | -#### Running Backend and Frontend separately (dev environment) |
72 | | -Alternatively, you can run the backend and frontend separately: |
73 | | - |
74 | | -- For the frontend: |
75 | | -1. Create the frontend/.env file by copy/pasting the frontend/example.env. |
76 | | -2. Change values as needed |
77 | | -3. |
78 | | - ```bash |
79 | | - cd frontend |
80 | | - yarn |
81 | | - yarn run dev |
82 | | - ``` |
83 | | - |
84 | | -- For the backend: |
85 | | -1. Create the backend/.env file by copy/pasting the backend/example.env. To streamline the initial setup and testing of the application, you can preconfigure user credentials directly within the backend .env file. This bypasses the login dialog and allows you to immediately connect with a predefined user. |
86 | | - - **NEO4J_URI**: |
87 | | - - **NEO4J_USERNAME**: |
88 | | - - **NEO4J_PASSWORD**: |
89 | | - - **NEO4J_DATABASE**: |
90 | | -3. Change values as needed |
91 | | -4. |
92 | | - ```bash |
93 | | - cd backend |
94 | | - python -m venv envName |
95 | | - source envName/bin/activate |
96 | | - pip install -r requirements.txt |
97 | | - uvicorn score:app --reload |
98 | | - ``` |
99 | | -### Deploy in Cloud |
100 | | -To deploy the app and packages on Google Cloud Platform, run the following command on google cloud run: |
| 176 | +### **LLM Models** |
| 177 | +Configure LLM models using the `VITE_LLM_MODELS_PROD` variable. Example: |
101 | 178 | ```bash |
102 | | -# Frontend deploy |
103 | | -gcloud run deploy dev-frontend |
104 | | -source location current directory > Frontend |
105 | | -region : 32 [us-central 1] |
106 | | -Allow unauthenticated request : Yes |
| 179 | +VITE_LLM_MODELS_PROD="openai_gpt_4o,openai_gpt_4o_mini,diffbot,gemini_1.5_flash" |
107 | 180 | ``` |
| 181 | + |
| 182 | +### **Input Sources** |
| 183 | +The default input sources are: `local`, `YouTube`, `Wikipedia`, `AWS S3`, and `web`. |
| 184 | + |
| 185 | +To enable GCS integration, include `gcs` and your Google client ID: |
108 | 186 | ```bash |
109 | | -# Backend deploy |
110 | | -gcloud run deploy --set-env-vars "OPENAI_API_KEY = " --set-env-vars "DIFFBOT_API_KEY = " --set-env-vars "NEO4J_URI = " --set-env-vars "NEO4J_PASSWORD = " --set-env-vars "NEO4J_USERNAME = " |
111 | | -source location current directory > Backend |
112 | | -region : 32 [us-central 1] |
113 | | -Allow unauthenticated request : Yes |
| 187 | +VITE_REACT_APP_SOURCES="local,youtube,wiki,s3,gcs,web" |
| 188 | +VITE_GOOGLE_CLIENT_ID="your-google-client-id" |
114 | 189 | ``` |
115 | 190 |
|
| 191 | +## Usage |
| 192 | +1. Connect to a Neo4j Aura instance (either AuraDB or AuraDS) by passing the URI and password through the backend env, filling in the login dialog, or dragging and dropping the Neo4j credentials file.
| 193 | +2. To differentiate the two, different icons are shown under the Neo4j connection details label: a database icon for AuraDB and a scientific-molecule icon for AuraDS.
| 194 | +3. Choose your source from the list of unstructured sources to create a graph.
| 195 | +4. If required, change the LLM used to generate the graph from the drop-down.
| 196 | +5. Optionally, define the schema (node and relationship labels) in the entity graph extraction settings.
| 197 | +6. Either select multiple files and 'Generate Graph', or let all files in 'New' status be processed for graph creation.
| 198 | +7. Inspect the graph for individual files using 'View' in the grid, or select one or more files and 'Preview Graph'.
| 199 | +8. Ask the chatbot questions about processed/completed sources, and get detailed information about the answers generated by the LLM.
| 200 | + |
| 201 | +--- |
| 202 | + |
| 203 | + |
116 | 204 | ## ENV |
117 | 205 | | Env Variable Name | Mandatory/Optional | Default Value | Description | |
118 | 206 | |-------------------------|--------------------|---------------|--------------------------------------------------------------------------------------------------| |
@@ -166,60 +254,6 @@ Allow unauthenticated request : Yes |
166 | 254 | | VITE_TOKENS_PER_CHUNK | Optional | 100 | Configures the token count per chunk, giving flexibility for users who need different chunk sizes for various tokenization tasks, especially with large datasets or specific language models. |
167 | 255 | | VITE_CHUNK_TO_COMBINE | Optional | 1 | Configures the number of chunks to combine for parallel processing. |
168 | 256 |
|
169 | | -## LLMs Supported |
170 | | -1. OpenAI |
171 | | -2. Gemini |
172 | | -3. Diffbot |
173 | | -4. Azure OpenAI(dev deployed version) |
174 | | -5. Anthropic(dev deployed version) |
175 | | -6. Fireworks(dev deployed version) |
176 | | -7. Groq(dev deployed version) |
177 | | -8. Amazon Bedrock(dev deployed version) |
178 | | -9. Ollama(dev deployed version) |
179 | | -10. Deepseek(dev deployed version) |
180 | | -11. Other OpenAI compabtile baseurl models(dev deployed version) |
181 | | -
|
182 | | -## For local llms (Ollama) |
183 | | -1. Pull the docker imgage of ollama |
184 | | -```bash |
185 | | -docker pull ollama/ollama |
186 | | -``` |
187 | | -2. Run the ollama docker image |
188 | | -```bash |
189 | | -docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama |
190 | | -``` |
191 | | -3. Pull specific ollama model. |
192 | | -```bash |
193 | | -ollama pull llama3 |
194 | | -``` |
195 | | -4. Execute any llm model ex🦙3 |
196 | | -```bash |
197 | | -docker exec -it ollama ollama run llama3 |
198 | | -``` |
199 | | -5. Configure env variable in docker compose. |
200 | | -```env |
201 | | -LLM_MODEL_CONFIG_ollama_<model_name> |
202 | | -#example |
203 | | -LLM_MODEL_CONFIG_ollama_llama3=${LLM_MODEL_CONFIG_ollama_llama3-llama3, |
204 | | -http://host.docker.internal:11434} |
205 | | -``` |
206 | | -6. Configure the backend API url |
207 | | -```env |
208 | | -VITE_BACKEND_API_URL=${VITE_BACKEND_API_URL-backendurl} |
209 | | -``` |
210 | | -7. Open the application in browser and select the ollama model for the extraction. |
211 | | -8. Enjoy Graph Building. |
212 | | -
|
213 | | -
|
214 | | -## Usage |
215 | | -1. Connect to Neo4j Aura Instance which can be both AURA DS or AURA DB by passing URI and password through Backend env, fill using login dialog or drag and drop the Neo4j credentials file. |
216 | | -2. To differntiate we have added different icons. For AURA DB we have a database icon and for AURA DS we have scientific molecule icon right under Neo4j Connection details label. |
217 | | -3. Choose your source from a list of Unstructured sources to create graph. |
218 | | -4. Change the LLM (if required) from drop down, which will be used to generate graph. |
219 | | -5. Optionally, define schema(nodes and relationship labels) in entity graph extraction settings. |
220 | | -6. Either select multiple files to 'Generate Graph' or all the files in 'New' status will be processed for graph creation. |
221 | | -7. Have a look at the graph for individual files using 'View' in grid or select one or more files and 'Preview Graph' |
222 | | -8. Ask questions related to the processed/completed sources to chat-bot, Also get detailed information about your answers generated by LLM. |
223 | 257 |
|
224 | 258 | ## Links |
225 | 259 |
|
|