<p>
The Knowledge Graph Generator (KGG) is an AI-powered tool that transforms unstructured text into a structured knowledge graph. By identifying key entities and the relationships between them, KGG turns any input text into a visual, interactive representation of its information, aiding understanding, exploration, and further analysis. The project is implemented in Python using HuggingFace Transformers and PyVis, and is designed to run seamlessly in Google Colab.
</p>
<h2id="approach">Approach & Methodology</h2>
<p>
What sets KGG apart is that, instead of relying on external APIs to access a large language model (LLM), the model is loaded and run entirely <b>locally</b> on the user's machine. This keeps data private, removes any dependency on internet connectivity, and avoids API usage costs. However, running a state-of-the-art LLM such as Open-Orca/Mistral-7B-OpenOrca locally poses a significant hardware challenge, as these models typically require more than 15GB of GPU memory.
</p>
<p>
To overcome this, <b>quantization</b> techniques are employed. Specifically, the model is loaded using <code>BitsAndBytesConfig</code> with 4-bit quantization (nf4), drastically reducing the memory footprint and enabling efficient inference even on consumer-grade GPUs. This allows the entire pipeline—from prompt engineering to entity extraction and graph visualization—to be executed locally, making the solution both powerful and accessible.
</p>
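<p>
As a rough illustration of this setup (assuming the <code>transformers</code>, <code>accelerate</code>, and <code>bitsandbytes</code> packages are installed; the variable names are illustrative rather than the project's exact code), the quantized model can be loaded along these lines:
</p>
<pre><code>import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_NAME = "Open-Orca/Mistral-7B-OpenOrca"

# 4-bit NF4 quantization keeps the 7B model within consumer-GPU memory limits.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    quantization_config=bnb_config,
    device_map="auto",  # spread layers across the available GPU(s)
)
</code></pre>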
<ol>
<li><b>Model Loading and Quantization:</b> The Open-Orca/Mistral-7B-OpenOrca model is loaded locally using HuggingFace Transformers with 4-bit quantization (nf4) for efficient inference. This is achieved using the <code>BitsAndBytesConfig</code> for memory and speed optimization, making it feasible to run the model on hardware with limited GPU resources.</li>
<li><b>Prompt Engineering:</b> A system prompt instructs the model to extract entities and relationships from the context and output them in JSON format with the fields <code>node1</code>, <code>node2</code>, and <code>relationship</code> (see the sketch after this list).</li>
<li><b>Text Processing:</b> The user-provided text is formatted into a prompt and passed to the locally running model. The model generates a response, which is parsed to extract the JSON array of relationships.</li>
<li><b>Knowledge Graph Construction:</b> The extracted entities and relationships are used to build a graph using PyVis, where nodes represent entities and edges represent relationships.</li>
<li><b>Visualization:</b> The resulting graph is rendered as interactive HTML, letting users explore the knowledge graph visually within the notebook or export it as a standalone HTML file.</li>
</ol>
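<p>
A minimal end-to-end sketch of the remaining steps (prompting, parsing, graph construction, and visualization) is shown below. It assumes the quantized <code>model</code> and <code>tokenizer</code> from the previous snippet; the prompt wording, helper names, and JSON handling are illustrative rather than the project's exact implementation.
</p>
<pre><code>import json
import re

from pyvis.network import Network

SYSTEM_PROMPT = (
    "Extract the entities and relationships from the context. "
    "Respond only with a JSON array of objects with the fields "
    '"node1", "node2" and "relationship".'
)

def extract_relationships(context: str) -> list[dict]:
    """Run the local model on the context and parse the JSON it returns."""
    prompt = f"{SYSTEM_PROMPT}\n\nContext:\n{context}\n\nJSON:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=512, do_sample=False)
    text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    # Keep only what the model generated after the "JSON:" marker.
    match = re.search(r"\[.*\]", text.split("JSON:")[-1], re.DOTALL)
    return json.loads(match.group(0)) if match else []

def build_graph(triples: list[dict], path: str = "knowledge_graph.html") -> None:
    """Turn (node1, node2, relationship) triples into an interactive PyVis graph."""
    net = Network(height="600px", width="100%", directed=True, notebook=True)
    for t in triples:
        net.add_node(t["node1"], label=t["node1"])
        net.add_node(t["node2"], label=t["node2"])
        net.add_edge(t["node1"], t["node2"], title=t["relationship"])
    net.save_graph(path)  # writes a standalone interactive HTML file

build_graph(extract_relationships("Marie Curie discovered polonium and radium."))
</code></pre>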
<h2id="features">Features</h2>
<ul>
<li><b>Automated Entity and Relationship Extraction:</b> Utilizes a large language model to identify and extract entities and their relationships from arbitrary text.</li>
<li><b>Interactive Visualization:</b> Generates an interactive knowledge graph using PyVis, allowing users to explore nodes and edges dynamically.</li>
<li><b>Efficient Model Inference:</b> Employs 4-bit quantization for the language model, reducing memory usage and improving inference speed.</li>
<li><b>Google Colab Integration:</b> Designed for easy use in Google Colab, with a user-friendly interface for input and visualization.</li>
<li><b>Customizable Prompts:</b> The prompt engineering approach allows different types of relationships to be extracted as needed; a brief example follows this list.</li>
</ul>
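<p>
For example (a hypothetical variation, not part of the project code), restricting the extraction to particular relationship types only requires changing the system prompt used in the sketch above:
</p>
<pre><code># Hypothetical customization: only extract employment and location relations.
SYSTEM_PROMPT = (
    "Extract entities from the context, but only report relationships of type "
    '"works_for" or "located_in". Respond with a JSON array of objects with '
    'the fields "node1", "node2" and "relationship".'
)
</code></pre>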
<h2id="applications">Applications</h2>
<ul>
<li><b>Information Retrieval and Organization</b>
<ul>
<li><b>Data Management:</b> Organize large volumes of textual data into structured, interconnected entities. Useful for creating databases or enhancing existing ones.</li>
<li><b>Content Summarization:</b> Summarize key information from long documents or articles by extracting main entities and their relationships.</li>
</ul>
</li>
<li><b>Education</b>
<ul>
<li><b>Teaching Aid:</b> Assist educators in creating interactive teaching materials by visually representing complex subjects and their interrelations.</li>
<li><b>Student Projects:</b> Provide a tool for students to visualize and present their research or project findings.</li>
</ul>
</li>
<li><b>Knowledge Discovery</b>
<ul>
<li><b>Research:</b> Aid researchers in identifying relationships between different concepts, facilitating new insights and hypothesis generation.</li>
<li><b>Literature Reviews:</b> Summarize findings from numerous studies by mapping out key terms and their connections.</li>
</ul>
</li>
</ul>

<h2 id="user-interface">User Interface</h2>
<p>
The user interface is implemented using HTML and JavaScript within the Colab notebook. Users can input text into a search bar, and upon clicking the search button, the knowledge graph is generated and displayed interactively. The UI is styled for clarity and ease of use, with responsive design and dynamic feedback.
</p>
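<p>
As a small sketch of how the generated graph can be shown inline in Colab (assuming the file name used in the earlier sketch; this is not necessarily how the project wires up its UI), the saved PyVis HTML can be embedded with IPython:
</p>
<pre><code>from IPython.display import HTML, display

# Render the saved PyVis graph inline in the Colab notebook.
with open("knowledge_graph.html", encoding="utf-8") as f:
    display(HTML(f.read()))
</code></pre>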
<h2 id="conclusion">Conclusion</h2>
<p>
The Knowledge Graph Generator project demonstrates the power of combining large language models with interactive visualization tools to extract and represent structured knowledge from unstructured text. By automating the process of entity and relationship extraction and providing an intuitive interface, KGG makes knowledge discovery accessible and efficient for users in research, education, and industry.
</p>
<h2id="bibliography">Bibliography</h2>
<ul>
<li>Open-Orca/Mistral-7B-OpenOrca: <a href="https://huggingface.co/Open-Orca/Mistral-7B-OpenOrca" target="_blank">HuggingFace Model Card</a></li>