Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
15 changes: 15 additions & 0 deletions content/TinyGraphRAG/example/data.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
# Introduction

## 1.1 Introduction

Following a drizzling, we take a walk on the wet street. Feeling the gentle breeze and seeing the sunset glow, we bet the weather must be nice tomorrow. Walking to a fruit stand, we pick up a green watermelon with curly root and muffled sound; while hoping the watermelon is ripe, we also expect some good aca- demic marks this semester after all the hard work on studies. We wish readers to share the same confidence in their studies, but to begin with, let us take an informal discussion on what is machine learning .

Taking a closer look at the scenario described above, we notice that it involves many experience-based predictions. For example, why would we expect beautiful weather tomorrow after observing the gentle breeze and sunset glow? We expect this beautiful weather because,from our experience,theweather on the following day is often beautiful when we experience such a scene in the present day. Also, why do we pick the watermelon with green color, curly root, and muffled sound? It is because we have eaten and enjoyed many watermelons, and those sat- isfying the above criteria are usually ripe. Similarly, our learn- ing experience tells us that hard work leads to good academic marks. We are confident in our predictions because we learned from experience and made experience-based decisions.

Mitchell ( 1997 ) provides a more formal definition: ‘‘A computer program is said to learn from experience $E$ for some class of tasks $T$ and performance measure $P$ , if its performance at tasks in $T$ , as measured by $P$ , improves with experience $E$ .’’

E.g., Hand et al. ( 2001 ).

While humans learn from experience, can computers do the same? The answer is ‘‘yes’’, and machine learning is what we need. Machine learning is the technique that improves system performance by learning from experience via computational methods. In computer systems, experience exists in the form of data, and the main task of machine learning is to develop learning algorithms that build models from data. By feeding the learning algorithm with experience data, we obtain a model that can make predictions (e.g., the watermelon is ripe) on new observations (e.g., an uncut watermelon). If we consider com- puter science as the subject of algorithms, then machine learn- ing is the subject of learning algorithms .

In this book, we use ‘‘model’’ as a general term for the out- come learned from data. In some other literature, the term ‘‘model’’may refer to the global outcome (e.g., a decision tree), while the term ‘‘pattern’’ refers to the local outcome (e.g., a single rule).
175 changes: 175 additions & 0 deletions content/TinyGraphRAG/help.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,175 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"/home/calvin-lucas/Documents/DataWhale_Learning_Material/tiny-graphrag\n"
]
}
],
"source": [
"# 注意:重新运行前需要:重启整个内核\n",
"import os\n",
"import sys\n",
"sys.path.append('.') # 添加当前目录到 Python 路径\n",
"print(os.getcwd()) # 验证下当前工作路径"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"# 导入模块\n",
"from tinygraph.graph import TinyGraph\n",
"from tinygraph.embedding.zhipu import zhipuEmb\n",
"from tinygraph.llm.zhipu import zhipuLLM\n",
"\n",
"from neo4j import GraphDatabase\n",
"from dotenv import load_dotenv # 用于加载环境变量"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"# 配置使用的 LLM 和 Embedding 服务,现在只支持 ZhipuAI\n",
"# 加载 .env文件, 从而导入api_key\n",
"load_dotenv() # 加载工作目录下的 .env 文件\n",
"\n",
"emb = zhipuEmb(\n",
" model_name=\"embedding-2\", # 嵌入模型\n",
" api_key=os.getenv('API_KEY')\n",
")\n",
"llm = zhipuLLM(\n",
" model_name=\"glm-3-turbo\", # LLM 模型\n",
" api_key=os.getenv('API_KEY')\n",
")\n",
"graph = TinyGraph(\n",
" url=\"neo4j://localhost:7687\",\n",
" username=\"neo4j\",\n",
" password=\"neo4j-passwordTGR\", # 初次登陆的默认密码为neo4j,此后需修改再使用\n",
" llm=llm,\n",
" emb=emb,\n",
")\n"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Document 'example/data.md' has already been loaded, skipping import process.\n"
]
}
],
"source": [
"# 使用 TinyGraph 添加文档。目前支持所有文本格式的文件。这一步的时间可能较长;\n",
"# 结束后,在当前目录下会生成一个 `workspace` 文件夹,包含 `community`、`chunk` 和 `doc` 信息\n",
"graph.add_document(\"example/data.md\")"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"数据库连接正常,节点数量: 29\n"
]
}
],
"source": [
"# 再次验证数据库连接\n",
"with graph.driver.session() as session:\n",
" result = session.run(\"MATCH (n) RETURN count(n) as count\")\n",
" count = result.single()[\"count\"]\n",
" print(f\"数据库连接正常,节点数量: {count}\")"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"本地查询结果:\n",
"The term \"dl\" is not explicitly defined in the provided context. However, based on the context's focus on machine learning, \"dl\" might commonly be interpreted as an abbreviation for \"deep learning,\" which is a subset of machine learning that involves neural networks with many layers (hence \"deep\"). Deep learning has become a prominent field, particularly in the realm of artificial intelligence, where it is used to recognize patterns and make predictions from large datasets.\n",
"\n",
"If \"dl\" refers to something else in the context of the user query, there would be no information to discern its meaning without further clarification or additional context.\n"
]
}
],
"source": [
"# 执行局部查询测试\n",
"local_res = graph.local_query(\"what is dl?\")\n",
"print(\"\\n本地查询结果:\")\n",
"print(local_res)\n"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"全局查询结果:\n",
"The term 'dl' is not explicitly mentioned in the provided data tables. Therefore, I don't know what 'dl' refers to in the context of the user's question. If 'dl' stands for 'Deep Learning,' it is a subset of machine learning that uses neural networks with many layers for feature extraction and modeling. However, this context is not provided in the data tables.\n"
]
}
],
"source": [
"\n",
"# 执行全局查询测试\n",
"global_res = graph.global_query(\"what is dl?\")\n",
"print(\"\\n全局查询结果:\")\n",
"print(global_res)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "TinyGraphRAG_2025-04-08",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.16"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading