Commit eb0394b

Merge pull request #95 from OoriData/feature/kb-kg
Onya Knowledgebase support. See `KBBackend`
2 parents 6e7c80d + cccaa27 commit eb0394b

17 files changed (+1636, -14 lines)

AICONTEXT.md renamed to AICONTEXT-PYLIB.md

Lines changed: 6 additions & 5 deletions
@@ -1,4 +1,4 @@
1-
Additional context on this repository for AI tools & coding agents
1+
Additional context on this project for AI tools & coding agents
22

33
- Python 3.12+ code, unless otherwise specified
44
- Python code uses single outer quotes, including triple single quotes for e.g. docstrings
@@ -10,12 +10,13 @@ Additional context on this repository for AI tools & coding agents
1010
- Try to stick to 120 characters per line
1111
- if one of those comments would break this guideline, just put that comment above the line instead, as is standard convention
1212
- If there is a pyproject.toml in place, use it as a reference for builds, installs, etc. The basic packaging and dev preference, including if you have to supply your own pyproject.toml, is as follows:
13-
- Use pyproject.toml with hatchling, not e.g. setup.py
13+
- Prefer hatchling build system over setuptools, poetry, etc. Avoid setuptools as much as possible. No setup.py.
1414
- Reusable Python code modules are developed in the `pylib` folder, and installed using e.g. `uv pip install -U .`, which includes proper mapping to Python library package namespace via `tool.hatch.build.sources`. The `__init__.py` and other modules in the top-level package go directly in `pylib`, though submodules can use subdirectories, e.g. `pylib/a/b` becomes `installed_library_name.a.b`. Ultimately this will mean the installed package is importable as `from installed_library_name.etc import …`
15-
- Yes this means editable and "dev mode" environments are NOT desirable, nor are shenanigans adding pylib to `sys.path`. Layer-efficient dockerization is an option if that's needed.
16-
- The ethos is to always develop keeping things properly installable. No dev mode shortcuts
17-
- Prefer hatchling build system over setuptools, poetry, etc. Avoid setuptools as much as possible. Use `[tool.hatch.build.sources]` to map source directories to package namespaces (e.g., `"pylib" = "installed_library_name"`).
1815
- Use `[tool.hatch.build.targets.wheel]` with `only-include = ["pylib"]` to ensure the pylib directory structure gets included properly in the wheel, avoiding the duplication issue that can occur with sources mapping
16+
- Yes this means editable and "dev mode" environments are NOT desirable, nor are shenanigans adding pylib to `sys.path`. Layer-efficient dockerization is an option if that's needed.
17+
- The ethos is to always develop keeping things properly installable. No dev mode shortcuts. Substantive modification to library code requires e.g. `uv pip install -U .` each time.
18+
- Note: This avoidance of editable installs can be relaxed for non-library code, such as demos or main app launch scripts (e.g. webapp back ends)
19+
- If it's a CLI provided as part of a library, though, it should still use proper installation via `[project.scripts]` entry points (e.g., `ooriscout = 'ooriscout.cli.scout:main'`), which creates console scripts that work correctly after `uv pip install -U .`. The CLI module lives in `pylib/cli/` and exposes a `main()` function that uses fire to handle command-line arguments.
1920
- **Debugging package issues**: When modules aren't importing correctly after installation, check:
2021
- That you are in the correct virtualenv (you may have to ask the developer)
2122
- Package structure in site-packages (e.g., `ls -la /path/to/site-packages/package_name/`)

CHANGELOG.md

Lines changed: 22 additions & 2 deletions
@@ -2,10 +2,30 @@
22

33
Notable changes to Format based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/). Project follows [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
44

5-
<!--
65
## [Unreleased]
76

8-
-->
7+
### Added
8+
- **Onya Knowledge Graph Support**: New `OnyaKB` backend for loading and searching `.onya` files from directories
9+
- File-based knowledge graph storage without database overhead
10+
- Compatible with `KBBackend` protocol for unified KB system integration
11+
- Text-based search across node properties
12+
- Type-based filtering using `search_by_type()`
13+
- Direct node retrieval by IRI with `get_node()`
14+
- **Graph Retrieval Strategies**: New retrieval strategies in `ogbujipt.retrieval.graph`
15+
- `TypeSearch`: Filter nodes by type (e.g., all Person entities)
16+
- `PropertySearch`: Match nodes by property values with multiple match types (contains, equals, startswith, endswith)
17+
- **Demo**: Complete demonstration in `demo/kgraph/simple_onya_demo.py` showing all Onya KG features
18+
- **Tests**: Comprehensive test suite for Onya KG functionality in `test/store/kgraph/` and `test/retrieval/test_graph.py`
19+
20+
## [0.10.0] - 20251129
21+
22+
Major reorientation (or "pivot" as the cool kids say) for the project. OgbujiPT is now a general-purpose knowledge bank system for LLM-based applications. It provides a unified API for storing, retrieving, and managing semantic knowledge across multiple backends, with support for dense vector search, sparse retrieval, hybrid search, and more.
23+
24+
As always, we build with Pythonic simplicity and transparency in mind, avoiding the over-frameworks that plague the LLM ecosystem.
25+
26+
Please see the discussion, as this is ongoing work: https://github.com/OoriData/OgbujiPT/discussions/92
27+
28+
Not listing granular changes for this change set, as it is a foundational one; a full reset, just about.
929

1030
## [0.9.4] - 20241119
1131

demo/kgraph/README.md

Lines changed: 218 additions & 0 deletions
@@ -0,0 +1,218 @@
1+
Onya Knowledge Graph Demos. [Onya](https://github.com/OoriData/Onya) is a knowledge graph format and implementation that uses human-readable `.onya` files to represent structured knowledge. The name comes from Igbo "ọ́nyà" meaning web/network.
2+
3+
# OnyaKB Features
4+
5+
OgbujiPT's `OnyaKB` backend provides:
6+
7+
- **File-based knowledge graphs**: Load `.onya` files from a directory
8+
- **In-memory storage**: No database required for static knowledge bases
9+
- **Multiple search strategies**: Text search, type-based filtering, property matching
10+
- **KBBackend protocol**: Compatible with OgbujiPT's unified KB system
11+
- **Human-editable**: Edit `.onya` files directly and reload
12+
13+
# Demos
14+
15+
## 1. `simple_onya_demo.py`
16+
17+
Basic demonstration covering:
18+
- Loading `.onya` files from a directory
19+
- Text-based search across node properties
20+
- Type-based filtering (e.g., find all Person nodes)
21+
- Property-based search (e.g., find nodes with specific values)
22+
- Individual node retrieval by IRI
23+
24+
Run:
25+
```bash
26+
python demo/kgraph/simple_onya_demo.py
27+
```
28+
29+
# Onya File Format
30+
31+
Basic `.onya` format example:
32+
33+
```onya
34+
# @docheader
35+
* @document: http://example.org/mydata
36+
* @nodebase: http://example.org/entities/
37+
* @schema: https://schema.org/
38+
* @language: en
39+
40+
# Alice [Person]
41+
* name: Alice Smith
42+
* age: 30
43+
* bio: Software engineer who loves Python
44+
45+
# Bob [Person]
46+
* name: Bob Jones
47+
* occupation: Data Scientist
48+
```
49+
50+
There is a document header which declares namespaces and base IRIs (URIs). Node definitions are marked with `# NodeID [Type]`. Node IDs are resolved against `@nodebase`, while types and property labels are resolved against `@schema`.
51+
- **Properties**: Listed with `* property: value`
52+
- **Types**: Entities can have one or more types
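
As a rough illustration of the resolution rule described above, the sketch below joins the example header's `@nodebase` and `@schema` values with node IDs, types, and property labels. It assumes plain relative-reference resolution via the standard library's `urljoin`; the Onya parser's actual rules may differ, and the `resolve` helper is hypothetical, not part of Onya or OgbujiPT.

```python
from urllib.parse import urljoin

# Values copied from the example document header above
NODEBASE = 'http://example.org/entities/'
SCHEMA = 'https://schema.org/'


def resolve(base: str, label: str) -> str:
    '''Hypothetical helper: resolve a relative label against a base IRI.'''
    # Assumption: ordinary relative-reference resolution, as urljoin implements it
    return urljoin(base, label)


print(resolve(NODEBASE, 'Alice'))   # http://example.org/entities/Alice
print(resolve(SCHEMA, 'Person'))    # https://schema.org/Person
print(resolve(SCHEMA, 'name'))      # https://schema.org/name
```

Under this assumption, the `Person` type in the example resolves to an `https://` IRI, so any type IRI passed to a search should use the same scheme as the `@schema` declared in the files being loaded.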
53+
54+
# Creating Your Own Knowledge Graph
55+
56+
1. **Create `.onya` files** in a directory:
57+
```bash
58+
mkdir my_knowledge
59+
```
60+
61+
2. **Write your knowledge** in `.onya` format:
62+
```bash
63+
cat > my_knowledge/people.onya << 'EOF'
64+
# @docheader
65+
* @document: http://example.org/mykg
66+
* @nodebase: http://example.org/
67+
* @schema: https://schema.org/
68+
* @language: en
69+
70+
# Person1 [Person]
71+
* name: Your Name
72+
* jobTitle: Your Role
73+
* knowsAbout: Your Expertise
74+
EOF
75+
```
76+
77+
3. **Load and search**:
78+
```python
79+
from ogbujipt.store.kgraph import OnyaKB
80+
81+
kb = OnyaKB(folder_path='./my_knowledge')
82+
await kb.setup()
83+
84+
# Search your knowledge
85+
async for result in kb.search('expertise', limit=5):
86+
    print(result.content)
87+
```
88+
89+
# Integration with Other OgbujiPT Features
90+
91+
## Hybrid Search with Vectors
92+
93+
Combine graph-based search with vector search:
94+
95+
```python
96+
from ogbujipt.store.kgraph import OnyaKB
97+
from ogbujipt.store.ram import RAMDataDB
98+
from ogbujipt.retrieval import TypeSearch, DenseSearch, HybridSearch
99+
from sentence_transformers import SentenceTransformer
100+
101+
# Load knowledge graph
102+
kg = OnyaKB(folder_path='./knowledge')
103+
await kg.setup()
104+
105+
# Create vector store
106+
model = SentenceTransformer('all-MiniLM-L6-v2')
107+
vector_db = RAMDataDB(embedding_model=model, collection_name='docs')
108+
await vector_db.setup()
109+
110+
# Add graph content to vector store for semantic search
111+
async for result in kg.search('', limit=0): # Get all nodes
112+
    await vector_db.insert(result.content, result.metadata)
113+
114+
# Hybrid search across both
115+
hybrid = HybridSearch(
116+
    strategies=[DenseSearch(), TypeSearch(type_iri='https://schema.org/Person')],
117+
)
118+
119+
async for result in hybrid.execute('machine learning expert',
120+
                                   backends=[kg, vector_db],
121+
                                   limit=5):
122+
    print(result.content, result.score)
123+
```
124+
125+
## GraphRAG Applications
126+
127+
Use Onya KG as the knowledge layer in RAG applications:
128+
129+
```python
130+
from ogbujipt.store.kgraph import OnyaKB
131+
from ogbujipt.llm_wrapper import openai_chat_api, prompt_to_chat
132+
133+
# Load domain knowledge
134+
kb = OnyaKB(folder_path='./domain_knowledge')
135+
await kb.setup()
136+
137+
# Retrieve relevant knowledge
138+
contexts = []
139+
async for result in kb.search(user_query, limit=3):
140+
    contexts.append(result.content)
141+
142+
# Build RAG prompt
143+
context_text = '\n\n'.join(contexts)
144+
prompt = f"""Based on this knowledge:
145+
146+
{context_text}
147+
148+
Question: {user_query}"""
149+
150+
# Get LLM response
151+
llm = openai_chat_api(base_url='http://localhost:8000')
152+
response = await llm(prompt_to_chat(prompt))
153+
print(response.first_choice_text)
154+
```
155+
156+
# Use Cases
157+
158+
## Static Knowledge Bases
159+
- **Ontologies**: Load domain ontologies (schema.org, FOAF, etc.)
160+
- **Taxonomies**: Product catalogs, classification systems
161+
- **Reference data**: Countries, currencies, standards
162+
- **Company knowledge**: Org charts, procedures, policies
163+
164+
## Human-Curated Knowledge
165+
- **Expert knowledge**: Subject matter expertise in structured form
166+
- **Documentation**: Technical docs as knowledge graphs
167+
- **Metadata**: Structured descriptions of assets/resources
168+
169+
## Embedded Applications
170+
- **No database required**: Bundle knowledge with your application
171+
- **Version controlled**: `.onya` files in git for change tracking
172+
- **Reviewable**: Human-readable format for peer review
173+
- **Composable**: Multiple `.onya` files for modular knowledge
174+
175+
# Architecture Notes
176+
177+
## Read-Only by Design
178+
179+
`OnyaKB` is intentionally read-only:
180+
- `insert()` and `delete()` raise `NotImplementedError`
181+
- Edit `.onya` files directly using your text editor
182+
- Reload by calling `cleanup()` then `setup()` again
183+
- This design encourages human curation and version control
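
The edit-then-reload cycle above can be sketched without OgbujiPT installed. `ReadOnlyFolderKB` below is a hypothetical stand-in, not the real `OnyaKB`: it only stores raw file text, and it assumes `cleanup()` is a coroutine like `setup()`. The `reload` helper is likewise illustrative.

```python
import asyncio
import tempfile
from pathlib import Path


class ReadOnlyFolderKB:
    '''Hypothetical stand-in for a read-only, folder-backed KB (not the real OnyaKB).'''
    def __init__(self, folder_path):
        self.folder_path = Path(folder_path)
        self.docs = {}

    async def setup(self):
        # Load the raw text of every .onya file into memory
        for path in sorted(self.folder_path.glob('*.onya')):
            self.docs[path.name] = path.read_text(encoding='utf-8')

    async def cleanup(self):
        self.docs.clear()

    async def insert(self, content, metadata=None):
        raise NotImplementedError('Read-only: edit the .onya files, then reload')

    async def delete(self, item_id):
        raise NotImplementedError('Read-only: edit the .onya files, then reload')


async def reload(kb):
    '''Pick up on-disk edits by discarding in-memory state and loading again.'''
    await kb.cleanup()
    await kb.setup()


async def demo():
    with tempfile.TemporaryDirectory() as folder:
        onya_file = Path(folder) / 'people.onya'
        onya_file.write_text('# Alice [Person]\n* name: Alice Smith\n', encoding='utf-8')
        kb = ReadOnlyFolderKB(folder)
        await kb.setup()
        before = kb.docs['people.onya']
        # Simulate a hand edit made in a text editor
        onya_file.write_text('# Alice [Person]\n* name: Alice Jones\n', encoding='utf-8')
        stale = kb.docs['people.onya']  # In-memory copy is unchanged until reload
        await reload(kb)
        return before, stale, kb.docs['people.onya']


before, stale, after = asyncio.run(demo())
print('Alice Smith' in stale, 'Alice Jones' in after)
```

The point of the sketch is that in-memory state does not track the files: an edit becomes visible only after the explicit `cleanup()` and `setup()` pair.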
184+
185+
## In-Memory Performance
186+
187+
All nodes are loaded into memory:
188+
- **Fast**: No database queries, instant lookups
189+
- **Scalable**: Suitable for graphs with up to ~100K nodes
190+
- **Simple**: No external dependencies or setup
191+
- **Predictable**: Performance independent of query complexity
192+
193+
## Search Strategies
194+
195+
Three built-in strategies:
196+
1. **Text search** (`kb.search()`): Substring matching across properties
197+
2. **Type search** (`TypeSearch`): Filter by entity type
198+
3. **Property search** (`PropertySearch`): Match specific property values
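
To make the difference between text search and property search concrete, here is a minimal sketch over plain dictionaries. It is not the OgbujiPT implementation: `text_search`, `property_search`, and `MATCHERS` are hypothetical names, case-insensitive text matching is an assumption, and only the four match types named in the changelog (contains, equals, startswith, endswith) are taken from the source.

```python
from typing import Callable

# One predicate per match type listed in the changelog
MATCHERS: dict[str, Callable[[str, str], bool]] = {
    'contains': lambda value, target: target in value,
    'equals': lambda value, target: value == target,
    'startswith': lambda value, target: value.startswith(target),
    'endswith': lambda value, target: value.endswith(target),
}

# Toy data mirroring the example .onya file; not loaded by any real parser
NODES = {
    'Alice': {'name': 'Alice Smith', 'age': 30, 'bio': 'Software engineer who loves Python'},
    'Bob': {'name': 'Bob Jones', 'occupation': 'Data Scientist'},
}


def text_search(nodes, query):
    '''Substring match across every property value (case-insensitive by assumption).'''
    needle = query.lower()
    return [node_id for node_id, props in nodes.items()
            if any(needle in str(value).lower() for value in props.values())]


def property_search(nodes, prop, target, match_type='contains'):
    '''Match one named property using the chosen match type.'''
    matcher = MATCHERS[match_type]
    return [node_id for node_id, props in nodes.items()
            if prop in props and matcher(str(props[prop]), target)]


print(text_search(NODES, 'python'))
print(property_search(NODES, 'name', 'Jones', match_type='endswith'))
```

Text search scans all values, so it is convenient but imprecise; property search trades that breadth for control over which field and which comparison is used.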
199+
200+
For semantic search, combine with vector stores using hybrid strategies.
201+
202+
# Prerequisites
203+
204+
```bash
205+
# Easiest to just use the "mega" package, with all demo requirements
206+
uv pip install -U ".[mega]"
207+
```
208+
209+
# References
210+
211+
- **Onya**: [https://github.com/OoriData/Onya](https://github.com/OoriData/Onya)
212+
- **OgbujiPT Documentation**: [https://github.com/OoriData/OgbujiPT](https://github.com/OoriData/OgbujiPT)
213+
- **Knowledge Graphs**: [https://en.wikipedia.org/wiki/Knowledge_graph](https://en.wikipedia.org/wiki/Knowledge_graph)
214+
- **GraphRAG**: [https://arxiv.org/abs/2404.16130](https://arxiv.org/abs/2404.16130)
215+
216+
---
217+
218+
**Need help?** Open an issue at [OgbujiPT GitHub](https://github.com/OoriData/OgbujiPT/issues)
