Merged
11 changes: 6 additions & 5 deletions AICONTEXT.md → AICONTEXT-PYLIB.md
@@ -1,4 +1,4 @@
Additional context on this repository for AI tools & coding agents
Additional context on this project for AI tools & coding agents

- Python 3.12+ code, unless otherwise specified
- Python code uses single outer quotes, including triple single quotes for e.g. docstrings
@@ -10,12 +10,13 @@ Additional context on this repository for AI tools & coding agents
- Try to stick to 120 characters per line
- if one of those comments would break this guideline, just put that comment above the line instead, as is standard convention
- If there is a pyproject.toml in place, use it as a reference for builds, installs, etc. The basic packaging and dev preference, including if you have to supply your own pyproject.toml, is as follows:
- Use pyproject.toml with hatchling, not e.g. setup.py
- Prefer hatchling build system over setuptools, poetry, etc. Avoid setuptools as much as possible. No setup.py.
- Reusable Python code modules are developed in the `pylib` folder, and installed using e.g. `uv pip install -U .`, which includes proper mapping to Python library package namespace via `tool.hatch.build.sources`. The `__init__.py` and other modules in the top-level package go directly in `pylib`, though submodules can use subdirectories, e.g. `pylib/a/b` becomes `installed_library_name.a.b`. Ultimately this will mean the installed package is importable as `from installed_library_name.etc import …`
- Yes this means editable and "dev mode" environments are NOT desirable, nor are shenanigans adding pylib to `sys.path`. Layer-efficient dockerization is an option if that's needed.
- The ethos is to always develop keeping things properly installable. No dev mode shortcuts
- Prefer hatchling build system over setuptools, poetry, etc. Avoid setuptools as much as possible. Use `[tool.hatch.build.sources]` to map source directories to package namespaces (e.g., `"pylib" = "installed_library_name"`).
- Use `[tool.hatch.build.targets.wheel]` with `only-include = ["pylib"]` to ensure the pylib directory structure gets included properly in the wheel, avoiding the duplication issue that can occur with sources mapping
- Yes this means editable and "dev mode" environments are NOT desirable, nor are shenanigans adding pylib to `sys.path`. Layer-efficient dockerization is an option if that's needed.
- The ethos is to always develop keeping things properly installable. No dev mode shortcuts. Substantive modification to library code requires e.g. `uv pip install -U .` each time.
- Note: This avoidance of editable installs can be relaxed for non-library code, such as demos or main app launch scripts (e.g. webapp back ends)
- If it's a CLI provided as part of a library, though, it should still use proper installation via `[project.scripts]` entry points (e.g., `ooriscout = 'ooriscout.cli.scout:main'`), which creates console scripts that work correctly after `uv pip install -U .`. The CLI module lives in `pylib/cli/` and exposes a `main()` function that uses fire to handle command-line arguments.
- **Debugging package issues**: When modules aren't importing correctly after installation, check:
- That you are in the correct virtualenv (you may have to ask the developer)
- Package structure in site-packages (e.g., `ls -la /path/to/site-packages/package_name/`)
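
The first two checks above can be scripted. This is a minimal sketch using only the standard library; `describe_install` is a hypothetical helper name, not part of any project tooling, and `json` stands in for the installed package name.

```python
import importlib.util
import sys
from pathlib import Path


def describe_install(package_name):
    '''Report the active environment and where a top-level package resolves from.'''
    report = {'package': package_name, 'env_prefix': sys.prefix, 'found': False, 'entries': []}
    spec = importlib.util.find_spec(package_name)
    if spec is None:
        return report  # Not importable from this interpreter
    report['found'] = True
    report['origin'] = spec.origin
    # Packages expose one or more directories; plain modules expose none
    for location in spec.submodule_search_locations or []:
        if Path(location).is_dir():
            report['entries'].extend(sorted(p.name for p in Path(location).iterdir()))
    return report


if __name__ == '__main__':
    # 'json' is used only because it is always present; substitute your package name
    info = describe_install('json')
    print(info['env_prefix'])
    print(info['origin'])
    print(info['entries'][:5])
```

If `env_prefix` is not the virtualenv you expected, the import problem is probably the environment rather than the packaging.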
24 changes: 22 additions & 2 deletions CHANGELOG.md
@@ -2,10 +2,30 @@

Notable changes to OgbujiPT. Format based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/). Project follows [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

<!--
## [Unreleased]

-->
### Added
- **Onya Knowledge Graph Support**: New `OnyaKB` backend for loading and searching `.onya` files from directories
- File-based knowledge graph storage without database overhead
- Compatible with `KBBackend` protocol for unified KB system integration
- Text-based search across node properties
- Type-based filtering using `search_by_type()`
- Direct node retrieval by IRI with `get_node()`
- **Graph Retrieval Strategies**: New retrieval strategies in `ogbujipt.retrieval.graph`
- `TypeSearch`: Filter nodes by type (e.g., all Person entities)
- `PropertySearch`: Match nodes by property values with multiple match types (contains, equals, startswith, endswith)
- **Demo**: Complete demonstration in `demo/kgraph/simple_onya_demo.py` showing all Onya KG features
- **Tests**: Comprehensive test suite for Onya KG functionality in `test/store/kgraph/` and `test/retrieval/test_graph.py`

## [0.10.0] - 20251129

Major reorientation (or "pivot" as the cool kids say) for the project. OgbujiPT is now a general-purpose knowledge bank system for LLM-based applications. It provides a unified API for storing, retrieving, and managing semantic knowledge across multiple backends, with support for dense vector search, sparse retrieval, hybrid search, and more.

As always, we build with Pythonic simplicity and transparency in mind, avoiding the over-frameworks that plague the LLM ecosystem.

Please see the discussion, as this is ongoing work: https://github.com/OoriData/OgbujiPT/discussions/92

Not listing granular changes for this change set, as it is a foundational one; a full reset, just about.

## [0.9.4] - 20241119

218 changes: 218 additions & 0 deletions demo/kgraph/README.md
@@ -0,0 +1,218 @@
Onya Knowledge Graph Demos

[Onya](https://github.com/OoriData/Onya) is a knowledge graph format and implementation that uses human-readable `.onya` files to represent structured knowledge. The name comes from Igbo "ọ́nyà" meaning web/network.

# OnyaKB Features

OgbujiPT's `OnyaKB` backend provides:

- **File-based knowledge graphs**: Load `.onya` files from a directory
- **In-memory storage**: No database required for static knowledge bases
- **Multiple search strategies**: Text search, type-based filtering, property matching
- **KBBackend protocol**: Compatible with OgbujiPT's unified KB system
- **Human-editable**: Edit `.onya` files directly and reload

# Demos

## 1. `simple_onya_demo.py`

Basic demonstration covering:
- Loading `.onya` files from a directory
- Text-based search across node properties
- Type-based filtering (e.g., find all Person nodes)
- Property-based search (e.g., find nodes with specific values)
- Individual node retrieval by IRI

Run:
```bash
python demo/kgraph/simple_onya_demo.py
```

# Onya File Format

Basic `.onya` format example:

```onya
# @docheader
* @document: http://example.org/mydata
* @nodebase: http://example.org/entities/
* @schema: https://schema.org/
* @language: en

# Alice [Person]
* name: Alice Smith
* age: 30
* bio: Software engineer who loves Python

# Bob [Person]
* name: Bob Jones
* occupation: Data Scientist
```

Key elements of the format:
- **Document header**: `# @docheader` declares namespaces and base IRIs (URIs)
- **Nodes**: Defined with `# NodeID [Type]`; node IDs are resolved against `@nodebase`
- **Properties**: Listed with `* property: value`; property labels are resolved against `@schema`
- **Types**: Entities can have one or more types, also resolved against `@schema`
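
To make the resolution rules concrete, here is a toy sketch that reads only the small subset of syntax shown above and expands IDs, types and property labels into full IRIs. It is not the real Onya parser: `toy_parse` is a hypothetical name, resolution is assumed to behave like `urllib.parse.urljoin`, and multiple types are assumed to be whitespace-separated inside the brackets.

```python
import re
from urllib.parse import urljoin

SAMPLE = '''\
# @docheader
* @document: http://example.org/mydata
* @nodebase: http://example.org/entities/
* @schema: https://schema.org/

# Alice [Person]
* name: Alice Smith
* age: 30
'''

HEADING = re.compile(r'^#\s+(\S+)(?:\s+\[(.+)\])?\s*$')
PROP = re.compile(r'^\*\s+(\S+?):\s+(.*)$')


def toy_parse(text):
    '''Return (header, nodes) for the tiny subset of .onya syntax in SAMPLE.'''
    header, nodes = {}, {}
    in_header, current = False, None
    for raw in text.splitlines():
        line = raw.strip()
        heading = HEADING.match(line)
        if heading:
            node_id, types = heading.groups()
            if node_id == '@docheader':
                in_header, current = True, None
                continue
            in_header = False
            # Node IDs resolve against @nodebase, types against @schema
            iri = urljoin(header.get('@nodebase', ''), node_id)
            type_iris = [urljoin(header.get('@schema', ''), t) for t in (types or '').split()]
            current = nodes.setdefault(iri, {'types': type_iris, 'properties': {}})
            continue
        prop = PROP.match(line)
        if not prop:
            continue  # Blank or unrecognized line
        key, value = prop.groups()
        if in_header:
            header[key] = value
        elif current is not None:
            # Property labels also resolve against @schema
            current['properties'][urljoin(header.get('@schema', ''), key)] = value
    return header, nodes


if __name__ == '__main__':
    _, parsed = toy_parse(SAMPLE)
    for iri, node in parsed.items():
        print(iri, node['types'], node['properties'])
```

Values are kept as plain strings here; whether Onya itself types values such as `30` is not covered by this sketch.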

# Creating Your Own Knowledge Graph

1. **Create `.onya` files** in a directory:
```bash
mkdir my_knowledge
```

2. **Write your knowledge** in `.onya` format:
```bash
cat > my_knowledge/people.onya << 'EOF'
# @docheader
* @document: http://example.org/mykg
* @nodebase: http://example.org/
* @schema: https://schema.org/
* @language: en

# Person1 [Person]
* name: Your Name
* jobTitle: Your Role
* knowsAbout: Your Expertise
EOF
```

3. **Load and search**:
```python
from ogbujipt.store.kgraph import OnyaKB

kb = OnyaKB(folder_path='./my_knowledge')
await kb.setup()

# Search your knowledge
async for result in kb.search('expertise', limit=5):
    print(result.content)
```
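
The snippet above uses top-level `await`, which works in a notebook or the `python -m asyncio` REPL but not in a plain script. In a script, the same calls go inside a coroutine run with `asyncio.run()`. The sketch below shows that shape with a stand-in class so it runs without OgbujiPT installed; `FakeKB` and `FakeResult` are hypothetical and only mimic the `setup()` / `search()` usage shown above.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class FakeResult:
    content: str


class FakeKB:
    '''Hypothetical stand-in mimicking the setup()/search() shape used above.'''
    def __init__(self, docs):
        self._docs = docs

    async def setup(self):
        pass  # A real backend would load files here

    async def search(self, query, limit=5):
        # Async generator: yields matches one at a time
        hits = [doc for doc in self._docs if query.lower() in doc.lower()]
        for doc in hits[:limit]:
            yield FakeResult(content=doc)


async def main():
    kb = FakeKB(['knowsAbout: Your Expertise', 'jobTitle: Your Role'])
    await kb.setup()
    return [result.content async for result in kb.search('expertise', limit=5)]


if __name__ == '__main__':
    print(asyncio.run(main()))
```

Swapping `FakeKB(...)` for `OnyaKB(folder_path='./my_knowledge')` inside `main()` should give a script version of the snippet above.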

# Integration with Other OgbujiPT Features

## Hybrid Search with Vectors

Combine graph-based search with vector search:

```python
from ogbujipt.store.kgraph import OnyaKB
from ogbujipt.store.ram import RAMDataDB
from ogbujipt.retrieval import TypeSearch, DenseSearch, HybridSearch
from sentence_transformers import SentenceTransformer

# Load knowledge graph
kg = OnyaKB(folder_path='./knowledge')
await kg.setup()

# Create vector store
model = SentenceTransformer('all-MiniLM-L6-v2')
vector_db = RAMDataDB(embedding_model=model, collection_name='docs')
await vector_db.setup()

# Add graph content to vector store for semantic search
async for result in kg.search('', limit=0):  # Get all nodes
    await vector_db.insert(result.content, result.metadata)

# Hybrid search across both
hybrid = HybridSearch(
    strategies=[DenseSearch(), TypeSearch(type_iri='http://schema.org/Person')],
)

async for result in hybrid.execute('machine learning expert',
                                   backends=[kg, vector_db],
                                   limit=5):
    print(result.content, result.score)
```
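
For intuition about how results from several strategies can be merged into one ranking, here is a sketch of reciprocal rank fusion (RRF), a common fusion technique. This README does not say how `HybridSearch` combines results, so treat this as an illustration of the general idea rather than a description of its behavior; the function name and the `k=60` default are assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60):
    '''Merge several ranked lists of IDs into one list, best first.

    Each item scores 1 / (k + rank) per list it appears in; scores are summed.
    '''
    scores = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    # Sort by descending score, breaking ties by ID for stable output
    return sorted(scores, key=lambda item: (-scores[item], item))


if __name__ == '__main__':
    dense_hits = ['alice', 'bob', 'carol']  # e.g. from a vector search
    graph_hits = ['bob', 'carol', 'alice']  # e.g. from a type filter
    print(reciprocal_rank_fusion([dense_hits, graph_hits]))
```

Items that rank well in more than one list rise to the top, and no score normalization across strategies is needed.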

## GraphRAG Applications

Use Onya KG as the knowledge layer in RAG applications:

```python
from ogbujipt.store.kgraph import OnyaKB
from ogbujipt.llm_wrapper import openai_chat_api, prompt_to_chat

# Load domain knowledge
kb = OnyaKB(folder_path='./domain_knowledge')
await kb.setup()

# Retrieve relevant knowledge (user_query is assumed to hold the user's question as a string)
contexts = []
async for result in kb.search(user_query, limit=3):
    contexts.append(result.content)

# Build RAG prompt
context_text = '\n\n'.join(contexts)
prompt = f"""Based on this knowledge:

{context_text}

Question: {user_query}"""

# Get LLM response
llm = openai_chat_api(base_url='http://localhost:8000')
response = await llm(prompt_to_chat(prompt))
print(response.first_choice_text)
```

# Use Cases

## Static Knowledge Bases
- **Ontologies**: Load domain ontologies (schema.org, FOAF, etc.)
- **Taxonomies**: Product catalogs, classification systems
- **Reference data**: Countries, currencies, standards
- **Company knowledge**: Org charts, procedures, policies

## Human-Curated Knowledge
- **Expert knowledge**: Subject matter expertise in structured form
- **Documentation**: Technical docs as knowledge graphs
- **Metadata**: Structured descriptions of assets/resources

## Embedded Applications
- **No database required**: Bundle knowledge with your application
- **Version controlled**: `.onya` files in git for change tracking
- **Reviewable**: Human-readable format for peer review
- **Composable**: Multiple `.onya` files for modular knowledge

# Architecture Notes

## Read-Only by Design

`OnyaKB` is intentionally read-only:
- `insert()` and `delete()` raise `NotImplementedError`
- Edit `.onya` files directly using your text editor
- Reload by calling `cleanup()` then `setup()` again
- This design encourages human curation and version control
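
The read-only pattern can be sketched with a toy class. `ReadOnlyFolderKB` below is hypothetical and much simpler than `OnyaKB` (it stores raw file text, not parsed nodes); it only illustrates refusing writes and picking up on-disk edits through `cleanup()` followed by `setup()`.

```python
import asyncio
import tempfile
from pathlib import Path


class ReadOnlyFolderKB:
    '''Hypothetical toy backend: loads text files from a folder and refuses writes.'''
    def __init__(self, folder_path, pattern='*.onya'):
        self.folder = Path(folder_path)
        self.pattern = pattern
        self.docs = {}

    async def setup(self):
        # Read every matching file into memory
        for path in sorted(self.folder.glob(self.pattern)):
            self.docs[path.name] = path.read_text(encoding='utf-8')

    async def cleanup(self):
        self.docs.clear()

    async def insert(self, *args, **kwargs):
        raise NotImplementedError('Read-only: edit the files on disk, then reload')

    async def delete(self, *args, **kwargs):
        raise NotImplementedError('Read-only: edit the files on disk, then reload')


async def demo():
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / 'people.onya'
        source.write_text('# Alice [Person]\n* name: Alice Smith\n', encoding='utf-8')
        kb = ReadOnlyFolderKB(tmp)
        await kb.setup()
        before = kb.docs['people.onya']

        # Edit the file directly, then reload with cleanup() followed by setup()
        source.write_text('# Alice [Person]\n* name: Alice Jones\n', encoding='utf-8')
        await kb.cleanup()
        await kb.setup()
        after = kb.docs['people.onya']

        try:
            await kb.insert('anything')
            write_refused = False
        except NotImplementedError:
            write_refused = True
        return before, after, write_refused


if __name__ == '__main__':
    print(asyncio.run(demo()))
```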

## In-Memory Performance

All nodes are loaded into memory:
- **Fast**: No database queries, instant lookups
- **Scalable**: Suitable for graphs with up to ~100K nodes
- **Simple**: No external dependencies or setup
- **Predictable**: Performance independent of query complexity

## Search Strategies

Three built-in strategies:
1. **Text search** (`kb.search()`): Substring matching across properties
2. **Type search** (`TypeSearch`): Filter by entity type
3. **Property search** (`PropertySearch`): Match specific property values

For semantic search, combine with vector stores using hybrid strategies.
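
The three strategies can be illustrated over a plain in-memory dict. This is a sketch of the matching logic only, not OgbujiPT's implementation: the function names, the node layout and the sample data (including the `Acme` node) are made up for the example, and case-insensitive text matching is an assumption.

```python
NODES = {
    'http://example.org/entities/Alice': {
        'types': ['https://schema.org/Person'],
        'properties': {'name': 'Alice Smith', 'bio': 'Software engineer who loves Python'},
    },
    'http://example.org/entities/Bob': {
        'types': ['https://schema.org/Person'],
        'properties': {'name': 'Bob Jones', 'occupation': 'Data Scientist'},
    },
    'http://example.org/entities/Acme': {
        'types': ['https://schema.org/Organization'],
        'properties': {'name': 'Acme Python Tools'},
    },
}

# The four match types listed for property search
MATCHERS = {
    'contains': lambda value, target: target in value,
    'equals': lambda value, target: value == target,
    'startswith': lambda value, target: value.startswith(target),
    'endswith': lambda value, target: value.endswith(target),
}


def text_search(nodes, query):
    '''Substring match across every property value (case-insensitive here).'''
    needle = query.lower()
    return [iri for iri, node in nodes.items()
            if any(needle in str(value).lower() for value in node['properties'].values())]


def type_search(nodes, type_iri):
    '''Keep only nodes carrying the given type IRI.'''
    return [iri for iri, node in nodes.items() if type_iri in node['types']]


def property_search(nodes, prop, target, match_type='contains'):
    '''Match one named property using one of the MATCHERS.'''
    matcher = MATCHERS[match_type]
    return [iri for iri, node in nodes.items()
            if prop in node['properties'] and matcher(str(node['properties'][prop]), target)]


if __name__ == '__main__':
    print(text_search(NODES, 'python'))
    print(type_search(NODES, 'https://schema.org/Person'))
    print(property_search(NODES, 'name', 'Jones', match_type='endswith'))
```

Every query here scans all nodes, which is why this approach suits graphs that fit comfortably in memory.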

# Prerequisites

```bash
# Easiest to just use the "mega" package, with all demo requirements
uv pip install -U ".[mega]"
```

# References

- **Onya**: [https://github.com/OoriData/Onya](https://github.com/OoriData/Onya)
- **OgbujiPT Documentation**: [https://github.com/OoriData/OgbujiPT](https://github.com/OoriData/OgbujiPT)
- **Knowledge Graphs**: [https://en.wikipedia.org/wiki/Knowledge_graph](https://en.wikipedia.org/wiki/Knowledge_graph)
- **GraphRAG**: [https://arxiv.org/abs/2404.16130](https://arxiv.org/abs/2404.16130)

---

**Need help?** Open an issue at [OgbujiPT GitHub](https://github.com/OoriData/OgbujiPT/issues)