Commit b245e94

dstengle and claude authored
feat: Adapt entities to use knowledgebase-vocabulary (#59)
* feat: adapt entities to use knowledgebase-vocabulary

  - Add KbPlaceholderDocument entity for wiki-style forward references
  - Implement deterministic ID generation following ADR-0013 standards
  - Update EntityService to use deterministic IDs instead of random UUIDs
  - Add comprehensive ID normalization with Unicode NFKD and alphanumeric conversion
  - Support for Person, Organization, Location, Project, Tag, and PlaceholderDocument entities
  - Maintain RDF vocabulary compatibility with existing converter
  - Enable wiki-based document linking with predictable entity identifiers
  - Sync with vocabulary from dstengle/knowledgebase-vocabulary

  🤖 Generated with [Claude Code](https://claude.ai/code)

  Co-Authored-By: Claude <[email protected]>

* Remove claude-flow metrics files

  Signed-off-by: David Stenglein <[email protected]>

* Updating .gitignore

  Signed-off-by: David Stenglein <[email protected]>

* feat: implement vocabulary reference strategy with centralized configuration

  - Add local vocabulary cache at /vocabulary/ with kb.ttl file
  - Create centralized vocabulary configuration module
  - Add VERSION.json for tracking vocabulary metadata
  - Implement sync script for updating from upstream repository
  - Add comprehensive documentation for LLM agents
  - Create ADR-0014 documenting the vocabulary reference strategy
  - Update existing code to use centralized KB namespace import
  - Add tests for vocabulary configuration and integration

  This provides a deterministic, LLM-friendly approach to managing the external vocabulary dependency while maintaining clear provenance.

  🤖 Generated with [Claude Code](https://claude.ai/code)

  Co-Authored-By: Claude <[email protected]>

---------

Signed-off-by: David Stenglein <[email protected]>
Co-authored-by: Claude <[email protected]>
1 parent 4035c23 commit b245e94
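
The commit message mentions deterministic IDs built with Unicode NFKD normalization and alphanumeric conversion, but the entity code itself is not shown on this page. The following is only a minimal sketch of that general technique. The function names `normalize_id_component` and `make_entity_id`, and the `type/slug` ID layout, are assumptions made for illustration; they are not taken from ADR-0013 or from `EntityService`.

```python
import re
import unicodedata


def normalize_id_component(text: str) -> str:
    """Normalize free text into a lowercase, hyphen-separated ASCII slug.

    Hypothetical helper: the project's real rules (ADR-0013) may differ.
    """
    # NFKD splits accented characters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Dropping non-ASCII code points removes those combining marks.
    ascii_text = decomposed.encode("ascii", "ignore").decode("ascii")
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_text.lower())
    return slug.strip("-")


def make_entity_id(entity_type: str, name: str) -> str:
    """Build an ID that depends only on the entity type and name (assumed layout)."""
    return f"{normalize_id_component(entity_type)}/{normalize_id_component(name)}"


if __name__ == "__main__":
    # The same input always yields the same ID, unlike a random UUID.
    print(make_entity_id("Person", "José García"))  # prints person/jose-garcia
```

One caveat of this particular sketch: characters with no ASCII decomposition (for example CJK text) are dropped entirely, so a production scheme would likely need a fallback for names that normalize to an empty string.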

14 files changed: +1767 -28 lines changed

.gitignore

Lines changed: 3 additions & 1 deletion
@@ -3,6 +3,8 @@ tmp*/
 *-tmp/
 rdf_output/
 output/
+.kbp/
+fuseki-data/
 
 # Claude Flow generated files
 .claude/settings.local.json
@@ -28,4 +30,4 @@ claude-flow
 claude-flow.bat
 claude-flow.ps1
 hive-mind-prompt-*.txt
-.kbp/
+.claude-flow/metrics

CLAUDE.md

Lines changed: 31 additions & 0 deletions
@@ -1,5 +1,30 @@
 # Claude Code Configuration for Claude Flow
 
+## Vocabulary Reference
+
+The project uses an RDF vocabulary from https://github.com/dstengle/knowledgebase-vocabulary/
+
+### Key Information for LLM Agents:
+
+1. **Always import the vocabulary namespace from the centralized configuration:**
+   ```python
+   from knowledgebase_processor.config.vocabulary import KB
+   ```
+
+2. **Never hardcode the namespace URI.** The namespace is managed centrally in `/vocabulary/VERSION.json`
+
+3. **The vocabulary is stored locally at `/vocabulary/kb.ttl` for deterministic builds**
+
+4. **Documentation is available at:**
+   - `/vocabulary/README.md` - Vocabulary usage and update instructions
+   - `/docs/development/vocabulary-usage-guide.md` - Developer guide for LLM agents
+   - `/docs/architecture/decisions/0014-vocabulary-reference-strategy.md` - Architecture decision
+
+5. **To update the vocabulary from the source repository:**
+   ```bash
+   ./scripts/sync-vocabulary.sh sync
+   ```
+
 ## 🚨 CRITICAL: PARALLEL EXECUTION AFTER SWARM INIT
 
 **MANDATORY RULE**: Once swarm is initialized with memory, ALL subsequent operations MUST be parallel:
@@ -904,3 +929,9 @@ Claude Flow extends the base coordination with:
 ---
 
 Remember: **Claude Flow coordinates, Claude Code creates!** Start with `mcp__claude-flow__swarm_init` to enhance your development workflow.
+
+# important-instruction-reminders
+Do what has been asked; nothing more, nothing less.
+NEVER create files unless they're absolutely necessary for achieving your goal.
+ALWAYS prefer editing an existing file to creating a new one.
+NEVER proactively create documentation files (*.md) or README files. Only create documentation files if explicitly requested by the User.
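
The "never hardcode the namespace URI" rule added to CLAUDE.md could be backed by a small automated check. The sketch below is one possible approach, not something included in this commit. The function name `find_hardcoded_namespaces` is hypothetical, and the `http://example.org/kb/` prefix is assumed from the ADR text.

```python
import re
from typing import List, Tuple

# Assumed prefix, taken from the namespace strings quoted in ADR-0014;
# adjust it if the project's real namespace differs.
HARDCODED_KB_URI = re.compile(r"http://example\.org/kb/")


def find_hardcoded_namespaces(source: str) -> List[Tuple[int, str]]:
    """Return (line_number, stripped_line) pairs that embed the KB namespace URI."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        if HARDCODED_KB_URI.search(line):
            hits.append((number, line.strip()))
    return hits


if __name__ == "__main__":
    sample = (
        "from rdflib import Namespace\n"
        'KB = Namespace("http://example.org/kb/")\n'
        "from knowledgebase_processor.config.vocabulary import KB\n"
    )
    # Only the second line hardcodes the URI; the centralized import passes.
    print(find_hardcoded_namespaces(sample))
```

A check like this would need to skip the configuration module itself, since that module legitimately contains the fallback URI.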
docs/architecture/decisions/0014-vocabulary-reference-strategy.md

Lines changed: 153 additions & 0 deletions
@@ -0,0 +1,153 @@
+# ADR-0014: Vocabulary Reference Strategy
+
+**Date:** 2025-08-13
+
+**Status:** Proposed
+
+## Context
+
+The knowledgebase-processor requires a stable reference to the KB vocabulary defined in the external repository at https://github.com/dstengle/knowledgebase-vocabulary/. This vocabulary defines the RDF ontology used for knowledge graph representation.
+
+Currently, the project:
+- Has a temporary copy at `/tmp-vocab/kb.ttl`
+- Uses hardcoded namespace `http://example.org/kb/` in the code
+- Needs to ensure all processing uses the correct vocabulary
+- Must make the vocabulary reference easy for LLM agent coders to understand and use
+
+## Decision
+
+We will implement a **hybrid approach** combining local caching with remote reference documentation:
+
+### 1. Local Vocabulary Cache
+- Maintain a local copy of the vocabulary at `/vocabulary/kb.ttl`
+- Track this file in version control for deterministic builds
+- Include version metadata in the file header
+
+### 2. Source Reference Documentation
+- Create `/vocabulary/README.md` documenting:
+  - Source repository URL
+  - Last sync date and commit hash
+  - Update instructions
+  - Namespace URI to use
+
+### 3. Configuration-Based Namespace
+- Define vocabulary namespace in configuration file
+- Allow override via environment variable
+- Default to the canonical namespace from the vocabulary
+
+### 4. Sync Mechanism
+- Provide a script `/scripts/sync-vocabulary.sh` to update from source
+- Document the sync process for maintainers
+- Include validation to ensure vocabulary compatibility
+
+## Implementation Plan
+
+### Directory Structure
+```
+vocabulary/
+├── kb.ttl # Local copy of vocabulary
+├── README.md # Documentation and source reference
+├── VERSION.json # Version metadata
+└── .gitignore # (empty - track all files)
+```
+
+### Version Metadata Format
+```json
+{
+  "source_repository": "https://github.com/dstengle/knowledgebase-vocabulary",
+  "source_commit": "sha-hash",
+  "sync_date": "2025-08-13T14:00:00Z",
+  "namespace": "http://example.org/kb/vocab#",
+  "version": "0.1.0-dev"
+}
+```
+
+### Configuration Integration
+```python
+# src/knowledgebase_processor/config/vocabulary.py
+import json
+from pathlib import Path
+from rdflib import Namespace
+
+def get_kb_namespace():
+    """Get the KB namespace from vocabulary metadata."""
+    vocab_dir = Path(__file__).parent.parent.parent.parent / "vocabulary"
+    version_file = vocab_dir / "VERSION.json"
+
+    if version_file.exists():
+        with open(version_file) as f:
+            metadata = json.load(f)
+        return Namespace(metadata["namespace"])
+
+    # Fallback to default
+    return Namespace("http://example.org/kb/vocab#")
+
+KB = get_kb_namespace()
+```
+
+## Rationale
+
+This approach provides:
+
+### For Development
+- **Deterministic builds**: Local vocabulary ensures consistent behavior
+- **Version control**: Track vocabulary changes with code changes
+- **Offline development**: No runtime dependency on external repository
+
+### For LLM Agents
+- **Clear documentation**: README explains the vocabulary source and usage
+- **Simple imports**: `from knowledgebase_processor.config.vocabulary import KB`
+- **Explicit versioning**: VERSION.json shows exactly what vocabulary version is used
+- **Update instructions**: Clear process for keeping vocabulary current
+
+### For Maintenance
+- **Traceable updates**: Git history shows when vocabulary was updated
+- **Validation possible**: Can add tests to ensure vocabulary compatibility
+- **Manual control**: Updates are intentional, not automatic
+
+## Alternatives Considered
+
+### 1. Git Submodule
+- **Pros**: Automatic tracking of source repository
+- **Cons**: Complex for LLM agents, requires git submodule knowledge
+
+### 2. Runtime Fetching
+- **Pros**: Always up-to-date
+- **Cons**: Network dependency, non-deterministic, harder to debug
+
+### 3. Direct Copy Only
+- **Pros**: Simplest approach
+- **Cons**: Loses connection to source, no version tracking
+
+### 4. Package Dependency
+- **Pros**: Standard Python approach
+- **Cons**: Vocabulary repo not published as package
+
+## Consequences
+
+### Positive
+- Clear provenance of vocabulary
+- Deterministic builds
+- Easy for LLM agents to understand
+- Simple to update when needed
+- Works offline
+
+### Negative
+- Manual sync required for updates
+- Potential for drift from source
+- Duplicate storage of vocabulary
+
+### Mitigations
+- Regular sync schedule (monthly or on major updates)
+- CI check to warn if vocabulary is outdated
+- Clear documentation of update process
+
+## Related Decisions
+
+- [ADR-0009: Knowledge Graph and RDF Store](0009-knowledge-graph-rdf-store.md)
+- [ADR-0010: Entity Modeling for RDF Serialization](0010-entity-modeling-for-rdf-serialization.md)
+- [ADR-0012: Entity Modeling with Wiki-Based Architecture](0012-entity-modeling-with-wiki-based-architecture.md)
+
+## Notes
+
+The vocabulary should be treated as a critical dependency. Any updates should be tested thoroughly to ensure compatibility with existing RDF data and queries.
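
The ADR's Decision section calls for an environment-variable override of the namespace, but the `get_kb_namespace()` example in the ADR reads only `VERSION.json` and a hardcoded fallback. The following sketch shows one way the three sources could be ordered. The variable name `KB_VOCAB_NAMESPACE` and the function `resolve_namespace_uri` are assumptions for illustration, not names taken from the commit.

```python
import json
import os
from pathlib import Path

DEFAULT_NAMESPACE = "http://example.org/kb/vocab#"  # fallback value shown in the ADR
ENV_VAR = "KB_VOCAB_NAMESPACE"  # hypothetical name; the ADR does not specify one


def resolve_namespace_uri(version_file: Path, environ=None) -> str:
    """Resolve the namespace URI: env override, then VERSION.json, then default."""
    env = os.environ if environ is None else environ

    # 1. An explicit, non-empty override wins.
    override = env.get(ENV_VAR)
    if override:
        return override

    # 2. Otherwise use the value recorded at sync time, if present.
    if version_file.exists():
        with open(version_file, encoding="utf-8") as f:
            metadata = json.load(f)
        namespace = metadata.get("namespace")
        if namespace:
            return namespace

    # 3. Fall back to the default.
    return DEFAULT_NAMESPACE
```

The returned string could then be wrapped in `rdflib.Namespace(...)` as the ADR's own example does. Passing `environ` explicitly keeps the function testable without mutating the process environment.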
