diff --git a/.gitignore b/.gitignore
index 3eab115..64e3a8d 100644
--- a/.gitignore
+++ b/.gitignore
@@ -3,6 +3,8 @@ tmp*/
 *-tmp/
 rdf_output/
 output/
+.kbp/
+fuseki-data/
 
 # Claude Flow generated files
 .claude/settings.local.json
@@ -28,4 +30,4 @@ claude-flow
 claude-flow.bat
 claude-flow.ps1
 hive-mind-prompt-*.txt
-.kbp/
+.claude-flow/metrics
diff --git a/CLAUDE.md b/CLAUDE.md
index 5101176..6b28291 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -1,5 +1,30 @@
 # Claude Code Configuration for Claude Flow
 
+## Vocabulary Reference
+
+The project uses an RDF vocabulary from https://github.com/dstengle/knowledgebase-vocabulary/
+
+### Key Information for LLM Agents:
+
+1. **Always import the vocabulary namespace from the centralized configuration:**
+   ```python
+   from knowledgebase_processor.config.vocabulary import KB
+   ```
+
+2. **Never hardcode the namespace URI.** The namespace is managed centrally in `/vocabulary/VERSION.json`
+
+3. **The vocabulary is stored locally at `/vocabulary/kb.ttl` for deterministic builds**
+
+4. **Documentation is available at:**
+   - `/vocabulary/README.md` - Vocabulary usage and update instructions
+   - `/docs/development/vocabulary-usage-guide.md` - Developer guide for LLM agents
+   - `/docs/architecture/decisions/0014-vocabulary-reference-strategy.md` - Architecture decision
+
+5. **To update the vocabulary from the source repository:**
+   ```bash
+   ./scripts/sync-vocabulary.sh sync
+   ```
+
 ## 🚨 CRITICAL: PARALLEL EXECUTION AFTER SWARM INIT
 
 **MANDATORY RULE**: Once swarm is initialized with memory, ALL subsequent operations MUST be parallel:
@@ -904,3 +929,9 @@ Claude Flow extends the base coordination with:
 ---
 
 Remember: **Claude Flow coordinates, Claude Code creates!** Start with `mcp__claude-flow__swarm_init` to enhance your development workflow.
+
+# important-instruction-reminders
+Do what has been asked; nothing more, nothing less.
+NEVER create files unless they're absolutely necessary for achieving your goal.
+ALWAYS prefer editing an existing file to creating a new one.
+NEVER proactively create documentation files (*.md) or README files. Only create documentation files if explicitly requested by the User.
\ No newline at end of file
diff --git a/docs/architecture/decisions/0014-vocabulary-reference-strategy.md b/docs/architecture/decisions/0014-vocabulary-reference-strategy.md
new file mode 100644
index 0000000..4e16bab
--- /dev/null
+++ b/docs/architecture/decisions/0014-vocabulary-reference-strategy.md
@@ -0,0 +1,153 @@
+# ADR-0014: Vocabulary Reference Strategy
+
+**Date:** 2025-08-13
+
+**Status:** Proposed
+
+## Context
+
+The knowledgebase-processor requires a stable reference to the KB vocabulary defined in the external repository at https://github.com/dstengle/knowledgebase-vocabulary/. This vocabulary defines the RDF ontology used for knowledge graph representation.
+
+Currently, the project:
+- Has a temporary copy at `/tmp-vocab/kb.ttl`
+- Uses hardcoded namespace `http://example.org/kb/` in the code
+- Needs to ensure all processing uses the correct vocabulary
+- Must make the vocabulary reference easy for LLM agent coders to understand and use
+
+## Decision
+
+We will implement a **hybrid approach** combining local caching with remote reference documentation:
+
+### 1. Local Vocabulary Cache
+- Maintain a local copy of the vocabulary at `/vocabulary/kb.ttl`
+- Track this file in version control for deterministic builds
+- Record version metadata alongside it in `/vocabulary/VERSION.json`
+
+### 2. Source Reference Documentation
+- Create `/vocabulary/README.md` documenting:
+  - Source repository URL
+  - Last sync date and commit hash
+  - Update instructions
+  - Namespace URI to use
+
+### 3. Configuration-Based Namespace
+- Define vocabulary namespace in configuration file
+- Allow override via environment variable
+- Default to the canonical namespace from the vocabulary
+
+### 4. Sync Mechanism
+- Provide a script `/scripts/sync-vocabulary.sh` to update from source
+- Document the sync process for maintainers
+- Include validation to ensure vocabulary compatibility
+
+## Implementation Plan
+
+### Directory Structure
+```
+vocabulary/
+├── kb.ttl              # Local copy of vocabulary
+├── README.md           # Documentation and source reference
+├── VERSION.json        # Version metadata
+└── .gitignore          # (empty - track all files)
+```
+
+### Version Metadata Format
+```json
+{
+  "source_repository": "https://github.com/dstengle/knowledgebase-vocabulary",
+  "source_commit": "sha-hash",
+  "sync_date": "2025-08-13T14:00:00Z",
+  "namespace": "http://example.org/kb/vocab#",
+  "version": "0.1.0-dev"
+}
+```
+
+### Configuration Integration
+```python
+# src/knowledgebase_processor/config/vocabulary.py
+import json
+from pathlib import Path
+from rdflib import Namespace
+
+def get_kb_namespace():
+    """Get the KB namespace from vocabulary metadata."""
+    vocab_dir = Path(__file__).parent.parent.parent.parent / "vocabulary"
+    version_file = vocab_dir / "VERSION.json"
+
+    if version_file.exists():
+        with open(version_file) as f:
+            metadata = json.load(f)
+        return Namespace(metadata["namespace"])
+
+    # Fallback to default
+    return Namespace("http://example.org/kb/vocab#")
+
+KB = get_kb_namespace()
+```
+
+## Rationale
+
+This approach provides:
+
+### For Development
+- **Deterministic builds**: Local vocabulary ensures consistent behavior
+- **Version control**: Track vocabulary changes with code changes
+- **Offline development**: No runtime dependency on external repository
+
+### For LLM Agents
+- **Clear documentation**: README explains the vocabulary source and usage
+- **Simple imports**: `from knowledgebase_processor.config.vocabulary import KB`
+- **Explicit versioning**: VERSION.json shows exactly what vocabulary version is used
+- **Update instructions**: Clear process for keeping vocabulary current
+
+### For Maintenance
+- **Traceable updates**: Git history shows when vocabulary was updated
+- **Validation possible**: Can add tests to ensure vocabulary compatibility
+- **Manual control**: Updates are intentional, not automatic
+
+## Alternatives Considered
+
+### 1. Git Submodule
+- **Pros**: Automatic tracking of source repository
+- **Cons**: Complex for LLM agents, requires git submodule knowledge
+
+### 2. Runtime Fetching
+- **Pros**: Always up-to-date
+- **Cons**: Network dependency, non-deterministic, harder to debug
+
+### 3. Direct Copy Only
+- **Pros**: Simplest approach
+- **Cons**: Loses connection to source, no version tracking
+
+### 4. Package Dependency
+- **Pros**: Standard Python approach
+- **Cons**: Vocabulary repo is not published as a package
+
+## Consequences
+
+### Positive
+- Clear provenance of vocabulary
+- Deterministic builds
+- Easy for LLM agents to understand
+- Simple to update when needed
+- Works offline
+
+### Negative
+- Manual sync required for updates
+- Potential for drift from source
+- Duplicate storage of vocabulary
+
+### Mitigations
+- Regular sync schedule (monthly or on major updates)
+- CI check to warn if vocabulary is outdated
+- Clear documentation of update process
+
+## Related Decisions
+
+- [ADR-0009: Knowledge Graph and RDF Store](0009-knowledge-graph-rdf-store.md)
+- [ADR-0010: Entity Modeling for RDF Serialization](0010-entity-modeling-for-rdf-serialization.md)
+- [ADR-0012: Entity Modeling with Wiki-Based Architecture](0012-entity-modeling-with-wiki-based-architecture.md)
+
+## Notes
+
+The vocabulary should be treated as a critical dependency. Any updates should be tested thoroughly to ensure compatibility with existing RDF data and queries.
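The Mitigations above call for a CI check that warns when the vocabulary is outdated, but the ADR does not specify one. The following is a minimal stdlib-only sketch. It assumes the `VERSION.json` format shown under "Version Metadata Format" and uses a 30-day threshold taken from the "monthly" sync schedule. The function name `is_vocabulary_stale` is hypothetical and is not part of this change.

```python
import json
from datetime import datetime, timedelta, timezone


# Hypothetical helper: not defined anywhere in this change. The 30-day default
# mirrors the "monthly" sync schedule listed under Mitigations.
def is_vocabulary_stale(version_json: str, now: datetime, max_age_days: int = 30) -> bool:
    """Return True if the sync_date recorded in VERSION.json is older than max_age_days."""
    metadata = json.loads(version_json)
    # The example uses a trailing "Z"; datetime.fromisoformat() accepts "Z"
    # only from Python 3.11 on, so normalize it to an explicit UTC offset.
    synced_at = datetime.fromisoformat(metadata["sync_date"].replace("Z", "+00:00"))
    return now - synced_at > timedelta(days=max_age_days)


# Sample metadata in the format shown under "Version Metadata Format"
example = json.dumps({
    "source_repository": "https://github.com/dstengle/knowledgebase-vocabulary",
    "source_commit": "sha-hash",
    "sync_date": "2025-08-13T14:00:00Z",
    "namespace": "http://example.org/kb/vocab#",
    "version": "0.1.0-dev",
})

print(is_vocabulary_stale(example, now=datetime(2025, 9, 1, tzinfo=timezone.utc)))   # False: about 18 days old
print(is_vocabulary_stale(example, now=datetime(2025, 10, 1, tzinfo=timezone.utc)))  # True: about 48 days old
```

A CI job could read `vocabulary/VERSION.json`, pass `datetime.now(timezone.utc)` as `now`, and warn (or exit non-zero) when this returns `True`. Comparing `source_commit` against `git ls-remote`, as `sync-vocabulary.sh check` does, detects drift more precisely but requires network access.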
\ No newline at end of file
diff --git a/docs/development/vocabulary-usage-guide.md b/docs/development/vocabulary-usage-guide.md
new file mode 100644
index 0000000..72ca1ca
--- /dev/null
+++ b/docs/development/vocabulary-usage-guide.md
@@ -0,0 +1,243 @@
+# Vocabulary Usage Guide for LLM Agent Coders
+
+This guide explains how to work with the KB vocabulary when developing features for the knowledgebase-processor.
+
+## Quick Start
+
+### 1. Import the Vocabulary
+
+Always import the KB namespace from the centralized configuration:
+
+```python
+from knowledgebase_processor.config.vocabulary import KB
+```
+
+**Never hardcode the namespace URI directly in your code.**
+
+### 2. Using the Vocabulary
+
+The KB namespace works like any rdflib `Namespace` object:
+
+```python
+from rdflib import Graph, Literal, URIRef, RDF, XSD
+from knowledgebase_processor.config.vocabulary import KB
+
+# Create RDF triples
+g = Graph()
+g.bind("kb", KB)  # Bind prefix for serialization
+
+# Reference vocabulary classes
+doc_uri = URIRef("http://example.org/doc/1")
+g.add((doc_uri, RDF.type, KB.Document))
+
+# Reference vocabulary properties
+g.add((doc_uri, KB.title, Literal("My Document")))
+g.add((doc_uri, KB.hasTag, KB["python"]))
+g.add((doc_uri, KB.created, Literal("2025-08-13", datatype=XSD.date)))
+```
+
+## Common Vocabulary Elements
+
+### Document Types
+- `KB.Document` - Base document class
+- `KB.DailyNote` - Daily note documents
+- `KB.Meeting` - Meeting notes
+- `KB.GroupMeeting` - Group meetings
+- `KB.OneOnOneMeeting` - 1-on-1 meetings
+- `KB.PersonProfile` - Person profiles
+- `KB.BookNote` - Book notes
+- `KB.ProjectDocument` - Project documents
+
+### Entity Types
+- `KB.Person` - Individual people
+- `KB.Company` - Organizations
+- `KB.Place` - Locations
+- `KB.Book` - Books
+- `KB.Todo` - Action items
+- `KB.Tag` - Tags/categories
+
+### Common Properties
+- `KB.title` - Document title
+- `KB.created` - Creation date
+- `KB.hasTag` - Link to tags
+- `KB.hasSection` - Document sections
+- `KB.hasAttendee` - Meeting attendees
+- `KB.isCompleted` - Todo completion status
+- `KB.mentionedIn` - Entity mentions
+- `KB.describes` - What a document describes
+
+## Code Patterns
+
+### Creating Entities
+
+```python
+from rdflib import Graph, Literal, URIRef, RDF
+from knowledgebase_processor.config.vocabulary import KB
+
+
+def create_document_entity(doc_id: str, title: str) -> Graph:
+    """Create RDF representation of a document."""
+    g = Graph()
+    g.bind("kb", KB)
+
+    doc_uri = URIRef(f"http://example.org/documents/{doc_id}")
+    g.add((doc_uri, RDF.type, KB.Document))
+    g.add((doc_uri, KB.title, Literal(title)))
+
+    return g
+```
+
+### Checking Entity Types
+
+```python
+from rdflib import Graph, URIRef, RDF
+from knowledgebase_processor.config.vocabulary import KB
+
+MEETING_TYPES = (KB.Meeting, KB.GroupMeeting, KB.OneOnOneMeeting)
+
+
+def is_meeting_document(g: Graph, uri: URIRef) -> bool:
+    """Check if a URI represents a meeting document."""
+    return any((uri, RDF.type, meeting_type) in g for meeting_type in MEETING_TYPES)
+```
+
+### Working with Properties
+
+```python
+from typing import List
+
+from rdflib import Graph, URIRef
+from knowledgebase_processor.config.vocabulary import KB
+
+
+def add_tags_to_document(g: Graph, doc_uri: URIRef, tags: List[str]) -> None:
+    """Add tags to a document."""
+    for tag in tags:
+        tag_uri = KB[tag.replace(" ", "_")]
+        g.add((doc_uri, KB.hasTag, tag_uri))
+```
+
+## Best Practices
+
+### DO:
+- ✅ Import KB from `knowledgebase_processor.config.vocabulary`
+- ✅ Use vocabulary classes for RDF.type assertions
+- ✅ Use vocabulary properties for relationships
+- ✅ Bind the KB prefix when creating graphs
+- ✅ Check the vocabulary file for available terms
+
+### DON'T:
+- ❌ Hardcode namespace URIs like `"http://example.org/kb/"`
+- ❌ Create custom properties without checking the vocabulary
+- ❌ Modify the vocabulary file directly
+- ❌ Import Namespace and create your own KB namespace
+
+## Vocabulary Structure
+
+The vocabulary follows these patterns:
+
+1. **Classes** (Types):
+   - Named with PascalCase: `Document`, `Person`, `TodoItem`
+   - Represent entity types in the knowledge base
+
+2. **Properties** (Relationships):
+   - Named with camelCase: `hasTag`, `isCompleted`, `mentionedIn`
+   - Connect entities or add attributes
+
+3. **Instances** (Individuals):
+   - Can be created dynamically: `KB["tag_name"]`
+   - Used for tags, categories, etc.
+
+## Finding Available Terms
+
+To see what's available in the vocabulary:
+
+1. **Check the documentation**: `/vocabulary/README.md`
+2. **Read the vocabulary file**: `/vocabulary/kb.ttl`
+3. **Use introspection**:
+
+```python
+from rdflib import Graph
+from rdflib.namespace import OWL, RDF
+from knowledgebase_processor.config.vocabulary import get_vocabulary_file_path
+
+# Load and explore the vocabulary
+g = Graph()
+g.parse(get_vocabulary_file_path(), format='turtle')
+
+# Find all classes
+for s in g.subjects(RDF.type, OWL.Class):
+    print(f"Class: {s}")
+
+# Find all properties (object and datatype properties)
+for property_type in (OWL.ObjectProperty, OWL.DatatypeProperty):
+    for s in g.subjects(RDF.type, property_type):
+        print(f"Property: {s}")
+```
+
+## Adding New Terms
+
+If you need a term that doesn't exist:
+
+1. **Check if a similar term exists** in the vocabulary
+2. **Consider using standard vocabularies** (Schema.org, FOAF, Dublin Core)
+3. **Propose additions** to the source repository
+4. **Document temporary extensions** clearly in your code
+
+Example of documenting a temporary extension:
+
+```python
+# TODO: Propose kb:reviewStatus to the vocabulary
+# Temporary: kb:reviewStatus is not yet defined in kb.ttl; build it from the
+# configured KB namespace instead of hardcoding the namespace URI
+REVIEW_STATUS = KB["reviewStatus"]
+g.add((doc_uri, REVIEW_STATUS, Literal("pending")))
+```
+
+## Testing with the Vocabulary
+
+```python
+from rdflib import Graph, URIRef, RDF
+from knowledgebase_processor.config.vocabulary import KB, validate_vocabulary
+
+
+def test_document_creation():
+    """Test creating a document with vocabulary."""
+    # Ensure vocabulary is available
+    assert validate_vocabulary(), "Vocabulary not properly configured"
+
+    # Test document creation
+    g = Graph()
+    g.bind("kb", KB)
+
+    doc = URIRef("test:doc1")
+    g.add((doc, RDF.type, KB.Document))
+
+    # Verify the triple was added
+    assert (doc, RDF.type, KB.Document) in g
+```
+
+## Environment Variables
+
+For testing or special deployments, you can override the vocabulary namespace. `KB` is resolved once, when `knowledgebase_processor.config.vocabulary` is first imported, so the variable must be set before that import:
+
+```bash
+export KB_VOCABULARY_NAMESPACE="http://test.example.org/kb/"
+python your_script.py
+```
+
+## Common Issues and Solutions
+
+### Issue: "KB is not defined"
+**Solution**: Import from the correct module:
+```python
+from knowledgebase_processor.config.vocabulary import KB
+```
+
+### Issue: "Unknown property kb:someProperty"
+**Solution**: Check if the property exists in the vocabulary:
+```bash
+grep "someProperty" vocabulary/kb.ttl
+```
+
+### Issue: "Namespace mismatch in RDF output"
+**Solution**: Ensure you're binding the namespace:
+```python
+g.bind("kb", KB)  # Always bind before serializing
+```
+
+## Summary
+
+The vocabulary is your semantic schema. It defines:
+- What types of things exist (classes)
+- How they relate (properties)
+- What they mean (semantics)
+
+Always use the centralized vocabulary configuration to ensure consistency across the codebase. When in doubt, check the vocabulary file and documentation.
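The tag pattern under "Working with Properties" builds local names with `tag.replace(" ", "_")`. By contrast, `EntityIdGenerator` in this change documents five ADR-0013 normalization steps: NFKD, lowercase, non-alphanumerics to hyphens, collapse hyphens, trim. The sketch below re-implements those five steps with the standard library to show how the two approaches differ for the same input. `normalize_for_id` is a hypothetical name, not the project's API.

```python
import re
import unicodedata


def normalize_for_id(text: str) -> str:
    """Hypothetical stand-in for the five ADR-0013 steps listed in EntityIdGenerator."""
    normalized = unicodedata.normalize("NFKD", text)    # 1. Unicode NFKD normalization
    normalized = normalized.lower()                      # 2. lowercase
    normalized = re.sub(r"[^a-z0-9]", "-", normalized)   # 3. non-alphanumeric -> hyphen
    normalized = re.sub(r"-+", "-", normalized)          # 4. collapse consecutive hyphens
    return normalized.strip("-")                         # 5. trim leading/trailing hyphens


for tag in ["Machine Learning", "  C++ / Rust ", "Ren\u00e9e"]:
    # Compare the ADR-0013 style name with the replace(" ", "_") pattern used above
    print(f"{tag!r}: {normalize_for_id(tag)!r} vs {tag.replace(' ', '_')!r}")
```

The two patterns yield different local names for the same tag (for example `machine-learning` versus `Machine_Learning`), so tag URIs can diverge depending on which code path created them. Note also that NFKD splits "é" into "e" plus a combining accent, which step 3 then turns into a hyphen, so "Renée" becomes `rene-e`. If accent-free names such as `renee` are intended, one option is to drop characters for which `unicodedata.combining()` is non-zero right after step 1.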
\ No newline at end of file
diff --git a/scripts/sync-vocabulary.sh b/scripts/sync-vocabulary.sh
new file mode 100755
index 0000000..1d09098
--- /dev/null
+++ b/scripts/sync-vocabulary.sh
@@ -0,0 +1,210 @@
+#!/bin/bash
+
+# Vocabulary Sync Script
+# Synchronizes the local vocabulary cache with the source repository
+
+set -e
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+PROJECT_ROOT="$(dirname "$SCRIPT_DIR")"
+VOCAB_DIR="$PROJECT_ROOT/vocabulary"
+VERSION_FILE="$VOCAB_DIR/VERSION.json"
+VOCAB_FILE="$VOCAB_DIR/kb.ttl"
+
+# Source repository details
+SOURCE_REPO="https://github.com/dstengle/knowledgebase-vocabulary"
+SOURCE_BRANCH="main"
+TEMP_DIR="/tmp/kb-vocab-sync-$$"
+
+# Colors for output
+RED='\033[0;31m'
+GREEN='\033[0;32m'
+YELLOW='\033[1;33m'
+NC='\033[0m' # No Color
+
+# Functions
+print_usage() {
+    echo "Usage: $0 {check|diff|sync|help}"
+    echo ""
+    echo "Commands:"
+    echo "  check  - Check if local vocabulary is up-to-date"
+    echo "  diff   - Show differences between local and remote vocabulary"
+    echo "  sync   - Sync vocabulary from source repository"
+    echo "  help   - Show this help message"
+}
+
+check_dependencies() {
+    if ! command -v git &> /dev/null; then
+        echo -e "${RED}Error: git is not installed${NC}"
+        exit 1
+    fi
+
+    # python3 is needed by get_local_commit to read VERSION.json
+    if ! command -v python3 &> /dev/null; then
+        echo -e "${RED}Error: python3 is not installed${NC}"
+        exit 1
+    fi
+}
+
+get_remote_commit() {
+    git ls-remote "$SOURCE_REPO" "refs/heads/$SOURCE_BRANCH" | cut -f1
+}
+
+get_local_commit() {
+    if [ -f "$VERSION_FILE" ]; then
+        # Pass the path as an argument instead of interpolating it into the Python source
+        python3 -c 'import json, sys; print(json.load(open(sys.argv[1]))["source_commit"])' "$VERSION_FILE" 2>/dev/null || echo "unknown"
+    else
+        echo "unknown"
+    fi
+}
+
+check_status() {
+    echo "Checking vocabulary status..."
+
+    local_commit=$(get_local_commit)
+    remote_commit=$(get_remote_commit)
+
+    echo "Local commit:  $local_commit"
+    echo "Remote commit: $remote_commit"
+
+    if [ "$local_commit" = "$remote_commit" ]; then
+        echo -e "${GREEN}✓ Vocabulary is up-to-date${NC}"
+        return 0
+    else
+        echo -e "${YELLOW}⚠ Vocabulary needs updating${NC}"
+        return 1
+    fi
+}
+
+show_diff() {
+    echo "Fetching remote vocabulary for comparison..."
+
+    # Create temp directory
+    mkdir -p "$TEMP_DIR"
+    cd "$TEMP_DIR"
+
+    # Clone repository (shallow clone for efficiency)
+    git clone --depth 1 --branch "$SOURCE_BRANCH" "$SOURCE_REPO" repo 2>/dev/null
+
+    # Find the vocabulary file in the remote repo
+    remote_vocab=""
+    if [ -f "repo/vocabulary/kb.ttl" ]; then
+        remote_vocab="repo/vocabulary/kb.ttl"
+    elif [ -f "repo/kb.ttl" ]; then
+        remote_vocab="repo/kb.ttl"
+    else
+        echo -e "${RED}Error: Could not find kb.ttl in remote repository${NC}"
+        rm -rf "$TEMP_DIR"
+        exit 1
+    fi
+
+    # Show diff
+    if [ -f "$VOCAB_FILE" ]; then
+        echo "Showing differences (local vs remote):"
+        echo "======================================="
+        diff -u "$VOCAB_FILE" "$remote_vocab" || true
+    else
+        echo -e "${YELLOW}Local vocabulary file does not exist${NC}"
+        echo "Remote vocabulary preview:"
+        echo "========================="
+        head -n 50 "$remote_vocab"
+    fi
+
+    # Cleanup
+    cd - > /dev/null
+    rm -rf "$TEMP_DIR"
+}
+
+sync_vocabulary() {
+    echo "Syncing vocabulary from source repository..."
+
+    # Create vocabulary directory if it doesn't exist
+    mkdir -p "$VOCAB_DIR"
+
+    # Create temp directory
+    mkdir -p "$TEMP_DIR"
+    cd "$TEMP_DIR"
+
+    # Clone repository
+    echo "Cloning repository..."
+    git clone --depth 1 --branch "$SOURCE_BRANCH" "$SOURCE_REPO" repo 2>/dev/null
+
+    # Get the current commit hash
+    cd repo
+    current_commit=$(git rev-parse HEAD)
+    cd ..
+
+    # Find and copy the vocabulary file
+    remote_vocab=""
+    if [ -f "repo/vocabulary/kb.ttl" ]; then
+        remote_vocab="repo/vocabulary/kb.ttl"
+    elif [ -f "repo/kb.ttl" ]; then
+        remote_vocab="repo/kb.ttl"
+    else
+        echo -e "${RED}Error: Could not find kb.ttl in remote repository${NC}"
+        rm -rf "$TEMP_DIR"
+        exit 1
+    fi
+
+    # Backup existing file if it exists
+    if [ -f "$VOCAB_FILE" ]; then
+        backup_file="$VOCAB_FILE.backup.$(date +%Y%m%d_%H%M%S)"
+        echo "Backing up existing vocabulary to: $backup_file"
+        cp "$VOCAB_FILE" "$backup_file"
+    fi
+
+    # Copy the new vocabulary file
+    echo "Copying vocabulary file..."
+    cp "$remote_vocab" "$VOCAB_FILE"
+
+    # Extract namespace from the vocabulary file. sed exits 0 even when nothing
+    # matches, so the default is applied afterwards rather than with `||`.
+    namespace=$(grep -m1 "@prefix kb:" "$VOCAB_FILE" | sed -n 's/.*<\(.*\)>.*/\1/p')
+    namespace="${namespace:-http://example.org/kb/vocab#}"
+
+    # Update VERSION.json
+    echo "Updating VERSION.json..."
+    cat > "$VERSION_FILE" < /dev/null
+    rm -rf "$TEMP_DIR"
+
+    echo -e "${GREEN}✓ Vocabulary synchronized successfully${NC}"
+    echo "  Source commit: $current_commit"
+    echo "  Namespace: $namespace"
+    echo ""
+    echo "Next steps:"
+    echo "1. Review the changes: git diff vocabulary/"
+    echo "2. Run tests: pytest tests/config/test_vocabulary.py"
+    echo "3. Commit changes: git add vocabulary/ && git commit -m 'chore: sync vocabulary from upstream'"
+}
+
+# Main script
+check_dependencies
+
+case "${1:-help}" in
+    check)
+        check_status
+        ;;
+    diff)
+        show_diff
+        ;;
+    sync)
+        sync_vocabulary
+        ;;
+    help|--help|-h)
+        print_usage
+        ;;
+    *)
+        echo -e "${RED}Error: Unknown command '$1'${NC}"
+        print_usage
+        exit 1
+        ;;
+esac
\ No newline at end of file
diff --git a/src/knowledgebase_processor/config/vocabulary.py b/src/knowledgebase_processor/config/vocabulary.py
new file mode 100644
index 0000000..bee5887
--- /dev/null
+++ b/src/knowledgebase_processor/config/vocabulary.py
@@ -0,0 +1,114 @@
+"""
+Vocabulary configuration module.
+
+This module provides access to the KB vocabulary namespace used throughout
+the knowledgebase-processor. The vocabulary is maintained in an external
+repository and cached locally for deterministic builds.
+
+Usage:
+    from knowledgebase_processor.config.vocabulary import KB
+
+    # Use the namespace to create URIs
+    document_class = KB.Document
+    has_tag_property = KB.hasTag
+"""
+
+import json
+import os
+from pathlib import Path
+
+from rdflib import Namespace
+
+
+def get_vocabulary_metadata() -> dict:
+    """
+    Load vocabulary metadata from VERSION.json.
+
+    Returns:
+        Dictionary containing vocabulary metadata including namespace URI.
+        Falls back to default metadata if the file is missing or unreadable.
+    """
+    # Navigate from this file to project root, then to vocabulary directory
+    vocab_dir = Path(__file__).parent.parent.parent.parent / "vocabulary"
+    version_file = vocab_dir / "VERSION.json"
+
+    if version_file.exists():
+        try:
+            with open(version_file, 'r') as f:
+                return json.load(f)
+        except (OSError, json.JSONDecodeError):
+            pass  # Fall through to the defaults below
+
+    # Return default metadata if the file is missing or cannot be parsed
+    return {
+        "namespace": "http://example.org/kb/vocab#",
+        "version": "unknown",
+        "source_repository": "https://github.com/dstengle/knowledgebase-vocabulary"
+    }
+
+
+def get_kb_namespace() -> Namespace:
+    """
+    Get the KB namespace from vocabulary metadata.
+
+    This function reads the namespace URI from the VERSION.json file
+    in the vocabulary directory. If the file doesn't exist or can't
+    be read, it falls back to a default namespace.
+
+    The namespace can be overridden using the KB_VOCABULARY_NAMESPACE
+    environment variable for testing or special deployments.
+
+    Returns:
+        rdflib.Namespace object for the KB vocabulary.
+    """
+    # Check for environment variable override
+    env_namespace = os.environ.get('KB_VOCABULARY_NAMESPACE')
+    if env_namespace:
+        return Namespace(env_namespace)
+
+    # Load from metadata
+    metadata = get_vocabulary_metadata()
+    return Namespace(metadata["namespace"])
+
+
+def get_vocabulary_file_path() -> Path:
+    """
+    Get the path to the local vocabulary file.
+
+    Returns:
+        Path object pointing to the kb.ttl vocabulary file.
+    """
+    vocab_dir = Path(__file__).parent.parent.parent.parent / "vocabulary"
+    return vocab_dir / "kb.ttl"
+
+
+def validate_vocabulary() -> bool:
+    """
+    Validate that the vocabulary file exists and is readable.
+
+    Returns:
+        True if vocabulary is valid and accessible, False otherwise.
+    """
+    vocab_file = get_vocabulary_file_path()
+
+    if not vocab_file.exists():
+        return False
+
+    try:
+        # Try to parse the vocabulary file
+        from rdflib import Graph
+        g = Graph()
+        g.parse(vocab_file, format='turtle')
+        return True
+    except Exception:
+        return False
+
+
+# Primary export: the KB namespace
+KB = get_kb_namespace()
+
+# Additional exports for vocabulary management
+__all__ = [
+    'KB',
+    'get_vocabulary_metadata',
+    'get_kb_namespace',
+    'get_vocabulary_file_path',
+    'validate_vocabulary'
+]
\ No newline at end of file
diff --git a/src/knowledgebase_processor/models/kb_entities.py b/src/knowledgebase_processor/models/kb_entities.py
index e79a9b6..7e4f2ca 100644
--- a/src/knowledgebase_processor/models/kb_entities.py
+++ b/src/knowledgebase_processor/models/kb_entities.py
@@ -2,11 +2,10 @@
 
 from typing import Optional, Tuple, List
 from pydantic import BaseModel, Field
-from rdflib.namespace import SDO as SCHEMA, RDFS, XSD, Namespace # Changed SCHEMA to SDO as SCHEMA
-from rdflib import URIRef
+from rdflib.namespace import SDO as SCHEMA, RDFS, XSD
 
-# Define custom namespace
-KB = Namespace("http://example.org/kb/")
+# Import KB namespace from centralized configuration
+from knowledgebase_processor.config.vocabulary import KB
 
 
 class KbBaseEntity(BaseModel):
@@ -325,4 +324,53 @@ class Config:
         json_schema_extra = {
             "rdf_types": [KB.WikiLink],
             "rdfs_label_fallback_fields": ["alias", "target_path"],
+        }
+
+
+class KbPlaceholderDocument(KbBaseEntity):
+    """
+    Pydantic model for placeholder document entities.
+
+    PlaceholderDocuments represent wiki links that reference non-existent documents.
+    They serve as forward references and can be promoted to actual KbDocument entities
+    when the referenced documents are created.
+    """
+    title: str = Field(
+        ...,
+        description="The title extracted from the wiki link that references this placeholder.",
+        json_schema_extra={
+            "rdf_property": SCHEMA.name,
+            "rdf_datatype": XSD.string,
+        },
+    )
+    normalized_name: str = Field(
+        ...,
+        description="The normalized name used to generate the deterministic ID.",
+        json_schema_extra={
+            "rdf_property": KB.normalizedName,
+            "rdf_datatype": XSD.string,
+        },
+    )
+    referenced_by: Optional[List[str]] = Field(
+        default_factory=list,
+        description="List of document URIs that reference this placeholder.",
+        json_schema_extra={
+            "rdf_property": KB.referencedBy,
+            "is_object_property": True,
+            "rdf_datatype": XSD.anyURI,
+        },
+    )
+    expected_path: Optional[str] = Field(
+        None,
+        description="The expected file path where this document should be created.",
+        json_schema_extra={
+            "rdf_property": KB.expectedPath,
+            "rdf_datatype": XSD.string,
+        },
+    )
+
+    class Config:
+        json_schema_extra = {
+            "rdf_types": [KB.PlaceholderDocument, SCHEMA.CreativeWork],
+            "rdfs_label_fallback_fields": ["title", "normalized_name"],
         }
\ No newline at end of file
diff --git a/src/knowledgebase_processor/rdf_converter/converter.py b/src/knowledgebase_processor/rdf_converter/converter.py
index 1cdae58..eadd8c5 100644
--- a/src/knowledgebase_processor/rdf_converter/converter.py
+++ b/src/knowledgebase_processor/rdf_converter/converter.py
@@ -6,9 +6,7 @@
 from rdflib.namespace import RDF, RDFS, XSD, SDO as SCHEMA
 
 from knowledgebase_processor.models.kb_entities import KbBaseEntity
-
-# Define Namespaces
-KB = Namespace("http://example.org/kb/") # Ensure this matches kb_entities.py
+from knowledgebase_processor.config.vocabulary import KB
 
 
 class RdfConverter:
diff --git a/src/knowledgebase_processor/services/entity_service.py b/src/knowledgebase_processor/services/entity_service.py
index ff3d43b..8028b48 100644
--- a/src/knowledgebase_processor/services/entity_service.py
+++ b/src/knowledgebase_processor/services/entity_service.py
@@ -1,36 +1,48 @@
 """Entity service for handling entity transformation and KB ID generation."""
 
-import uuid
 from typing import Optional
 from urllib.parse import quote
 
 from ..models.entities import ExtractedEntity
 from ..models.kb_entities import KbBaseEntity, KbPerson, KbOrganization, KbLocation, KbDateEntity, KB
+from ..utils.id_generator import EntityIdGenerator
 from ..utils.logging import get_logger
 
 
 class EntityService:
     """Handles entity transformation and KB ID generation."""
 
-    def __init__(self):
+    def __init__(self, base_uri: str = "http://example.org/kb/"):
         """Initialize the EntityService."""
         self.logger = get_logger("knowledgebase_processor.services.entity")
+        self.id_generator = EntityIdGenerator(base_uri)
 
     def generate_kb_id(self, entity_type_str: str, text: str) -> str:
-        """Generates a unique knowledge base ID (URI) for an entity.
+        """Generates a deterministic knowledge base ID (URI) for an entity.
+
+        Uses the new deterministic ID generation from ADR-0013.
 
         Args:
             entity_type_str: The type of entity (e.g., "Person", "Organization")
             text: The text content of the entity
 
         Returns:
-            A unique URI for the entity
+            A deterministic URI for the entity
         """
-        # Simple slugification: replace non-alphanumeric with underscore
-        slug = "".join(c if c.isalnum() else "_" for c in text.lower())
-        # Trim slug to avoid overly long URIs, e.g., first 50 chars
-        slug = slug[:50].strip('_')
-        return str(KB[f"{entity_type_str}/{slug}_{uuid.uuid4().hex[:8]}"])
+        if entity_type_str.lower() == "person":
+            return self.id_generator.generate_person_id(text)
+        elif entity_type_str.lower() == "organization":
+            return self.id_generator.generate_organization_id(text)
+        elif entity_type_str.lower() == "location":
+            return self.id_generator.generate_location_id(text)
+        elif entity_type_str.lower() == "project":
+            return self.id_generator.generate_project_id(text)
+        elif entity_type_str.lower() == "tag":
+            return self.id_generator.generate_tag_id(text)
+        else:
+            # Fallback for unknown entity types: use the same normalization and
+            # base URI as the typed generators so every entity ID follows one scheme
+            normalized_name = self.id_generator._normalize_text_for_id(text)
+            return f"{self.id_generator.base_url}{entity_type_str}/{normalized_name}"
 
     def transform_to_kb_entity(self,
                                extracted_entity: ExtractedEntity,
@@ -48,12 +60,8 @@ def transform_to_kb_entity(self,
         entity_label_upper = extracted_entity.label.upper()
         self.logger.info(f"Processing entity: {kb_id_text} of type {entity_label_upper}")
 
-        # Create a full URI for the source document
-        # Replace spaces with underscores and quote for URI safety.
-        # Ensure consistent path separators (/) before quoting.
-        normalized_path = source_doc_relative_path.replace("\\", "/")
-        safe_path_segment = quote(normalized_path.replace(" ", "_"))
-        full_document_uri = str(KB[f"Document/{safe_path_segment}"])
+        # Create a full URI for the source document using deterministic ID generation
+        full_document_uri = self.id_generator.generate_document_id(source_doc_relative_path)
 
         common_args = {
             "label": extracted_entity.text,
diff --git a/src/knowledgebase_processor/utils/id_generator.py b/src/knowledgebase_processor/utils/id_generator.py
index 5c4fffd..90b7ce0 100644
--- a/src/knowledgebase_processor/utils/id_generator.py
+++ b/src/knowledgebase_processor/utils/id_generator.py
@@ -1,11 +1,19 @@
 import hashlib
 import base64
+import os
 import re
+import unicodedata
 from urllib.parse import urljoin, quote
 
 class EntityIdGenerator:
     """
     Generates deterministic, unique identifiers for knowledge base entities.
+
+    Implements the normalization rules from ADR-0013:
+    1. Unicode NFKD normalization
+    2. Convert to lowercase
+    3. Replace non-alphanumeric with hyphens
+    4. Remove consecutive hyphens
+    5. Trim hyphens from start/end
     """
 
     def __init__(self, base_url: str):
@@ -19,6 +27,36 @@ def __init__(self, base_url: str):
             base_url += '/'
         self.base_url = base_url
 
+    def _normalize_text_for_id(self, text: str) -> str:
+        """
+        Normalizes text for use in deterministic IDs according to ADR-0013 rules.
+
+        Args:
+            text: The text to normalize
+
+        Returns:
+            Normalized text suitable for use in IDs
+        """
+        if not text:
+            return ""
+
+        # 1. Unicode NFKD normalization
+        normalized = unicodedata.normalize('NFKD', text)
+
+        # 2. Convert to lowercase
+        normalized = normalized.lower()
+
+        # 3. Replace non-alphanumeric with hyphens
+        normalized = re.sub(r'[^a-z0-9]', '-', normalized)
+
+        # 4. Remove consecutive hyphens
+        normalized = re.sub(r'-+', '-', normalized)
+
+        # 5. Trim hyphens from start/end
+        normalized = normalized.strip('-')
+
+        return normalized
+
     def _generate_deterministic_hash(self, *parts: str) -> str:
         """
         Creates a short, URL-safe hash from a set of input strings.
@@ -29,19 +67,116 @@ def _generate_deterministic_hash(self, *parts: str) -> str:
         # Use URL-safe base64 encoding and take the first 16 characters for a reasonable length
         return base64.urlsafe_b64encode(sha256_hash).decode('utf-8').rstrip('=')[:16]
 
-    def generate_document_id(self, normalized_path: str) -> str:
+    def generate_document_id(self, file_path: str) -> str:
         """
         Generates a unique, deterministic URI for a document entity.
+
+        Uses the ADR-0013 pattern: /Document/{normalized-file-path}
 
         Args:
-            normalized_path: The normalized, unique path of the document.
+            file_path: The original file path of the document.
 
         Returns:
             A full URI for the document entity.
         """
-        # Sanitize the path for use in a URI
-        safe_path = quote(normalized_path)
-        return urljoin(self.base_url, f"documents/{safe_path}")
+        # Strip the file extension before normalizing: normalization turns '.'
+        # into '-', so the extension could no longer be detected afterwards
+        path_without_ext, _ = os.path.splitext(file_path)
+
+        # Normalize the file path using ADR-0013 rules
+        normalized_path = self._normalize_text_for_id(path_without_ext)
+
+        return urljoin(self.base_url, f"Document/{normalized_path}")
+
+    def generate_placeholder_document_id(self, title: str) -> str:
+        """
+        Generates a unique, deterministic URI for a placeholder document entity.
+
+        Uses the ADR-0013 pattern: /PlaceholderDocument/{normalized-name}
+
+        Args:
+            title: The title or name of the placeholder document.
+
+        Returns:
+            A full URI for the placeholder document entity.
+        """
+        normalized_name = self._normalize_text_for_id(title)
+        return urljoin(self.base_url, f"PlaceholderDocument/{normalized_name}")
+
+    def generate_person_id(self, name: str) -> str:
+        """
+        Generates a unique, deterministic URI for a person entity.
+
+        Uses the ADR-0013 pattern: /Person/{normalized-name}
+
+        Args:
+            name: The person's name.
+
+        Returns:
+            A full URI for the person entity.
+        """
+        normalized_name = self._normalize_text_for_id(name)
+        return urljoin(self.base_url, f"Person/{normalized_name}")
+
+    def generate_organization_id(self, name: str) -> str:
+        """
+        Generates a unique, deterministic URI for an organization entity.
+
+        Uses the ADR-0013 pattern: /Organization/{normalized-name}
+
+        Args:
+            name: The organization's name.
+
+        Returns:
+            A full URI for the organization entity.
+        """
+        normalized_name = self._normalize_text_for_id(name)
+        return urljoin(self.base_url, f"Organization/{normalized_name}")
+
+    def generate_location_id(self, name: str) -> str:
+        """
+        Generates a unique, deterministic URI for a location entity.
+
+        Uses the ADR-0013 pattern: /Location/{normalized-name}
+
+        Args:
+            name: The location's name.
+
+        Returns:
+            A full URI for the location entity.
+        """
+        normalized_name = self._normalize_text_for_id(name)
+        return urljoin(self.base_url, f"Location/{normalized_name}")
+
+    def generate_project_id(self, name: str) -> str:
+        """
+        Generates a unique, deterministic URI for a project entity.
+
+        Uses the ADR-0013 pattern: /Project/{normalized-name}
+
+        Args:
+            name: The project's name.
+
+        Returns:
+            A full URI for the project entity.
+        """
+        normalized_name = self._normalize_text_for_id(name)
+        return urljoin(self.base_url, f"Project/{normalized_name}")
+
+    def generate_tag_id(self, name: str) -> str:
+        """
+        Generates a unique, deterministic URI for a tag entity.
+
+        Uses the ADR-0013 pattern: /Tag/{normalized-name}
+
+        Args:
+            name: The tag's name.
+
+        Returns:
+            A full URI for the tag entity.
+        """
+        normalized_name = self._normalize_text_for_id(name)
+        return urljoin(self.base_url, f"Tag/{normalized_name}")
 
     def generate_wikilink_id(self, source_document_id: str, original_text: str) -> str:
         """
diff --git a/tests/config/test_vocabulary.py b/tests/config/test_vocabulary.py
new file mode 100644
index 0000000..5b19b2f
--- /dev/null
+++ b/tests/config/test_vocabulary.py
@@ -0,0 +1,177 @@
+"""
+Tests for vocabulary configuration and integration.
+"""
+
+import json
+from pathlib import Path
+import pytest
+from rdflib import Graph, URIRef, Literal, RDF
+
+from knowledgebase_processor.config.vocabulary import (
+    KB,
+    get_vocabulary_metadata,
+    get_vocabulary_file_path,
+    validate_vocabulary,
+    get_kb_namespace
+)
+
+
+class TestVocabularyConfiguration:
+    """Test vocabulary configuration module."""
+
+    def test_kb_namespace_import(self):
+        """Test that KB namespace can be imported."""
+        assert KB is not None
+        assert str(KB) == "http://example.org/kb/vocab#"
+
+    def test_vocabulary_metadata_loading(self):
+        """Test loading vocabulary metadata."""
+        metadata = get_vocabulary_metadata()
+
+        assert isinstance(metadata, dict)
+        assert "namespace" in metadata
+        assert "source_repository" in metadata
+        assert metadata["source_repository"] == "https://github.com/dstengle/knowledgebase-vocabulary"
+
+    def test_vocabulary_file_path(self):
+        """Test getting vocabulary file path."""
+        vocab_path = get_vocabulary_file_path()
+
+        assert isinstance(vocab_path, Path)
+        assert vocab_path.name == "kb.ttl"
+        assert vocab_path.parent.name == "vocabulary"
+
+    def test_vocabulary_validation(self):
+        """Test vocabulary validation."""
+        is_valid = validate_vocabulary()
+
+        # Should be valid after our setup
+        assert is_valid is True
+
+    def test_vocabulary_file_exists(self):
+        """Test that vocabulary file exists."""
+        vocab_path = get_vocabulary_file_path()
+        assert vocab_path.exists()
+
+    def test_vocabulary_file_parseable(self):
+        """Test that vocabulary file can be parsed as RDF."""
+        vocab_path = get_vocabulary_file_path()
+
+        g = Graph()
+        # Should not raise an exception
+        g.parse(vocab_path, format='turtle')
+
+        # Check that it contains some expected content
+        assert len(g) > 0
+
+    def test_kb_namespace_usage(self):
+        """Test using KB namespace to create URIs."""
+        # Test creating class URIs
+        document_uri = KB.Document
+        assert isinstance(document_uri, URIRef)
+        assert str(document_uri) == "http://example.org/kb/vocab#Document"
+
+        # Test creating property URIs
+        has_tag_uri = KB.hasTag
+        assert isinstance(has_tag_uri, URIRef)
+        assert str(has_tag_uri) == "http://example.org/kb/vocab#hasTag"
+
+        # Test creating dynamic URIs
+        tag_uri = KB["python"]
+        assert isinstance(tag_uri, URIRef)
+        assert str(tag_uri) == "http://example.org/kb/vocab#python"
+
+    def test_vocabulary_in_rdf_graph(self):
+        """Test using vocabulary in RDF graph operations."""
+        g = Graph()
+        g.bind("kb", KB)
+
+        # Create a document entity
+        doc_uri = URIRef("http://example.org/documents/test-doc")
+        g.add((doc_uri, RDF.type, KB.Document))
+        g.add((doc_uri, KB.title, Literal("Test Document")))
+        g.add((doc_uri, KB.hasTag, KB["test"]))
+
+        # Verify triples were added
+        assert (doc_uri, RDF.type, KB.Document) in g
+        assert (doc_uri, KB.title, Literal("Test Document")) in g
+        assert (doc_uri, KB.hasTag, KB["test"]) in g
+
+        # Test serialization includes namespace binding
+        turtle_output = g.serialize(format='turtle')
+        assert "@prefix kb:" in turtle_output
+        assert "kb:Document" in turtle_output or str(KB.Document) in turtle_output
+
+
+class TestVocabularyIntegration:
+    """Test vocabulary integration with other modules."""
+
+    def test_kb_entities_import(self):
+        """Test that kb_entities module uses centralized vocabulary."""
+        from knowledgebase_processor.models.kb_entities import KB as entities_KB
+        from knowledgebase_processor.config.vocabulary import KB as config_KB
+
+        # Should be the same namespace
+        assert str(entities_KB) == str(config_KB)
+
+    def test_rdf_converter_import(self):
+        """Test that rdf_converter module uses centralized vocabulary."""
+        from knowledgebase_processor.rdf_converter.converter import KB as converter_KB
+        from knowledgebase_processor.config.vocabulary import KB as config_KB
+
+        # Should be the same namespace
+        assert str(converter_KB) == str(config_KB)
+
+    def test_vocabulary_consistency(self):
+        """Test that vocabulary is consistently used across modules."""
+        from knowledgebase_processor.models.kb_entities import KbTodoItem
+        from knowledgebase_processor.rdf_converter.converter import RdfConverter
+
+        # Create a todo item
+        todo = KbTodoItem(
+            kb_id="todo-1",
+            description="Test todo item",
+            is_completed=False
+        )
+
+        # Convert to RDF
+        converter = RdfConverter()
+        g = converter.kb_entity_to_graph(todo)
+
+        # The conversion should produce at least one triple
+        assert len(g) > 0
+
+        # The entity should have an rdf:type (e.g. KB.Entity from the base class)
+        todo_uri = URIRef("http://example.org/kb/todo-1")
+        assert (todo_uri, RDF.type, None) in g
+
+
+class TestVersionFile:
+    """Test VERSION.json file structure."""
+
+    def test_version_file_exists(self):
+        """Test that VERSION.json exists."""
+        version_path = Path(__file__).parent.parent.parent / "vocabulary" / "VERSION.json"
+        assert version_path.exists()
+
+    def test_version_file_structure(self):
+        """Test VERSION.json has required fields."""
+        version_path = Path(__file__).parent.parent.parent / "vocabulary" / "VERSION.json"
+
+        with open(version_path) as f:
+            version_data = json.load(f)
+
+        required_fields = [
+            "source_repository",
+            "source_commit",
+            "sync_date",
+            "namespace",
+            "version"
+        ]
+
+        for field in required_fields:
+            assert field in version_data, f"Missing required field: {field}"
+
+        # Validate field values
+        assert version_data["source_repository"] == "https://github.com/dstengle/knowledgebase-vocabulary"
+        assert version_data["namespace"] == "http://example.org/kb/vocab#"
\ No newline at end of file
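The `EntityIdGenerator` changes in this PR rely on the five ADR-0013 normalization steps. The standalone sketch below reproduces those steps outside the class to show why a file extension has to be removed before normalizing: step 3 rewrites every `.` to `-`, so any extension check performed on the already-normalized string can never match. `normalize_text_for_id` and `document_id` are hypothetical module-level helpers, not part of the package, and using `posixpath.splitext` for extension handling is an assumption of this sketch.

```python
import posixpath
import re
import unicodedata
from urllib.parse import urljoin


def normalize_text_for_id(text: str) -> str:
    """Apply the five ADR-0013 steps listed in the EntityIdGenerator docstring."""
    if not text:
        return ""
    normalized = unicodedata.normalize("NFKD", text)    # 1. Unicode NFKD
    normalized = normalized.lower()                      # 2. lowercase
    normalized = re.sub(r"[^a-z0-9]", "-", normalized)   # 3. non-alphanumeric -> '-'
    normalized = re.sub(r"-+", "-", normalized)          # 4. collapse hyphen runs
    return normalized.strip("-")                         # 5. trim leading/trailing '-'


def document_id(base_url: str, file_path: str) -> str:
    """Hypothetical helper: build {base_url}Document/{normalized-file-path}.

    The extension is removed *before* normalization, because step 3
    turns every '.' into '-' and the extension would otherwise survive.
    """
    if not base_url.endswith("/"):
        base_url += "/"  # without it, urljoin would replace the last path segment
    path = file_path.replace("\\", "/")      # Windows separators -> POSIX
    root, _ext = posixpath.splitext(path)    # "notes/todo.md" -> "notes/todo"
    return urljoin(base_url, f"Document/{normalize_text_for_id(root)}")
```

For example, `normalize_text_for_id("notes/todo.md")` yields `notes-todo-md` (the extension is baked into the ID), while `document_id("http://example.org/kb", "Meetings/2024-01-15 Team Sync.md")` yields `http://example.org/kb/Document/meetings-2024-01-15-team-sync`. `posixpath.splitext` only treats a dot in the final path component as an extension, so a directory such as `dir.v2/` is left intact.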
diff --git a/vocabulary/README.md b/vocabulary/README.md
new file mode 100644
index 0000000..9bfb6dc
--- /dev/null
+++ b/vocabulary/README.md
@@ -0,0 +1,173 @@
+# KB Vocabulary Reference
+
+This directory contains the RDF vocabulary/ontology used by the knowledgebase-processor.
+
+## Source
+
+The vocabulary is maintained in the external repository:
+- **Repository**: https://github.com/dstengle/knowledgebase-vocabulary
+- **Primary File**: `vocabulary/kb.ttl`
+- **License**: MIT
+
+## Current Version
+
+See `VERSION.json` for:
+- Source commit reference
+- Last sync date
+- Namespace URI
+- Version information
+
+## Usage in Code
+
+### Import the Namespace
+
+```python
+from knowledgebase_processor.config.vocabulary import KB
+
+# Use the namespace
+from rdflib import Graph, URIRef
+
+g = Graph()
+entity_uri = KB.Document  # Creates URIRef for kb:Document class
+property_uri = KB.hasTag  # Creates URIRef for kb:hasTag property
+```
+
+### Direct File Access
+
+```python
+from knowledgebase_processor.config.vocabulary import get_vocabulary_file_path
+
+vocab_file = get_vocabulary_file_path()  # Path to vocabulary/kb.ttl
+# Load vocabulary for validation or introspection
+```
+
+## Vocabulary Structure
+
+The KB vocabulary defines:
+
+### Core Classes
+- `kb:Document` - Base class for markdown documents
+- `kb:Person` - People (equivalent to `foaf:Person`)
+- `kb:Company` - Companies and organizations
+- `kb:Meeting` - Meeting notes and events
+- `kb:TodoItem` - Tasks and action items
+- `kb:Section` - Document structure elements
+
+### Key Properties
+- `kb:hasTag` - Links documents to tags
+- `kb:hasAttendee` - Links meetings to participants
+- `kb:isCompleted` - Status of todo items
+- `kb:mentionedIn` - Entity references in documents
+
+### Integration with Standard Vocabularies
+- FOAF for person modeling
+- Schema.org for general semantics
+- Dublin Core for metadata
+- SKOS for tag hierarchies
+
+## Updating the Vocabulary
+
+### Manual Update Process
+
+1. Check for updates in the source repository:
+   ```bash
+   ./scripts/sync-vocabulary.sh check
+   ```
+
+2. Review changes before syncing:
+   ```bash
+   ./scripts/sync-vocabulary.sh diff
+   ```
+
+3. Sync from source repository:
+   ```bash
+   ./scripts/sync-vocabulary.sh sync
+   ```
+
+4. Update VERSION.json with new commit hash and date
+
+5. Test compatibility:
+   ```bash
+   pytest tests/config/test_vocabulary.py
+   ```
+
+6. Commit changes:
+   ```bash
+   git add vocabulary/
+   git commit -m "chore: sync vocabulary from upstream"
+   ```
+
+### Automated Validation
+
+The vocabulary is validated during:
+- CI/CD pipeline runs
+- Pre-commit hooks (if configured)
+- Test suite execution
+
+## For LLM Agent Coders
+
+When working with the vocabulary:
+
+1. **Always use the configured namespace**: Import `KB` from `knowledgebase_processor.config.vocabulary`
+2. **Reference this documentation**: The vocabulary structure is documented here
+3. **Check VERSION.json**: Ensure you're working with the expected version
+4. **Don't modify kb.ttl directly**: Changes should be made in the source repository
+5. **Type entities with vocabulary classes**: Use classes such as `KB.Document` and `KB.TodoItem` as `rdf:type` values
+
+### Example: Creating RDF Triples
+
+```python
+from rdflib import Graph, Literal, RDF, URIRef
+from knowledgebase_processor.config.vocabulary import KB
+
+# Create a graph
+g = Graph()
+g.bind("kb", KB)
+
+# Create a document entity
+doc_uri = URIRef("http://example.org/documents/my-note")
+g.add((doc_uri, RDF.type, KB.Document))
+g.add((doc_uri, KB.title, Literal("My Daily Note")))
+g.add((doc_uri, KB.hasTag, KB["work"]))
+
+# Create a todo item
+todo_uri = URIRef("http://example.org/todos/task-1")
+g.add((todo_uri, RDF.type, KB.TodoItem))
+g.add((todo_uri, KB.description, Literal("Complete vocabulary integration")))
+g.add((todo_uri, KB.isCompleted, Literal(False)))
+```
+
+## Vocabulary Evolution
+
+The vocabulary is designed to evolve with the project's needs:
+
+1. **Backward Compatibility**: Changes maintain compatibility with existing data
+2. **Semantic Versioning**: Version numbers follow semver conventions
+3. **Migration Support**: Tools provided for data migration when needed
+4. **Documentation**: All changes documented in the source repository
+
+## Troubleshooting
+
+### Namespace Mismatch
+If you see namespace errors, ensure:
+- VERSION.json matches the namespace in kb.ttl
+- Code imports use the configured namespace
+- RDF data uses consistent namespace URIs
+
+### Missing Classes/Properties
+If a class or property is missing:
+1. Check if it exists in the source repository
+2. Verify your local copy is up-to-date
+3. Consider whether it should be added upstream in the source repository
+
+### Import Errors
+If vocabulary imports fail:
+1. Ensure the vocabulary directory exists
+2. Check that VERSION.json is valid JSON
+3. Verify that the Python path includes the project root
+
+## Related Documentation
+
+- [ADR-0014: Vocabulary Reference Strategy](../docs/architecture/decisions/0014-vocabulary-reference-strategy.md)
+- [ADR-0009: Knowledge Graph and RDF Store](../docs/architecture/decisions/0009-knowledge-graph-rdf-store.md)
+- [Entity Modeling Documentation](../docs/architecture/decisions/0012-entity-modeling-with-wiki-based-architecture.md)
\ No newline at end of file
diff --git a/vocabulary/VERSION.json b/vocabulary/VERSION.json
new file mode 100644
index 0000000..7b76cbe
--- /dev/null
+++ b/vocabulary/VERSION.json
@@ -0,0 +1,8 @@
+{
+  "source_repository": "https://github.com/dstengle/knowledgebase-vocabulary",
+  "source_commit": "main",
+  "sync_date": "2025-08-13T14:00:00Z",
+  "namespace": "http://example.org/kb/vocab#",
+  "version": "0.1.0-dev",
+  "notes": "Initial vocabulary sync from tmp-vocab/kb.ttl"
+}
\ No newline at end of file
diff --git a/vocabulary/kb.ttl b/vocabulary/kb.ttl
new file mode 100644
index 0000000..99ac140
--- /dev/null
+++ b/vocabulary/kb.ttl
@@ -0,0 +1,439 @@
+@prefix kb: .
+@prefix owl: .
+@prefix rdfs: .
+@prefix xsd: .
+@prefix doco: .
+@prefix dcterms: .
+@prefix schema: .
+@prefix foaf: .
+@prefix rel: .
+@prefix prov: .
+@prefix sioc: .
+@prefix skos: .
+ +# ============================================ +# KB VOCABULARY DEFINITION +# ============================================ + +kb: a owl:Ontology ; + dcterms:title "Personal Knowledge Base Vocabulary" ; + dcterms:description "Vocabulary for markdown-based knowledge management" ; + dcterms:created "2024-11-15"^^xsd:date ; + owl:imports , + , + . + +# ============================================ +# DOCUMENT CLASSES +# ============================================ + +# Base document class +kb:Document a owl:Class ; + rdfs:subClassOf schema:Article ; + rdfs:comment "A markdown document in the knowledge base" . + +# Document type hierarchy +kb:DailyNote a owl:Class ; + rdfs:subClassOf kb:Document ; + rdfs:comment "Daily notes document (type: daily-note)" . + +kb:Meeting a owl:Class ; + rdfs:subClassOf kb:Document, schema:Event ; + rdfs:comment "Meeting notes document" . + +kb:GroupMeeting a owl:Class ; + rdfs:subClassOf kb:Meeting ; + rdfs:comment "Group meeting notes (type: group-meeting)" . + +kb:OneOnOneMeeting a owl:Class ; + rdfs:subClassOf kb:Meeting ; + rdfs:comment "1-on-1 meeting notes (type: 1on1-meeting)" . + +kb:PersonProfile a owl:Class ; + rdfs:subClassOf kb:Document ; + rdfs:comment "Person profile document (type: person)" . + +kb:BookNote a owl:Class ; + rdfs:subClassOf kb:Document ; + rdfs:comment "Book notes document (type: book)" . + +kb:PlaceNote a owl:Class ; + rdfs:subClassOf kb:Document ; + rdfs:comment "Place/location notes (type: place)" . + +kb:ProjectDocument a owl:Class ; + rdfs:subClassOf kb:Document ; + rdfs:comment "Project-related document" . + +kb:ResearchNote a owl:Class ; + rdfs:subClassOf kb:Document ; + rdfs:comment "Research documents" . + +kb:ReadingList a owl:Class ; + rdfs:subClassOf kb:Document ; + rdfs:comment "Reading list or book collection" . 
+ +# ============================================ +# DOCUMENT STRUCTURE CLASSES +# ============================================ + +kb:Section a owl:Class ; + rdfs:subClassOf doco:Section ; + rdfs:comment "A section in a markdown document, marked by ## headers" . + +kb:WikiLink a owl:Class ; + rdfs:comment "A [[wiki-style]] link within a document" . + +kb:TodoItem a owl:Class ; + rdfs:subClassOf doco:ListItem ; + rdfs:comment "A checkbox todo item: - [ ] or - [x]" . + +kb:Property a owl:Class ; + rdfs:comment "A key-value property from front matter or inline" . + +# ============================================ +# ENTITY CLASSES +# ============================================ + +kb:Person a owl:Class ; + owl:equivalentClass foaf:Person ; + rdfs:comment "A person mentioned in the KB" . + +kb:Company a owl:Class ; + rdfs:subClassOf foaf:Organization ; + rdfs:comment "A company/organization mentioned in the KB" . + +kb:Place a owl:Class ; + rdfs:subClassOf schema:Place ; + rdfs:comment "A location mentioned in the KB" . + +kb:Book a owl:Class ; + rdfs:subClassOf schema:Book ; + rdfs:comment "A book referenced in the KB" . + +kb:ResearchPaper a owl:Class ; + rdfs:subClassOf schema:ScholarlyArticle ; + rdfs:comment "Academic paper or publication" . + +kb:Todo a owl:Class ; + rdfs:subClassOf schema:Action ; + rdfs:comment "An inferred todo/action item" . + +kb:Relationship a owl:Class ; + rdfs:comment "An inferred relationship between people" . + +kb:ProfessionalRelationship a owl:Class ; + rdfs:subClassOf kb:Relationship ; + rdfs:comment "Work-related relationship" . + +# ============================================ +# TAG TAXONOMY (using SKOS) +# ============================================ + +kb:Tag a owl:Class ; + rdfs:subClassOf skos:Concept ; + rdfs:comment "A tag or category in the knowledge base" . + +kb:TagScheme a owl:Class ; + rdfs:subClassOf skos:ConceptScheme ; + rdfs:comment "A hierarchical tag taxonomy" . 
+
+# Tag hierarchy uses SKOS properties
+kb:hasParentTag rdfs:subPropertyOf skos:broader ;
+    rdfs:domain kb:Tag ;
+    rdfs:range kb:Tag ;
+    rdfs:comment "Parent tag in hierarchy (e.g., bydate/2024 → bydate)" .
+
+kb:hasChildTag rdfs:subPropertyOf skos:narrower ;
+    rdfs:domain kb:Tag ;
+    rdfs:range kb:Tag ;
+    rdfs:comment "Child tag in hierarchy" .
+
+kb:relatedTag rdfs:subPropertyOf skos:related ;
+    rdfs:domain kb:Tag ;
+    rdfs:range kb:Tag ;
+    rdfs:comment "Related but not hierarchical" .
+
+# Tag properties
+kb:tagName rdfs:subPropertyOf skos:prefLabel ;
+    rdfs:domain kb:Tag ;
+    rdfs:comment "Display name of the tag" .
+
+kb:autoGenerated a owl:DatatypeProperty ;
+    rdfs:domain kb:Tag ;
+    rdfs:range xsd:boolean ;
+    rdfs:comment "True for system-generated tags like bydate/" .
+
+# ============================================
+# DOCUMENT PROPERTIES
+# ============================================
+
+# Reuse standard properties where applicable
+kb:title rdfs:subPropertyOf dcterms:title ;
+    rdfs:domain kb:Document ;
+    rdfs:comment "Document title from front matter" .
+
+kb:created rdfs:subPropertyOf dcterms:created ;
+    rdfs:domain kb:Document ;
+    rdfs:range xsd:dateTime ;
+    rdfs:comment "Creation timestamp" .
+
+kb:filePath a owl:DatatypeProperty ;
+    rdfs:subPropertyOf dcterms:identifier ;
+    rdfs:domain kb:Document ;
+    rdfs:comment "Relative file path in the knowledge base" .
+
+kb:documentType a owl:DatatypeProperty ;
+    rdfs:domain kb:Document ;
+    rdfs:comment "The 'type' field from front matter" .
+
+kb:hasSection rdfs:subPropertyOf dcterms:hasPart ;
+    rdfs:domain kb:Document ;
+    rdfs:range kb:Section ;
+    rdfs:comment "Links document to its sections" .
+
+kb:hasTag a owl:ObjectProperty ;
+    rdfs:subPropertyOf dcterms:subject ;
+    rdfs:domain kb:Document ;
+    rdfs:range kb:Tag ;
+    rdfs:comment "Tags from front matter or inline" .
+ +# ============================================ +# SECTION PROPERTIES +# ============================================ + +kb:heading a owl:DatatypeProperty ; + rdfs:subPropertyOf dcterms:title ; + rdfs:domain kb:Section ; + rdfs:comment "Section heading text" . + +kb:headingLevel a owl:DatatypeProperty ; + rdfs:domain kb:Section ; + rdfs:range xsd:integer ; + rdfs:comment "Markdown heading level (1-6)" . + +kb:contains a owl:ObjectProperty ; + rdfs:subPropertyOf dcterms:hasPart ; + rdfs:domain kb:Section ; + rdfs:comment "Links, todos, and other content in section" . + +# ============================================ +# LINK PROPERTIES +# ============================================ + +kb:target a owl:ObjectProperty ; + rdfs:domain kb:WikiLink ; + rdfs:comment "Entity the link points to" . + +kb:linkText a owl:DatatypeProperty ; + rdfs:domain kb:WikiLink ; + rdfs:comment "The literal [[link text]]" . + +kb:inSection a owl:ObjectProperty ; + rdfs:domain kb:WikiLink ; + rdfs:range kb:Section ; + rdfs:comment "Section containing this link" . + +kb:surroundingText a owl:DatatypeProperty ; + rdfs:domain kb:WikiLink ; + rdfs:comment "Text context around the link" . + +# ============================================ +# TODO PROPERTIES +# ============================================ + +kb:isCompleted a owl:DatatypeProperty ; + rdfs:domain kb:TodoItem ; + rdfs:range xsd:boolean ; + rdfs:comment "Whether checkbox is checked" . + +kb:rawText a owl:DatatypeProperty ; + rdfs:domain kb:TodoItem ; + rdfs:comment "Original markdown text of todo" . + +kb:due a owl:DatatypeProperty ; + rdfs:subPropertyOf dcterms:date ; + rdfs:domain kb:Todo ; + rdfs:range xsd:date ; + rdfs:comment "Due date from [[due:: date]]" . + +kb:assignedTo a owl:ObjectProperty ; + rdfs:domain kb:Todo ; + rdfs:range kb:Person ; + rdfs:comment "Person responsible for todo" . + +kb:description a owl:DatatypeProperty ; + rdfs:domain kb:Todo ; + rdfs:comment "Todo description text" . 
+ +# ============================================ +# ENTITY PROPERTIES +# ============================================ + +kb:isPlaceholder a owl:DatatypeProperty ; + rdfs:range xsd:boolean ; + rdfs:comment "True if entity has no dedicated document" . + +kb:describedBy a owl:ObjectProperty ; + owl:inverseOf kb:describes ; + rdfs:comment "Document that describes this entity" . + +kb:describes a owl:ObjectProperty ; + rdfs:domain kb:Document ; + rdfs:comment "Entity described by this document" . + +kb:mentionedIn a owl:ObjectProperty ; + rdfs:comment "Documents where entity is mentioned" . + +kb:mentionCount a owl:DatatypeProperty ; + rdfs:range xsd:integer ; + rdfs:comment "Number of times entity is mentioned" . + +# ============================================ +# MEETING PROPERTIES +# ============================================ + +kb:hasAttendee a owl:ObjectProperty ; + rdfs:subPropertyOf schema:attendee ; + rdfs:domain kb:Meeting ; + rdfs:range kb:Person ; + rdfs:comment "Person who attended the meeting" . + +kb:hasSpeaker a owl:ObjectProperty ; + rdfs:domain kb:Meeting ; + rdfs:range kb:Person ; + rdfs:comment "Person who spoke/presented" . + +kb:relatedToCompany a owl:ObjectProperty ; + rdfs:domain kb:Meeting ; + rdfs:range kb:Company ; + rdfs:comment "Company discussed or involved" . + +kb:meetingLocation a owl:ObjectProperty ; + rdfs:subPropertyOf schema:location ; + rdfs:domain kb:Meeting ; + rdfs:range kb:Place ; + rdfs:comment "Where the meeting took place" . + +# ============================================ +# RELATIONSHIP PROPERTIES +# ============================================ + +kb:hasRelationship a owl:ObjectProperty ; + rdfs:domain kb:Person ; + rdfs:range kb:Relationship . + +kb:withPerson a owl:ObjectProperty ; + rdfs:domain kb:Relationship ; + rdfs:range kb:Person . + +kb:relationshipStrength a owl:DatatypeProperty ; + rdfs:domain kb:Relationship ; + rdfs:range xsd:float ; + rdfs:comment "0.0 to 1.0 based on frequency/recency" . 
+
+kb:lastInteraction a owl:DatatypeProperty ;
+    rdfs:domain kb:Relationship ;
+    rdfs:range xsd:date .
+
+kb:isStale a owl:DatatypeProperty ;
+    rdfs:domain kb:Relationship ;
+    rdfs:range xsd:boolean ;
+    rdfs:comment "True if no recent interaction" .
+
+kb:meetingCount a owl:DatatypeProperty ;
+    rdfs:domain kb:Relationship ;
+    rdfs:range xsd:integer .
+
+# Simple relationship properties (derived)
+kb:frequentlyMeetsWith rdfs:subPropertyOf rel:worksWith ;
+    rdfs:domain kb:Person ;
+    rdfs:range kb:Person ;
+    rdfs:comment "Meets at least monthly" .
+
+# ============================================
+# BOOK PROPERTIES (using Schema.org/Dublin Core)
+# ============================================
+
+kb:hasAuthor rdfs:subPropertyOf schema:author ;
+    rdfs:domain kb:Book ;
+    rdfs:range kb:Person ;
+    rdfs:comment "Book author (from [[book]] by [[author]] pattern)" .
+
+kb:isbn rdfs:subPropertyOf schema:isbn ;
+    rdfs:domain kb:Book ;
+    rdfs:comment "ISBN from front matter" .
+
+kb:publicationYear rdfs:subPropertyOf schema:datePublished ;
+    rdfs:domain kb:Book ;
+    rdfs:range xsd:integer ;
+    rdfs:comment "Year of publication" .
+
+kb:bookStatus a owl:DatatypeProperty ;
+    rdfs:domain kb:Book ;
+    rdfs:comment "reading, completed, to-read, abandoned" .
+
+kb:rating a owl:DatatypeProperty ;
+    rdfs:domain kb:Book ;
+    rdfs:range xsd:integer ;
+    rdfs:comment "Personal rating (1-5 stars)" .
+
+# ============================================
+# PLACE PROPERTIES (using Schema.org/GeoNames)
+# ============================================
+
+kb:locatedIn rdfs:subPropertyOf schema:containedInPlace ;
+    rdfs:domain kb:Place ;
+    rdfs:range kb:Place ;
+    rdfs:comment "Hierarchical location (city in country)" .
+
+kb:placeType a owl:DatatypeProperty ;
+    rdfs:domain kb:Place ;
+    rdfs:comment "city, country, venue, office, etc." .
+
+kb:visitedOn a owl:DatatypeProperty ;
+    rdfs:domain kb:Place ;
+    rdfs:range xsd:date ;
+    rdfs:comment "Dates when visited" .
+
+kb:coordinates rdfs:subPropertyOf schema:geo ;
+    rdfs:domain kb:Place ;
+    rdfs:comment "Geographic coordinates if known" .
+
+# ============================================
+# PROVENANCE PROPERTIES
+# ============================================
+
+kb:derivedFrom a owl:ObjectProperty ;
+    rdfs:subPropertyOf prov:wasDerivedFrom ;
+    rdfs:comment "Source document/link for inferred data" .
+
+kb:inferredAt a owl:DatatypeProperty ;
+    rdfs:subPropertyOf prov:generatedAtTime ;
+    rdfs:range xsd:dateTime ;
+    rdfs:comment "When this inference was made" .
+
+# ============================================
+# INFERENCE RULES (Conceptual)
+# ============================================
+
+kb:InferenceRule a owl:Class ;
+    rdfs:comment "Rules for deriving high-level facts" .
+
+kb:attendeeInference a kb:InferenceRule ;
+    rdfs:comment """
+        IF link appears in section with heading 'Attendees'
+        THEN document hasAttendee target-of-link
+    """ .
+
+kb:todoOwnerInference a kb:InferenceRule ;
+    rdfs:comment """
+        IF todo appears in 1on1-meeting document
+        THEN todo assignedTo attendee-of-meeting
+    """ .
+
+kb:relationshipStalenessInference a kb:InferenceRule ;
+    rdfs:comment """
+        IF relationship lastInteraction < 6-months-ago
+        THEN relationship isStale true
+    """ .
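The vocabulary marks `kb:relationshipStalenessInference` as conceptual. One possible reading is sketched below in plain Python, applied to in-memory objects rather than to an RDF graph. `Relationship`, `months_before`, and `apply_staleness_rule` are hypothetical names. Two further choices are assumptions rather than anything the vocabulary specifies: "6-months-ago" is taken to mean the same calendar day six months earlier (clamped to the month's length), and the rule only asserts `isStale` without ever retracting it.

```python
import calendar
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Relationship:
    """Hypothetical in-memory stand-in for a kb:Relationship node."""
    with_person: str          # kb:withPerson
    last_interaction: date    # kb:lastInteraction (xsd:date)
    is_stale: bool = False    # kb:isStale (xsd:boolean)


def months_before(day: date, months: int) -> date:
    """Same calendar day `months` months earlier, clamped to that month's length."""
    index = day.year * 12 + (day.month - 1) - months
    year, month0 = divmod(index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(day.day, last_day))


def apply_staleness_rule(relationships: List[Relationship], today: date,
                         months: int = 6) -> List[Relationship]:
    """IF lastInteraction < six months ago THEN isStale true.

    Only sets is_stale on matching relationships (it never clears it)
    and returns the relationships the rule fired for.
    """
    cutoff = months_before(today, months)
    fired = [r for r in relationships if r.last_interaction < cutoff]
    for r in fired:
        r.is_stale = True
    return fired
```

With `today=date(2025, 8, 13)` the cutoff is `2025-02-13`, so a relationship last touched on `2025-01-10` is flagged and one from `2025-07-01` is not. The sketch leaves out writing results back as `kb:isStale` triples (optionally with `kb:inferredAt` provenance).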