dstengle · Sep 11, 2025
diff --git a/.gitignore b/.gitignore
@@ -30,4 +30,4 @@ claude-flow
 claude-flow.bat
 claude-flow.ps1
 hive-mind-prompt-*.txt
-.claude-flow/metrics
+**/.claude-flow/metrics
diff --git a/REFACTOR.md b/REFACTOR.md
@@ -0,0 +1,77 @@
+# Refactoring Plan: Specification-Driven Testing
+
+This document outlines the process for refactoring the existing test suite into a specification-driven format. The goal is to capture the current "as-is" behavior of the system in declarative artifacts, which will enable a more robust, maintainable, and agent-friendly development workflow.
+
+## 🎯 Phase 1: Setup and Scaffolding
+
+This phase lays the foundation for the new testing structure.
+
+### Task 1.1: Create the Specification Directory Structure
+
+Create the following new directories in the root of the repository:
+
+specs/
+├── README.md
+├── reference_corpus/
+└── test_cases/
+
+- `specs/README.md`: Create this file and add a brief explanation of the purpose of this new testing structure.
+- `specs/reference_corpus/`: Copy the contents of the existing `sample_data/` directory into this new directory. This will be our integration and regression testing suite.
+- `specs/test_cases/`: This directory will hold the individual unit tests, with each feature getting its own subdirectory.
+
+### Task 1.2: Create the Generic Test Runner
+
+Create a new test file, `tests/test_specifications.py`. This single file will eventually replace most of the existing unit tests. It will contain a generic, data-driven test runner.
+
+# tests/test_specifications.py
+import pytest
+from pathlib import Path
+from knowledgebase_processor.processor import Processor 
+
+# You will need a way to compare two RDF graphs. 
+# The `rdflib.compare.isomorphic` function is perfect for this.
+from rdflib import Graph
+from rdflib.compare import isomorphic
+
+def run_spec_test(test_case_dir: Path):
+    """
+    Runs a single specification-driven test.
+    """
+    input_md_path = test_case_dir / "input.md"
+    expected_output_ttl_path = test_case_dir / "expected_output.ttl"
+
+    # 1. Read the input markdown file
+    input_md_content = input_md_path.read_text(encoding="utf-8")
+
+    # 2. Run the processor to get the "as-is" RDF graph
+    # NOTE: You will need a method on your Processor that can take a string
+    # of markdown and return an rdflib.Graph object.
+    processor = Processor(...) # Configure your processor as needed
+    as_is_graph = processor.process_content_to_graph(input_md_content) 
+
+    # 3. Read the "to-be" (expected) RDF graph
+    expected_graph = Graph()
+    expected_graph.parse(str(expected_output_ttl_path), format="turtle")
+
+    # 4. Compare the two RDF graphs for isomorphism (i.e., they are equivalent)
+    assert isomorphic(as_is_graph, expected_graph)
+
+# This function will automatically discover all your test cases
+def get_test_cases():
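+    # Anchor on this file so discovery does not depend on the working directory
+    specs_dir = Path(__file__).resolve().parent.parent / "specs" / "test_cases"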
+    if not specs_dir.exists():
+        return []
+    return [d for d in specs_dir.iterdir() if d.is_dir()]
+
+# Use the directory name as the test ID so failures name the failing spec
+@pytest.mark.parametrize("test_case_dir", get_test_cases(), ids=lambda p: p.name)
+def test_specifications(test_case_dir):
+    run_spec_test(test_case_dir)
+
+## 🔬 Phase 2: "As-Is" State Capture (Unit Tests)
+
+This phase is the core of the refactoring effort. You will systematically convert each existing unit test into the new declarative format.
+
+### Task 2.1: Convert test_todo_item_extractor.py
+
+For each test function in `tests/extractor/test_todo_item_extractor.py`:
+
+1. **Create a Test Case Directory**: Create a new subdirectory in `specs/test_cases/` that describes the test (e.g., `01_extract_incomplete_todo`).
+2. **Create `input.md`**: Take the Markdown string being used in the test and save it as `input.md` in the new directory.
+3. **Generate `expected_output.ttl`**: Temporarily modify the test function to run the full processor on the input and serialize the resulting RDF graph to a file. Save this as `expected_output.ttl` in the new directory.
+4. **Run the New Test**: Run `pytest tests/test_specifications.py`. The new test case should now be discovered and pass, confirming that you have successfully captured the "as-is" state.
+5. **Delete the Old Test**: Once the new test is passing, delete the original Python test function.
+
+Repeat this process until `test_todo_item_extractor.py` is empty, then delete the file.
+
+### Task 2.2: Convert Remaining Extractor Tests
+
+Repeat the process from Task 2.1 for all remaining test files in the `tests/extractor/` directory.
+
+### Task 2.3: Convert Other Unit Tests
+
+Continue this process for all other relevant unit test files in the `tests/` directory, such as those in `tests/analyzer/` and `tests/parser/`.
+
+## 🚗 Phase 3: Integration and Cleanup
+
+This phase establishes the regression test suite and cleans up the old test files.
+
+### Task 3.1: Create the Reference Corpus Test
+
+1. **Generate "As-Is" TTLs**: Write a one-off script that iterates through every `.md` file in your `specs/reference_corpus/` directory. For each file, run the processor and save the resulting RDF graph as a corresponding `.ttl` file in the same directory.
+2. **Create a New Integration Test**: Add a new test file, `tests/test_reference_corpus.py`. This test will be similar to the unit test runner but will work on the entire reference corpus.
+
+# tests/test_reference_corpus.py
+import pytest
+from pathlib import Path
+from knowledgebase_processor.processor import Processor
+from rdflib import Graph
+from rdflib.compare import isomorphic
+
+def run_corpus_test(markdown_path: Path):
+    expected_ttl_path = markdown_path.with_suffix(".ttl")
+
+    input_content = markdown_path.read_text(encoding="utf-8")
+
+    processor = Processor(...) # Configure processor
+    as_is_graph = processor.process_content_to_graph(input_content)
+
+    expected_graph = Graph()
+    expected_graph.parse(str(expected_ttl_path), format="turtle")
+
+    assert isomorphic(as_is_graph, expected_graph)
+
+def get_corpus_files():
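+    # Anchor on this file so discovery does not depend on the working directory
+    corpus_dir = Path(__file__).resolve().parent.parent / "specs" / "reference_corpus"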
+    if not corpus_dir.exists():
+        return []
+    return list(corpus_dir.glob("*.md"))
+
+# Use the file name as the test ID so failures name the failing document
+@pytest.mark.parametrize("markdown_path", get_corpus_files(), ids=lambda p: p.name)
+def test_reference_corpus(markdown_path):
+    run_corpus_test(markdown_path)
+
+### Task 3.2: Final Cleanup
+
+Review the `tests/` directory and remove any remaining test files that have been made redundant by the new specification-driven approach.
+
+By the end of this process, your `tests/` directory will be much smaller, and you will have a comprehensive, version-controlled, and easily updatable specification of your entire system's behavior in the `specs/` directory.
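Task 3.1 in REFACTOR.md asks for a one-off script that writes a `.ttl` snapshot next to every `.md` file in the reference corpus. A minimal sketch of that loop follows. The function name `capture_corpus` and the `to_turtle` adapter are hypothetical and not part of this PR; the adapter stands in for whatever call turns Markdown text into Turtle text in this project.

```python
from pathlib import Path
from typing import Callable, List


def capture_corpus(corpus_dir: Path, to_turtle: Callable[[str], str]) -> List[Path]:
    """Write a .ttl snapshot next to every .md file directly inside corpus_dir.

    `to_turtle` is a hypothetical adapter: it receives Markdown text and
    returns the processor's RDF output serialized as Turtle text.
    """
    written = []
    for markdown_path in sorted(corpus_dir.glob("*.md")):
        # "notes.md" -> "notes.ttl", in the same directory
        ttl_path = markdown_path.with_suffix(".ttl")
        markdown = markdown_path.read_text(encoding="utf-8")
        ttl_path.write_text(to_turtle(markdown), encoding="utf-8")
        written.append(ttl_path)
    return written
```

If the `process_content_to_graph` method proposed in REFACTOR.md exists and returns an `rdflib.Graph`, the adapter could plausibly be `lambda md: processor.process_content_to_graph(md).serialize(format="turtle")`, since `Graph.serialize` returns a string in rdflib 6 and later. Passing the adapter in keeps the loop itself testable without the processor.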
diff --git a/sample_data/DORA Community Discussion-2024-11-07.md b/sample_data/DORA Community Discussion-2024-11-07.md
diff --git a/specs/README.md b/specs/README.md
@@ -0,0 +1,32 @@
+# Specification-Driven Testing
+
+This directory contains the specification-driven testing structure for the knowledgebase-processor project. This approach captures the current "as-is" behavior of the system in declarative artifacts, enabling a more robust, maintainable, and agent-friendly development workflow.
+
+## Directory Structure
+
+### `reference_corpus/`
+Contains the integration and regression testing suite. These are real-world markdown files that represent the expected inputs the system should handle. Each `.md` file has a corresponding `.ttl` file that represents the expected RDF output.
+
+### `test_cases/`
+Contains individual unit test specifications. Each subdirectory represents a specific test case with:
+- `input.md` - The markdown input for the test
+- `expected_output.ttl` - The expected RDF output in Turtle format
+
+## Usage
+
+The specification-driven tests are executed through:
+
+1. **Unit Tests**: `tests/test_specifications.py` - Runs all test cases in the `test_cases/` directory
+2. **Integration Tests**: `tests/test_reference_corpus.py` - Validates the entire reference corpus
+
+## Benefits
+
+- **Declarative**: Test behavior is captured in files rather than code
+- **Version Controlled**: Changes to expected behavior are tracked in git
+- **Agent-Friendly**: AI agents can easily understand and modify test specifications
+- **Maintainable**: No need to maintain complex Python test code for most scenarios
+- **Comprehensive**: Full system behavior is captured as artifacts
+
+## Test Philosophy
+
+This approach follows the principle that the system's behavior should be specified through examples rather than code. When behavior changes, the specifications are updated to reflect the new expected behavior, providing a clear audit trail of system evolution.
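The README states that each `test_cases/` subdirectory holds an `input.md` and an `expected_output.ttl`. A small check like the one below could catch a half-captured case before the runner fails on a missing file. This is a sketch only; `find_incomplete_cases` and `REQUIRED_FILES` are hypothetical names and not part of this PR.

```python
from pathlib import Path
from typing import Dict, List

# The two files the README says every test case directory contains.
REQUIRED_FILES = ("input.md", "expected_output.ttl")


def find_incomplete_cases(test_cases_dir: Path) -> Dict[str, List[str]]:
    """Map each test-case directory name to the required files it lacks."""
    problems: Dict[str, List[str]] = {}
    for case_dir in sorted(test_cases_dir.iterdir()):
        if not case_dir.is_dir():
            continue  # stray files at the top level are not test cases
        missing = [name for name in REQUIRED_FILES if not (case_dir / name).is_file()]
        if missing:
            problems[case_dir.name] = missing
    return problems
```

An empty result means every case directory is complete. The function reports missing files but does not check that the Turtle file parses.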
diff --git a/...ata/Alex Cipher-meetingnote-2024-11-07.md → ...pus/Alex Cipher-meetingnote-2024-11-07.md b/...ata/Alex Cipher-meetingnote-2024-11-07.md → ...pus/Alex Cipher-meetingnote-2024-11-07.md
@@ -9,10 +9,10 @@ created: 2024-11-07T14:02:26-05:00
 ## Notes
 
 He goes through staff-aug
-copa->agile one
+cipa->another company
 sometimes
 
-Blair Quantum is 87 and not retired - procurement
+Blair Quantum is not retired 
 
 User need and user assessment
 - products and offerings

diff --git a/specs/reference_corpus/Alex Cipher-meetingnote-2024-11-07.ttl b/specs/reference_corpus/Alex Cipher-meetingnote-2024-11-07.ttl
@@ -0,0 +1,26 @@
+@prefix kb: <http://example.org/kb/vocab#> .
+@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
+@prefix schema: <https://schema.org/> .
+@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
+
+<http://example.org/kb/wikilinks/kI4CgckKMyRwyZWO> a kb:Entity,
+        kb:WikiLink ;
+    rdfs:label "Alex Cipher"^^xsd:string ;
+    kb:originalText "[[Alex Cipher]]"^^xsd:string ;
+    kb:sourceDocument <http://example.org/kb/vocab#/test_corpus/Alex_Cipher_meetingnote_2024_11_07> ;
+    kb:targetPath "Alex Cipher"^^xsd:string ;
+    rdfs:seeAlso <http://example.org/kb/wikilinks/kI4CgckKMyRwyZWO> ;
+    schema:dateCreated "2025-09-10T22:58:49.099164+00:00"^^xsd:dateTime ;
+    schema:dateModified "2025-09-10T22:58:49.099165+00:00"^^xsd:dateTime .
+
+<http://example.org/kb/vocab#/test_corpus/Alex_Cipher_meetingnote_2024_11_07> a kb:Document,
+        kb:Entity,
+        schema:CreativeWork ;
+    rdfs:label "Temporary Document"^^xsd:string ;
+    kb:originalPath "temp_document.md"^^xsd:string ;
+    kb:pathWithoutExtension "temp_document"^^xsd:string ;
+    kb:sourceDocument <http://example.org/kb/vocab#/test_corpus/Alex_Cipher_meetingnote_2024_11_07> ;
+    rdfs:seeAlso <http://example.org/kb/vocab#/test_corpus/Alex_Cipher_meetingnote_2024_11_07> ;
+    schema:dateCreated "2025-09-10T22:58:49.097418+00:00"^^xsd:dateTime ;
+    schema:dateModified "2025-09-10T22:58:49.097420+00:00"^^xsd:dateTime .
+
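The `schema:dateCreated` and `schema:dateModified` values in the snapshot above look like generation timestamps. If the processor stamps them at run time (an assumption; the diff does not say), a fresh run would produce different literals and a strict isomorphism check against this file would fail. One possible mitigation is to drop such triples from both graphs before comparing. The sketch below shows only the filtering step on plain `(subject, predicate, object)` tuples; `VOLATILE_PREDICATES` and `stable_triples` are hypothetical names, and which predicates count as volatile is a judgment call for the project.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[object, object, object]

# Assumption: these two predicates carry run-time timestamps, as the
# snapshot above suggests. Adjust the set to match the real processor.
VOLATILE_PREDICATES = frozenset({
    "https://schema.org/dateCreated",
    "https://schema.org/dateModified",
})


def stable_triples(triples: Iterable[Triple]) -> Set[Triple]:
    """Return the triples whose predicate is not listed as volatile.

    The predicate is compared via str(), so plain strings work, as do
    term objects whose string form is the IRI.
    """
    return {(s, p, o) for s, p, o in triples if str(p) not in VOLATILE_PREDICATES}
```

Iterating an `rdflib.Graph` yields triples of this shape, so the same filter could be applied to the actual and expected graphs, with the surviving triples added to fresh `Graph` objects before calling `isomorphic`.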
diff --git a/sample_data/CTO Coffee-2024-11-07.md → ...reference_corpus/CTO Coffee-2024-11-07.md b/sample_data/CTO Coffee-2024-11-07.md → ...reference_corpus/CTO Coffee-2024-11-07.md
@@ -18,7 +18,7 @@ Tags: Tags
 
 ## Intros
 
-[[Deep Kapadia]]
+[[George Craft]]
 Engineering manager on a break
 Interested on learning
 Been an IC and manager up to 9 or 90 people
@@ -94,7 +94,7 @@ Depends on the client and how you are positioning to that client.
 Been a lot of org restructuring vs the coaching he and his product partner were pitching
 Moving to coaching but it hard without the directional clarity
 
-[[Jerome Thibaud]]
+[[Mark Temperence]]
 Find the problem and positioning yourself as the solution
 
 [[Me]]
@@ -115,7 +115,7 @@ The problems they need solving
 [[Alex Cipher]]
 
 
-[[Jerome Thibaud]]
+[[Mark Temperence]]
 
 Finding events where your prospects are