Merged
2 changes: 1 addition & 1 deletion .gitignore
@@ -30,4 +30,4 @@ claude-flow
claude-flow.bat
claude-flow.ps1
hive-mind-prompt-*.txt
-.claude-flow/metrics
+**/.claude-flow/metrics
77 changes: 77 additions & 0 deletions REFACTOR.md
@@ -0,0 +1,77 @@
# Refactoring Plan: Specification-Driven Testing

This document outlines the process for refactoring the existing test suite into a specification-driven format. The goal is to capture the current "as-is" behavior of the system in declarative artifacts, which will enable a more robust, maintainable, and agent-friendly development workflow.

## 🎯 Phase 1: Setup and Scaffolding

This phase lays the foundation for the new testing structure.

### Task 1.1: Create the Specification Directory Structure

Create the following new directories in the root of the repository:

```
specs/
├── README.md
├── reference_corpus/
└── test_cases/
```

- `specs/README.md`: Create this file and add a brief explanation of the purpose of this new testing structure.
- `specs/reference_corpus/`: Copy the contents of the existing `sample_data/` directory into this new directory. This will be our integration and regression testing suite.
- `specs/test_cases/`: This directory will hold the individual unit tests, with each feature getting its own subdirectory.

### Task 1.2: Create the Generic Test Runner

Create a new test file, `tests/test_specifications.py`. This single file will eventually replace most of the existing unit tests. It will contain a generic, data-driven test runner.

```python
# tests/test_specifications.py
import pytest
from pathlib import Path
from knowledgebase_processor.processor import Processor

# You will need a way to compare two RDF graphs.
# The `rdflib.compare.isomorphic` function is perfect for this.
from rdflib import Graph
from rdflib.compare import isomorphic


def run_spec_test(test_case_dir: Path):
    """
    Runs a single specification-driven test.
    """
    input_md_path = test_case_dir / "input.md"
    expected_output_ttl_path = test_case_dir / "expected_output.ttl"

    # 1. Read the input markdown file
    input_md_content = input_md_path.read_text(encoding="utf-8")

    # 2. Run the processor to get the "as-is" RDF graph
    # NOTE: You will need a method on your Processor that can take a string
    # of markdown and return an rdflib.Graph object.
    processor = Processor(...)  # Configure your processor as needed
    as_is_graph = processor.process_content_to_graph(input_md_content)

    # 3. Read the "to-be" (expected) RDF graph
    expected_graph = Graph()
    expected_graph.parse(str(expected_output_ttl_path), format="turtle")

    # 4. Compare the two RDF graphs for isomorphism (i.e., they are equivalent)
    assert isomorphic(as_is_graph, expected_graph)


# This function will automatically discover all your test cases.
# The path is relative to the working directory, so run pytest from the repo root.
def get_test_cases():
    specs_dir = Path("specs/test_cases")
    if not specs_dir.exists():
        return []
    # Sort so the test order is stable across platforms
    return sorted(d for d in specs_dir.iterdir() if d.is_dir())


# Use each directory name as the test ID for readable pytest output
@pytest.mark.parametrize("test_case_dir", get_test_cases(), ids=lambda d: d.name)
def test_specifications(test_case_dir):
    run_spec_test(test_case_dir)
```
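The directory setup from Task 1.1 can also be scripted so that it is repeatable. The following is a minimal sketch using only the standard library; the function name `scaffold_specs`, its parameters, and the placeholder README text are illustrative assumptions, not part of the existing code base.

```python
import shutil
from pathlib import Path


def scaffold_specs(repo_root: Path, sample_dir_name: str = "sample_data") -> Path:
    """Create the specs/ layout from Task 1.1 and return the specs directory.

    Hypothetical helper: safe to run more than once.
    """
    specs = repo_root / "specs"
    corpus = specs / "reference_corpus"

    # Create both subdirectories (and specs/ itself) if they are missing
    (specs / "test_cases").mkdir(parents=True, exist_ok=True)
    corpus.mkdir(parents=True, exist_ok=True)

    # Only write a placeholder README when none exists yet
    readme = specs / "README.md"
    if not readme.exists():
        readme.write_text("# Specification-Driven Testing\n", encoding="utf-8")

    # Copy the existing sample data into the reference corpus
    sample_dir = repo_root / sample_dir_name
    if sample_dir.is_dir():
        # dirs_exist_ok requires Python 3.8 or newer
        shutil.copytree(sample_dir, corpus, dirs_exist_ok=True)

    return specs
```

Calling `scaffold_specs(Path("."))` from the repository root would create the three entries listed in Task 1.1. The copy step overwrites same-named files in `reference_corpus/`, which is usually acceptable for a one-time setup but worth knowing before re-running it.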
## 🔬 Phase 2: "As-Is" State Capture (Unit Tests)

This phase is the core of the refactoring effort. You will systematically convert each existing unit test into the new declarative format.

### Task 2.1: Convert test_todo_item_extractor.py

For each test function in `tests/extractor/test_todo_item_extractor.py`:

1. **Create a Test Case Directory**: Create a new subdirectory in `specs/test_cases/` that describes the test (e.g., `01_extract_incomplete_todo`).
2. **Create `input.md`**: Take the Markdown string being used in the test and save it as `input.md` in the new directory.
3. **Generate `expected_output.ttl`**: Temporarily modify the test function to run the full processor on the input and serialize the resulting RDF graph to a file. Save this as `expected_output.ttl` in the new directory.
4. **Run the New Test**: Run `pytest tests/test_specifications.py`. The new test case should now be discovered and pass, confirming that you have successfully captured the "as-is" state.
5. **Delete the Old Test**: Once the new test is passing, delete the original Python test function.

Repeat this process until `test_todo_item_extractor.py` is empty, then delete the file.

### Task 2.2: Convert Remaining Extractor Tests

Repeat the process from Task 2.1 for all remaining test files in the `tests/extractor/` directory.

### Task 2.3: Convert Other Unit Tests

Continue this process for all other relevant unit test files in the `tests/` directory, such as those in `tests/analyzer/` and `tests/parser/`.

## 🚗 Phase 3: Integration and Cleanup

This phase establishes the regression test suite and cleans up the old test files.

### Task 3.1: Create the Reference Corpus Test

1. **Generate "As-Is" TTLs**: Write a one-off script that iterates through every `.md` file in your `specs/reference_corpus/` directory. For each file, run the processor and save the resulting RDF graph as a corresponding `.ttl` file in the same directory.
2. **Create a New Integration Test**: Add a new test file, `tests/test_reference_corpus.py`. This test will be similar to the unit test runner but will work on the entire reference corpus.

```python
# tests/test_reference_corpus.py
import pytest
from pathlib import Path
from knowledgebase_processor.processor import Processor
from rdflib import Graph
from rdflib.compare import isomorphic


def run_corpus_test(markdown_path: Path):
    # The expected graph lives next to the markdown file, e.g. note.md -> note.ttl
    expected_ttl_path = markdown_path.with_suffix(".ttl")

    input_content = markdown_path.read_text(encoding="utf-8")

    processor = Processor(...)  # Configure processor
    as_is_graph = processor.process_content_to_graph(input_content)

    expected_graph = Graph()
    expected_graph.parse(str(expected_ttl_path), format="turtle")

    assert isomorphic(as_is_graph, expected_graph)


# The path is relative to the working directory, so run pytest from the repo root.
def get_corpus_files():
    corpus_dir = Path("specs/reference_corpus")
    if not corpus_dir.exists():
        return []
    # Sort so the test order is stable across platforms
    return sorted(corpus_dir.glob("*.md"))


# Use each file name as the test ID for readable pytest output
@pytest.mark.parametrize("markdown_path", get_corpus_files(), ids=lambda p: p.name)
def test_reference_corpus(markdown_path):
    run_corpus_test(markdown_path)
```
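The one-off capture script that Task 3.1 calls for could look roughly like the sketch below. It deliberately takes the Markdown-to-Turtle conversion as a callable, because the plan itself notes that the `Processor` method still has to be provided; `capture_corpus`, `to_turtle`, and `overwrite` are hypothetical names introduced for illustration only.

```python
from pathlib import Path
from typing import Callable, List


def capture_corpus(
    corpus_dir: Path,
    to_turtle: Callable[[str], str],
    overwrite: bool = False,
) -> List[Path]:
    """Write a .ttl file next to every .md file in corpus_dir.

    `to_turtle` is an assumed callable that turns markdown text into a
    Turtle string. Returns the list of .ttl files that were written.
    """
    written = []
    for md_path in sorted(corpus_dir.glob("*.md")):
        ttl_path = md_path.with_suffix(".ttl")
        if ttl_path.exists() and not overwrite:
            # Keep an already captured baseline unless explicitly asked to replace it
            continue
        markdown = md_path.read_text(encoding="utf-8")
        ttl_path.write_text(to_turtle(markdown), encoding="utf-8")
        written.append(ttl_path)
    return written
```

Assuming the processor method sketched in this plan exists, the callable might be something like `lambda text: processor.process_content_to_graph(text).serialize(format="turtle")`. Skipping existing `.ttl` files by default keeps a reviewed baseline from being silently replaced; pass `overwrite=True` when a behavior change is intentional.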
### Task 3.2: Final Cleanup

Review the `tests/` directory and remove any remaining test files that have been made redundant by the new specification-driven approach.

By the end of this process, your `tests/` directory will be much smaller, and you will have a comprehensive, version-controlled, and easily updatable specification of your entire system's behavior in the `specs/` directory.
177 changes: 0 additions & 177 deletions sample_data/DORA Community Discussion-2024-11-07.md

This file was deleted.

32 changes: 32 additions & 0 deletions specs/README.md
@@ -0,0 +1,32 @@
# Specification-Driven Testing

This directory contains the specification-driven testing structure for the knowledgebase-processor project. This approach captures the current "as-is" behavior of the system in declarative artifacts, enabling a more robust, maintainable, and agent-friendly development workflow.

## Directory Structure

### `reference_corpus/`
Contains the integration and regression testing suite. These are real-world markdown files that represent the expected inputs the system should handle. Each `.md` file has a corresponding `.ttl` file that represents the expected RDF output.

### `test_cases/`
Contains individual unit test specifications. Each subdirectory represents a specific test case with:
- `input.md` - The markdown input for the test
- `expected_output.ttl` - The expected RDF output in Turtle format
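
Because the runner expects both files in every test case directory, a quick layout check can catch a half-created case before pytest reports a confusing file error. This is a minimal sketch using only the standard library; `find_incomplete_cases` and `REQUIRED_FILES` are hypothetical names, not existing project code.

```python
from pathlib import Path
from typing import Dict, List

# File names taken from the test case layout described above
REQUIRED_FILES = ("input.md", "expected_output.ttl")


def find_incomplete_cases(test_cases_dir: Path) -> Dict[str, List[str]]:
    """Map each test case directory name to the required files it lacks.

    Directories that contain every required file are left out of the result.
    """
    problems = {}
    for case_dir in sorted(p for p in test_cases_dir.iterdir() if p.is_dir()):
        missing = [name for name in REQUIRED_FILES if not (case_dir / name).is_file()]
        if missing:
            problems[case_dir.name] = missing
    return problems
```

An empty result from `find_incomplete_cases(Path("specs/test_cases"))` means every case has both its input and its expected output.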

## Usage

The specification-driven tests are executed through:

1. **Unit Tests**: `tests/test_specifications.py` - Runs all test cases in the `test_cases/` directory
2. **Integration Tests**: `tests/test_reference_corpus.py` - Validates the entire reference corpus

## Benefits

- **Declarative**: Test behavior is captured in files rather than code
- **Version Controlled**: Changes to expected behavior are tracked in git
- **Agent-Friendly**: AI agents can easily understand and modify test specifications
- **Maintainable**: No need to maintain complex Python test code for most scenarios
- **Comprehensive**: Full system behavior is captured as artifacts

## Test Philosophy

This approach follows the principle that the system's behavior should be specified through examples rather than code. When behavior changes, the specifications are updated to reflect the new expected behavior, providing a clear audit trail of system evolution.
@@ -9,10 +9,10 @@ created: 2024-11-07T14:02:26-05:00
## Notes

He goes through staff-aug
-copa->agile one
+cipa->another company
sometimes

-Blair Quantum is 87 and not retired - procurement
+Blair Quantum is not retired

User need and user assessment
- products and offerings
26 changes: 26 additions & 0 deletions specs/reference_corpus/Alex Cipher-meetingnote-2024-11-07.ttl
@@ -0,0 +1,26 @@
@prefix kb: <http://example.org/kb/vocab#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix schema: <https://schema.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://example.org/kb/wikilinks/kI4CgckKMyRwyZWO> a kb:Entity,
        kb:WikiLink ;
    rdfs:label "Alex Cipher"^^xsd:string ;
    kb:originalText "[[Alex Cipher]]"^^xsd:string ;
    kb:sourceDocument <http://example.org/kb/vocab#/test_corpus/Alex_Cipher_meetingnote_2024_11_07> ;
    kb:targetPath "Alex Cipher"^^xsd:string ;
    rdfs:seeAlso <http://example.org/kb/wikilinks/kI4CgckKMyRwyZWO> ;
    schema:dateCreated "2025-09-10T22:58:49.099164+00:00"^^xsd:dateTime ;
    schema:dateModified "2025-09-10T22:58:49.099165+00:00"^^xsd:dateTime .

<http://example.org/kb/vocab#/test_corpus/Alex_Cipher_meetingnote_2024_11_07> a kb:Document,
        kb:Entity,
        schema:CreativeWork ;
    rdfs:label "Temporary Document"^^xsd:string ;
    kb:originalPath "temp_document.md"^^xsd:string ;
    kb:pathWithoutExtension "temp_document"^^xsd:string ;
    kb:sourceDocument <http://example.org/kb/vocab#/test_corpus/Alex_Cipher_meetingnote_2024_11_07> ;
    rdfs:seeAlso <http://example.org/kb/vocab#/test_corpus/Alex_Cipher_meetingnote_2024_11_07> ;
    schema:dateCreated "2025-09-10T22:58:49.097418+00:00"^^xsd:dateTime ;
    schema:dateModified "2025-09-10T22:58:49.097420+00:00"^^xsd:dateTime .

@@ -18,7 +18,7 @@ Tags: Tags

## Intros

-[[Deep Kapadia]]
+[[George Craft]]
Engineering manager on a break
Interested on learning
Been an IC and manager up to 9 or 90 people
@@ -94,7 +94,7 @@ Depends on the client and how you are positioning to that client.
Been a lot of org restructuring vs the coaching he and his product partner were pitching
Moving to coaching but it hard without the directional clarity

-[[Jerome Thibaud]]
+[[Mark Temperence]]
Find the problem and positioning yourself as the solution

[[Me]]
@@ -115,7 +115,7 @@ The problems they need solving
[[Alex Cipher]]


-[[Jerome Thibaud]]
+[[Mark Temperence]]

Finding events where your prospects are
