
Commit 9daa62c

Merge pull request #60 from dstengle/refactor-test-framework
2 parents 180bbc6 + a0885c9 commit 9daa62c


164 files changed (+3038 / -2034 lines)


.gitignore

Lines changed: 1 addition & 1 deletion
```diff
@@ -30,4 +30,4 @@ claude-flow
 claude-flow.bat
 claude-flow.ps1
 hive-mind-prompt-*.txt
-.claude-flow/metrics
+**/.claude-flow/metrics
```

REFACTOR.md

Lines changed: 77 additions & 0 deletions
@@ -0,0 +1,77 @@
# Refactoring Plan: Specification-Driven Testing

This document outlines the process for refactoring the existing test suite into a specification-driven format. The goal is to capture the current "as-is" behavior of the system in declarative artifacts. Those artifacts will enable a more robust, maintainable, and agent-friendly development workflow.

## 🎯 Phase 1: Setup and Scaffolding

This phase lays the foundation for the new testing structure.

### Task 1.1: Create the Specification Directory Structure

Create the following structure in the root of the repository:

```text
specs/
├── README.md
├── reference_corpus/
└── test_cases/
```
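The layout above can be created with a short script, which also copies `sample_data/` into `specs/reference_corpus/` as this task requires. The sketch below assumes it is called with the repository root. The function name `scaffold_specs` and the placeholder README text are hypothetical, not part of this commit:

```python
import shutil
from pathlib import Path

# Hypothetical placeholder text; replace it with the real explanation of specs/.
README_PLACEHOLDER = "# Specification-Driven Testing\n"


def scaffold_specs(repo_root: Path) -> Path:
    """Create specs/{README.md, reference_corpus/, test_cases/} under repo_root.

    If repo_root/sample_data exists, its contents are copied into
    specs/reference_corpus/. The function is safe to re-run:
    - directories are created only if they are missing;
    - an existing README.md is left untouched;
    - files from sample_data/ are re-copied over their earlier copies.
    """
    specs = repo_root / "specs"
    (specs / "test_cases").mkdir(parents=True, exist_ok=True)

    corpus = specs / "reference_corpus"
    sample_data = repo_root / "sample_data"
    if sample_data.is_dir():
        # dirs_exist_ok=True (Python 3.8+) lets copytree merge into an existing directory.
        shutil.copytree(sample_data, corpus, dirs_exist_ok=True)
    else:
        corpus.mkdir(parents=True, exist_ok=True)

    readme = specs / "README.md"
    if not readme.exists():
        readme.write_text(README_PLACEHOLDER, encoding="utf-8")
    return specs
```

For example, `scaffold_specs(Path("."))` run from the repository root sets up the structure in one step.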
- `specs/README.md`: Create this file and add a brief explanation of the purpose of the new testing structure.
- `specs/reference_corpus/`: Copy the contents of the existing `sample_data/` directory into this new directory. This will be the integration and regression testing suite.
- `specs/test_cases/`: This directory will hold the individual unit tests, with each feature getting its own subdirectory.

### Task 1.2: Create the Generic Test Runner

Create a new test file, `tests/test_specifications.py`. This single file will eventually replace most of the existing unit tests. It will contain a generic, data-driven test runner.

```python
# tests/test_specifications.py
from pathlib import Path

import pytest
# You will need a way to compare two RDF graphs.
# The `rdflib.compare.isomorphic` function is perfect for this: it treats
# graphs that differ only in blank-node labels as equivalent.
from rdflib import Graph
from rdflib.compare import isomorphic

from knowledgebase_processor.processor import Processor

# Resolve the specs directory relative to this file rather than the current
# working directory (assumes tests/ and specs/ both sit at the repository root).
SPECS_DIR = Path(__file__).resolve().parent.parent / "specs" / "test_cases"


def run_spec_test(test_case_dir: Path):
    """
    Runs a single specification-driven test.
    """
    input_md_path = test_case_dir / "input.md"
    expected_output_ttl_path = test_case_dir / "expected_output.ttl"

    # 1. Read the input markdown file
    input_md_content = input_md_path.read_text(encoding="utf-8")

    # 2. Run the processor to get the "as-is" RDF graph
    # NOTE: You will need a method on your Processor that can take a string
    # of markdown and return an rdflib.Graph object.
    processor = Processor(...)  # Configure your processor as needed
    as_is_graph = processor.process_content_to_graph(input_md_content)

    # 3. Read the "to-be" (expected) RDF graph
    expected_graph = Graph()
    expected_graph.parse(str(expected_output_ttl_path), format="turtle")

    # 4. Compare the two RDF graphs for isomorphism (i.e., they are equivalent)
    assert isomorphic(as_is_graph, expected_graph), f"RDF mismatch in {test_case_dir.name}"


# This function will automatically discover all your test cases
def get_test_cases():
    if not SPECS_DIR.exists():
        return []
    # Sort for a stable, reproducible test order.
    return sorted(d for d in SPECS_DIR.iterdir() if d.is_dir())


# ids= makes pytest report each case by its directory name.
@pytest.mark.parametrize("test_case_dir", get_test_cases(), ids=lambda p: p.name)
def test_specifications(test_case_dir):
    run_spec_test(test_case_dir)
```
## 🔬 Phase 2: "As-Is" State Capture (Unit Tests)

This phase is the core of the refactoring effort. You will systematically convert each existing unit test into the new declarative format.

### Task 2.1: Convert `test_todo_item_extractor.py`

For each test function in `tests/extractor/test_todo_item_extractor.py`:

1. **Create a Test Case Directory:** Create a new subdirectory in `specs/test_cases/` whose name describes the test (e.g., `01_extract_incomplete_todo`).
2. **Create `input.md`:** Take the Markdown string used in the test and save it as `input.md` in the new directory.
3. **Generate `expected_output.ttl`:** Temporarily modify the test function to run the full processor on the input and serialize the resulting RDF graph to a file. Save this as `expected_output.ttl` in the new directory.
4. **Run the New Test:** Run `pytest tests/test_specifications.py`. The new test case should now be discovered and pass, confirming that you have captured the "as-is" state.
5. **Delete the Old Test:** Once the new test passes, delete the original Python test function.

Repeat this process until `test_todo_item_extractor.py` is empty, then delete the file.

### Task 2.2: Convert Remaining Extractor Tests

Repeat the process from Task 2.1 for all remaining test files in the `tests/extractor/` directory.

### Task 2.3: Convert Other Unit Tests

Continue this process for all other relevant unit test files in the `tests/` directory, such as those in `tests/analyzer/` and `tests/parser/`.

## 🚗 Phase 3: Integration and Cleanup

This phase establishes the regression test suite and cleans up the old test files.

### Task 3.1: Create the Reference Corpus Test

1. **Generate "As-Is" TTLs:** Write a one-off script that iterates through every `.md` file in the `specs/reference_corpus/` directory. For each file, run the processor and save the resulting RDF graph as a `.ttl` file with the same base name in the same directory.
2. **Create a New Integration Test:** Add a new test file, `tests/test_reference_corpus.py`. This test is similar to the unit test runner but runs over the entire reference corpus.

```python
# tests/test_reference_corpus.py
from pathlib import Path

import pytest
from rdflib import Graph
from rdflib.compare import isomorphic

from knowledgebase_processor.processor import Processor

# Resolve relative to this file so the test does not depend on the working directory.
CORPUS_DIR = Path(__file__).resolve().parent.parent / "specs" / "reference_corpus"


def run_corpus_test(markdown_path: Path):
    # Each corpus document `X.md` is paired with its expected graph `X.ttl`.
    expected_ttl_path = markdown_path.with_suffix(".ttl")

    input_content = markdown_path.read_text(encoding="utf-8")

    processor = Processor(...)  # Configure processor
    as_is_graph = processor.process_content_to_graph(input_content)

    expected_graph = Graph()
    expected_graph.parse(str(expected_ttl_path), format="turtle")

    assert isomorphic(as_is_graph, expected_graph), f"RDF mismatch in {markdown_path.name}"


def get_corpus_files():
    if not CORPUS_DIR.exists():
        return []
    # Sort for a stable, reproducible test order.
    return sorted(CORPUS_DIR.glob("*.md"))


@pytest.mark.parametrize("markdown_path", get_corpus_files(), ids=lambda p: p.name)
def test_reference_corpus(markdown_path):
    run_corpus_test(markdown_path)
```
### Task 3.2: Final Cleanup

Review the `tests/` directory and remove any remaining test files that have been made redundant by the new specification-driven approach.

By the end of this process, your `tests/` directory will be much smaller. You will also have a comprehensive, version-controlled, and easily updatable specification of your entire system's behavior in the `specs/` directory.

sample_data/DORA Community Discussion-2024-11-07.md

Lines changed: 0 additions & 177 deletions
This file was deleted.

specs/README.md

Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
# Specification-Driven Testing

This directory contains the specification-driven testing structure for the knowledgebase-processor project. This approach captures the current "as-is" behavior of the system in declarative artifacts, enabling a more robust, maintainable, and agent-friendly development workflow.

## Directory Structure

### `reference_corpus/`
Contains the integration and regression testing suite. These are real-world markdown files that represent the expected inputs the system should handle. Each `.md` file has a corresponding `.ttl` file that represents the expected RDF output.

### `test_cases/`
Contains individual unit test specifications. Each subdirectory represents a specific test case with:
- `input.md` - The markdown input for the test
- `expected_output.ttl` - The expected RDF output in Turtle format

## Usage

The specification-driven tests are executed through:

1. **Unit Tests**: `tests/test_specifications.py` - Runs all test cases in the `test_cases/` directory
2. **Integration Tests**: `tests/test_reference_corpus.py` - Validates the entire reference corpus

## Benefits

- **Declarative**: Test behavior is captured in files rather than code
- **Version Controlled**: Changes to expected behavior are tracked in git
- **Agent-Friendly**: AI agents can easily understand and modify test specifications
- **Maintainable**: No need to maintain complex Python test code for most scenarios
- **Comprehensive**: Full system behavior is captured as artifacts

## Test Philosophy

This approach follows the principle that the system's behavior should be specified through examples rather than code. When behavior changes, the specifications are updated to reflect the new expected behavior, providing a clear audit trail of system evolution.
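The README states that every `.md` file in `reference_corpus/` has a corresponding `.ttl` file. A small check could enforce that pairing, so that a newly added corpus document without expected output is reported by name instead of surfacing as a parse error. A sketch, with a hypothetical helper name:

```python
from pathlib import Path
from typing import List


def find_unpaired_corpus_files(corpus_dir: Path) -> List[Path]:
    """Return the .md files in corpus_dir that have no sibling .ttl file."""
    return sorted(
        md for md in corpus_dir.glob("*.md") if not md.with_suffix(".ttl").exists()
    )
```

A guard test could assert that `find_unpaired_corpus_files(...)` is empty for `specs/reference_corpus/`, and list the offending names in its failure message.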

sample_data/Alex Cipher-meetingnote-2024-11-07.md renamed to specs/reference_corpus/Alex Cipher-meetingnote-2024-11-07.md

Lines changed: 2 additions & 2 deletions
```diff
@@ -9,10 +9,10 @@ created: 2024-11-07T14:02:26-05:00
 ## Notes
 
 He goes through staff-aug
-copa->agile one
+cipa->another company
 sometimes
 
-Blair Quantum is 87 and not retired - procurement
+Blair Quantum is not retired
 
 User need and user assessment
 - products and offerings
```

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
```turtle
@prefix kb: <http://example.org/kb/vocab#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix schema: <https://schema.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://example.org/kb/wikilinks/kI4CgckKMyRwyZWO> a kb:Entity,
        kb:WikiLink ;
    rdfs:label "Alex Cipher"^^xsd:string ;
    kb:originalText "[[Alex Cipher]]"^^xsd:string ;
    kb:sourceDocument <http://example.org/kb/vocab#/test_corpus/Alex_Cipher_meetingnote_2024_11_07> ;
    kb:targetPath "Alex Cipher"^^xsd:string ;
    rdfs:seeAlso <http://example.org/kb/wikilinks/kI4CgckKMyRwyZWO> ;
    schema:dateCreated "2025-09-10T22:58:49.099164+00:00"^^xsd:dateTime ;
    schema:dateModified "2025-09-10T22:58:49.099165+00:00"^^xsd:dateTime .

<http://example.org/kb/vocab#/test_corpus/Alex_Cipher_meetingnote_2024_11_07> a kb:Document,
        kb:Entity,
        schema:CreativeWork ;
    rdfs:label "Temporary Document"^^xsd:string ;
    kb:originalPath "temp_document.md"^^xsd:string ;
    kb:pathWithoutExtension "temp_document"^^xsd:string ;
    kb:sourceDocument <http://example.org/kb/vocab#/test_corpus/Alex_Cipher_meetingnote_2024_11_07> ;
    rdfs:seeAlso <http://example.org/kb/vocab#/test_corpus/Alex_Cipher_meetingnote_2024_11_07> ;
    schema:dateCreated "2025-09-10T22:58:49.097418+00:00"^^xsd:dateTime ;
    schema:dateModified "2025-09-10T22:58:49.097420+00:00"^^xsd:dateTime .
```

sample_data/CTO Coffee-2024-11-07.md renamed to specs/reference_corpus/CTO Coffee-2024-11-07.md

Lines changed: 3 additions & 3 deletions
```diff
@@ -18,7 +18,7 @@ Tags: Tags
 
 ## Intros
 
-[[Deep Kapadia]]
+[[George Craft]]
 Engineering manager on a break
 Interested on learning
 Been an IC and manager up to 9 or 90 people
@@ -94,7 +94,7 @@ Depends on the client and how you are positioning to that client.
 Been a lot of org restructuring vs the coaching he and his product partner were pitching
 Moving to coaching but it hard without the directional clarity
 
-[[Jerome Thibaud]]
+[[Mark Temperence]]
 Find the problem and positioning yourself as the solution
 
 [[Me]]
@@ -115,7 +115,7 @@ The problems they need solving
 [[Alex Cipher]]
 
 
-[[Jerome Thibaud]]
+[[Mark Temperence]]
 
 Finding events where your prospects are
 
```
