| 1 | +# Tutorial |
| 2 | + |
| 3 | +A progressive walkthrough of knowledgecomplex, from schema definition through algebraic topology. |
| 4 | + |
| 5 | +## 1. Define a schema |
| 6 | + |
| 7 | +A schema declares vertex, edge, and face types with attributes. The `SchemaBuilder` generates OWL and SHACL automatically. |
| 8 | + |
| 9 | +```python |
| 10 | +from knowledgecomplex import SchemaBuilder, vocab, text |
| 11 | + |
| 12 | +sb = SchemaBuilder(namespace="vv") |
| 13 | + |
| 14 | +# Vertex types (subclass of kc:Vertex) |
| 15 | +sb.add_vertex_type("requirement", attributes={"title": text()}) |
| 16 | +sb.add_vertex_type("test_case", attributes={"title": text()}) |
| 17 | + |
| 18 | +# Edge type with controlled vocabulary (enforced via sh:in) |
| 19 | +sb.add_edge_type("verifies", attributes={ |
| 20 | + "status": vocab("passing", "failing", "pending"), |
| 21 | +}) |
| 22 | + |
| 23 | +# Face type |
| 24 | +sb.add_face_type("coverage") |
| 25 | +``` |
| 26 | + |
| 27 | +### Attribute descriptors |
| 28 | + |
| 29 | +| Descriptor | What it generates | Example | |
| 30 | +|---|---|---| |
| 31 | +| `text()` | `xsd:string`, required, single-valued | `title: text()` | |
| 32 | +| `text(required=False)` | `xsd:string`, optional | `notes: text(required=False)` | |
| 33 | +| `text(multiple=True)` | `xsd:string`, required, multi-valued | `tags: text(multiple=True)` | |
| 34 | +| `vocab("a", "b")` | `sh:in ("a" "b")`, required, single-valued | `status: vocab("pass", "fail")` | |
| 35 | + |
| 36 | +### Type inheritance and binding |
| 37 | + |
| 38 | +Types can inherit from other user-defined types. Child types can bind inherited attributes to fixed values: |
| 39 | + |
| 40 | +```python |
| 41 | +sb.add_vertex_type("document", attributes={"title": text(), "category": text()}) |
| 42 | +sb.add_vertex_type("specification", parent="document", |
| 43 | + attributes={"format": text()}, |
| 44 | + bind={"category": "structural"}) |
| 45 | +``` |
| 46 | + |
| 47 | +### Introspection |
| 48 | + |
| 49 | +```python |
| 50 | +sb.describe_type("specification") |
| 51 | +# {'name': 'specification', 'kind': 'vertex', 'parent': 'document', |
| 52 | +# 'own_attributes': {'format': text()}, |
| 53 | +# 'inherited_attributes': {'title': text(), 'category': text()}, |
| 54 | +# 'all_attributes': {'title': text(), 'category': text(), 'format': text()}, |
| 55 | +# 'bound': {'category': 'structural'}} |
| 56 | + |
| 57 | +sb.type_names(kind="vertex")   # [..., 'document', 'specification'] (plus vertex types registered earlier on sb)
| 58 | +``` |
| 59 | + |
| 60 | +## 2. Build a complex |
| 61 | + |
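| 62 | +A `KnowledgeComplex` manages instances. Every `add_*` call triggers SHACL verification and is rolled back on failure, so the graph stays valid after each write (deferred verification and `remove_element`, covered below, are the exceptions).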
| 63 | + |
| 64 | +```python |
| 65 | +from knowledgecomplex import KnowledgeComplex |
| 66 | + |
| 67 | +kc = KnowledgeComplex(schema=sb) |
| 68 | + |
| 69 | +# Vertices have no boundary, so there is no slice rule to satisfy
| 70 | +kc.add_vertex("req-001", type="requirement", title="Boot time < 5s") |
| 71 | +kc.add_vertex("tc-001", type="test_case", title="Boot smoke test") |
| 72 | +kc.add_vertex("tc-002", type="test_case", title="Boot regression") |
| 73 | + |
| 74 | +# Edges need their boundary vertices to already exist (slice rule) |
| 75 | +kc.add_edge("ver-001", type="verifies", |
| 76 | + vertices={"req-001", "tc-001"}, status="passing") |
| 77 | +kc.add_edge("ver-002", type="verifies", |
| 78 | + vertices={"req-001", "tc-002"}, status="pending") |
| 79 | +kc.add_edge("ver-003", type="verifies", |
| 80 | + vertices={"tc-001", "tc-002"}, status="passing") |
| 81 | + |
| 82 | +# Faces need 3 boundary edges forming a closed triangle |
| 83 | +kc.add_face("cov-001", type="coverage", |
| 84 | + boundary=["ver-001", "ver-002", "ver-003"]) |
| 85 | +``` |
| 86 | + |
| 87 | +### What gets enforced |
| 88 | + |
| 89 | +| Constraint | When | What happens | |
| 90 | +|---|---|---| |
| 91 | +| Type must be registered | Before RDF assertions | `ValidationError` | |
| 92 | +| Boundary cardinality (2 for edges, 3 for faces) | Before SHACL | `ValueError` | |
| 93 | +| Boundary elements must exist in complex (slice rule) | SHACL on write | `ValidationError` + rollback | |
| 94 | +| Vocab values must be in allowed set | SHACL on write | `ValidationError` + rollback | |
| 95 | +| Face boundary edges must form closed triangle | SHACL on write | `ValidationError` + rollback | |
| 96 | + |
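The closed-triangle constraint in the table above can be illustrated without the library. The sketch below is a plain-Python check, not the SHACL shape that knowledgecomplex generates; `is_closed_triangle` and the sample edge lists are hypothetical names introduced only for illustration.

```python
from collections import Counter

def is_closed_triangle(edges):
    """Return True if three edges (each a set of two vertex IDs) form a closed triangle."""
    if len(edges) != 3 or any(len(e) != 2 for e in edges):
        return False
    # In a closed triangle there are exactly 3 vertices,
    # and each one is shared by exactly two of the three edges.
    counts = Counter(v for e in edges for v in e)
    return len(counts) == 3 and all(c == 2 for c in counts.values())

# Illustrative inputs (assumed data, mirroring the tutorial's IDs)
triangle = [{"req-001", "tc-001"}, {"req-001", "tc-002"}, {"tc-001", "tc-002"}]
open_fan = [{"a", "b"}, {"a", "c"}, {"a", "d"}]   # three edges sharing one vertex
path = [{"a", "b"}, {"b", "c"}, {"c", "d"}]       # 4-vertex path

results = [is_closed_triangle(x) for x in (triangle, open_fan, path)]
```

Only the first input passes; the open fan and the 4-vertex path both span four vertices.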
| 97 | +### Element handles |
| 98 | + |
| 99 | +```python |
| 100 | +elem = kc.element("req-001") |
| 101 | +elem.id # "req-001" |
| 102 | +elem.type # "requirement" |
| 103 | +elem.attrs # {"title": "Boot time < 5s"} |
| 104 | + |
| 105 | +kc.element_ids(type="test_case") # ["tc-001", "tc-002"] |
| 106 | +kc.elements(type="test_case") # [Element('tc-001', ...), Element('tc-002', ...)] |
| 107 | +``` |
| 108 | + |
| 109 | +## 3. Topological queries |
| 110 | + |
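| 111 | +Queries that return elements give back a `set[str]` of element IDs, so results compose with ordinary set algebra; `degree` returns an `int` and `is_subcomplex` a `bool`. The set-valued queries accept an optional `type=` filter.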
| 112 | + |
| 113 | +```python |
| 114 | +# Boundary operator ∂ |
| 115 | +kc.boundary("ver-001") # {'req-001', 'tc-001'} (edge → vertices) |
| 116 | +kc.boundary("cov-001") # {'ver-001', 'ver-002', 'ver-003'} (face → edges) |
| 117 | +kc.boundary("req-001") # set() (vertex → empty) |
| 118 | + |
| 119 | +# Coboundary (inverse boundary) |
| 120 | +kc.coboundary("req-001") # {'ver-001', 'ver-002'} (vertex → incident edges) |
| 121 | + |
| 122 | +# Star: all simplices containing σ as a face |
| 123 | +kc.star("req-001") # req-001 + incident edges + incident faces |
| 124 | + |
| 125 | +# Closure: smallest subcomplex containing σ |
| 126 | +kc.closure("cov-001") # cov-001 + 3 edges + 3 vertices |
| 127 | + |
| 128 | +# Link: Cl(St(σ)) \ St(Cl(σ)); for a vertex σ this equals Cl(St(σ)) \ St(σ)
| 129 | +kc.link("req-001") |
| 130 | + |
| 131 | +# Skeleton: elements up to dimension k |
| 132 | +kc.skeleton(0) # vertices only |
| 133 | +kc.skeleton(1) # vertices + edges |
| 134 | + |
| 135 | +# Degree |
| 136 | +kc.degree("req-001") # 2 |
| 137 | + |
| 138 | +# Subcomplex check |
| 139 | +kc.is_subcomplex({"req-001", "tc-001", "ver-001"}) # True |
| 140 | +kc.is_subcomplex({"ver-001"}) # False (missing vertices) |
| 141 | + |
| 142 | +# Set algebra composes naturally |
| 143 | +shared = kc.star("req-001") & kc.star("tc-001") |
| 144 | +``` |
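To make the definitions of closure, star, and link concrete, here is a minimal sketch on a hand-written copy of the tutorial's complex. It is not how knowledgecomplex computes these queries (the library's internals are not shown in this tutorial); `BOUNDARY` and the helper functions are hypothetical names used for illustration.

```python
# Toy complex mirroring the tutorial: each element maps to its boundary.
BOUNDARY = {
    "req-001": set(), "tc-001": set(), "tc-002": set(),
    "ver-001": {"req-001", "tc-001"},
    "ver-002": {"req-001", "tc-002"},
    "ver-003": {"tc-001", "tc-002"},
    "cov-001": {"ver-001", "ver-002", "ver-003"},
}

def closure(ids):
    """Smallest subcomplex containing ids: follow boundaries downward."""
    result, stack = set(), list(ids)
    while stack:
        x = stack.pop()
        if x not in result:
            result.add(x)
            stack.extend(BOUNDARY[x])
    return result

def star(ids):
    """All elements that have some member of ids in their closure."""
    ids = set(ids)
    return {x for x in BOUNDARY if closure({x}) & ids}

def link(x):
    """Cl(St(x)) minus St(Cl(x))."""
    return closure(star({x})) - star(closure({x}))

def is_subcomplex(ids):
    """A set is a subcomplex when it already contains its own closure."""
    return closure(ids) == set(ids)

star_req = star({"req-001"})
link_req = link("req-001")
link_edge = link("ver-001")
```

In this toy complex the link of `req-001` is the opposite edge `ver-003` together with its two vertices, and the link of the edge `ver-001` is the single opposite vertex `tc-002`.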
| 145 | + |
| 146 | +## 4. Local partitioning |
| 147 | + |
| 148 | +The topological queries above use combinatorial adjacency: boundary, star, and closure walk the simplicial structure directly. Local partitioning uses **diffusion** instead: spread probability mass from a seed, then sweep the resulting distribution to find a cut that separates a cluster from the rest of the complex. This can reveal weakly connected clusters that no single boundary, star, or closure query isolates.
| 149 | + |
| 150 | +Requires the `analysis` extra: `pip install "knowledgecomplex[analysis]"` (the quotes keep shells such as zsh from expanding the brackets).
| 151 | + |
| 152 | +### Graph partitioning (vertex clusters) |
| 153 | + |
| 154 | +Diffuse from a seed vertex using personalized PageRank or the heat kernel, then sweep the resulting distribution to find a cut with low conductance: |
| 155 | + |
| 156 | +```python |
| 157 | +from knowledgecomplex.analysis import ( |
| 158 | + approximate_pagerank, heat_kernel_pagerank, |
| 159 | + sweep_cut, local_partition, |
| 160 | +) |
| 161 | + |
| 162 | +# Approximate PageRank: push-based diffusion (Andersen-Chung-Lang) |
| 163 | +p, r = approximate_pagerank(kc, seed="req-001", alpha=0.15) |
| 164 | +# p is a sparse dict of vertex → probability; more mass near seed |
| 165 | + |
| 166 | +# Heat kernel PageRank: exponential diffusion (Fan Chung) |
| 167 | +rho = heat_kernel_pagerank(kc, seed="req-001", t=5.0) |
| 168 | +# t controls locality: small t = tight cluster, large t = broad spread |
| 169 | + |
| 170 | +# Sweep either distribution to find a low-conductance cut |
| 171 | +cut = sweep_cut(kc, p) |
| 172 | +cut.vertices # set of vertex IDs on the small side |
| 173 | +cut.conductance # Cheeger ratio — lower means cleaner partition |
| 174 | + |
| 175 | +# Or use local_partition for the full pipeline in one call |
| 176 | +cut = local_partition(kc, seed="req-001", method="pagerank") |
| 177 | +cut = local_partition(kc, seed="req-001", method="heat_kernel") |
| 178 | +``` |
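The push-then-sweep idea can be sketched in plain Python on an ordinary adjacency dict. This illustrates the Andersen-Chung-Lang push procedure (lazy-walk variant) and a conductance sweep; it is not the knowledgecomplex implementation, and `push_pagerank`, `sweep`, `adj`, and `eps` are hypothetical names chosen for this example.

```python
from collections import deque

def push_pagerank(adj, seed, alpha=0.15, eps=1e-6):
    """Approximate personalized PageRank by repeatedly pushing residual mass."""
    p, r = {}, {seed: 1.0}
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        deg_u, ru = len(adj[u]), r.get(u, 0.0)
        if ru < eps * deg_u:
            continue  # stale queue entry: residual already below threshold
        p[u] = p.get(u, 0.0) + alpha * ru          # keep an alpha fraction at u
        r[u] = (1 - alpha) * ru / 2                # lazy walk: half stays as residual
        share = (1 - alpha) * ru / (2 * deg_u)     # the other half spreads to neighbors
        for v in adj[u]:
            r[v] = r.get(v, 0.0) + share
            if r[v] >= eps * len(adj[v]):
                queue.append(v)
        if r[u] >= eps * deg_u:
            queue.append(u)
    return p, r

def sweep(adj, p):
    """Scan prefixes ordered by p[v]/deg(v); return the lowest-conductance prefix."""
    total_vol = sum(len(nbrs) for nbrs in adj.values())
    order = sorted(p, key=lambda v: p[v] / len(adj[v]), reverse=True)
    inside, vol, cut = set(), 0, 0
    best_set, best_phi = set(), float("inf")
    for v in order:
        inside.add(v)
        vol += len(adj[v])
        internal = sum(1 for w in adj[v] if w in inside)
        cut += len(adj[v]) - 2 * internal          # new cut edges minus edges now internal
        denom = min(vol, total_vol - vol)
        if denom == 0:
            break
        phi = cut / denom
        if phi < best_phi:
            best_set, best_phi = set(inside), phi
    return best_set, best_phi

# Assumed toy graph: two triangles joined by the bridge c-d.
adj = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
p, r = push_pagerank(adj, "a")
cluster, phi = sweep(adj, p)
```

On this graph the sweep recovers the seed's triangle: cutting the single bridge edge against a volume of 7 gives conductance 1/7.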
| 179 | + |
| 180 | +### Edge partitioning (simplicial clusters) |
| 181 | + |
| 182 | +The simplicial version replaces the graph Laplacian with the **Hodge Laplacian** on edges. Instead of partitioning vertices, it partitions edges — finding clusters of relationships: |
| 183 | + |
| 184 | +```python |
| 185 | +from knowledgecomplex.analysis import edge_local_partition |
| 186 | + |
| 187 | +# Hodge PageRank: (βI + L₁)⁻¹ χ_e — diffusion on the edge space |
| 188 | +cut = edge_local_partition(kc, seed_edge="ver-001", method="hodge_pagerank") |
| 189 | + |
| 190 | +# Hodge heat kernel: e^{-tL₁} χ_e — exponential diffusion on edges |
| 191 | +cut = edge_local_partition(kc, seed_edge="ver-001", method="hodge_heat", t=5.0) |
| 192 | + |
| 193 | +cut.edges # set of edge IDs in the cluster |
| 194 | +cut.conductance # edge conductance |
| 195 | +``` |
| 196 | + |
| 197 | +The key difference: graph partitioning asks "which vertices are near this vertex?" while edge partitioning asks "which relationships are near this relationship?". Because the Hodge Laplacian L₁ = B1ᵀB1 + B2B2ᵀ includes a face term, the answer depends on the faces of the complex, which a plain graph does not have.
| 198 | + |
| 199 | +## 5. Algebraic topology |
| 200 | + |
| 201 | +Requires the `analysis` extra: `pip install "knowledgecomplex[analysis]"` (the quotes keep shells such as zsh from expanding the brackets).
| 202 | + |
| 203 | +```python |
| 204 | +from knowledgecomplex.analysis import ( |
| 205 | + boundary_matrices, betti_numbers, euler_characteristic, |
| 206 | + hodge_laplacian, edge_pagerank, hodge_decomposition, hodge_analysis, |
| 207 | +) |
| 208 | + |
| 209 | +# Boundary matrices (sparse) |
| 210 | +bm = boundary_matrices(kc) |
| 211 | +# bm.B1: (n_vertices × n_edges), bm.B2: (n_edges × n_faces) |
| 212 | +# Invariant: B1 @ B2 = 0 (∂₁ ∘ ∂₂ = 0) |
| 213 | + |
| 214 | +# Betti numbers |
| 215 | +betti = betti_numbers(kc) # [β₀, β₁, β₂] |
| 216 | +chi = euler_characteristic(kc) # V - E + F = β₀ - β₁ + β₂ |
| 217 | + |
| 218 | +# Hodge Laplacian |
| 219 | +L1 = hodge_laplacian(kc) # B1ᵀB1 + B2B2ᵀ (symmetric PSD) |
| 220 | +# dim(ker L₁) = β₁ |
| 221 | + |
| 222 | +# Edge PageRank |
| 223 | +pr = edge_pagerank(kc, "ver-001", beta=0.1) # (βI + L₁)⁻¹ χ_e |
| 224 | + |
| 225 | +# Hodge decomposition: flow = gradient + curl + harmonic |
| 226 | +decomp = hodge_decomposition(kc, pr) |
| 227 | +# decomp.gradient — im(B1ᵀ), vertex-driven flow |
| 228 | +# decomp.curl — im(B2), face-driven circulation |
| 229 | +# decomp.harmonic — ker(L₁), topological cycles |
| 230 | + |
| 231 | +# Full analysis in one call |
| 232 | +results = hodge_analysis(kc, beta=0.1) |
| 233 | +``` |
| 234 | + |
| 235 | +All analysis functions accept an optional `weights` dict mapping element IDs to scalar weights, which factor into the Laplacian as diagonal weight matrices. |
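The identities quoted in the comments above (B1 @ B2 = 0 and V - E + F = β₀ - β₁ + β₂) can be checked by hand on the tutorial's single triangle. The sketch below uses only the standard library and exact fractions; the orientation convention (edges and faces listed in a fixed vertex order) is an assumption made for this example and need not match the signs knowledgecomplex picks internally.

```python
from fractions import Fraction

vertices = ["req-001", "tc-001", "tc-002"]
edges = [("req-001", "tc-001"), ("req-001", "tc-002"), ("tc-001", "tc-002")]
faces = [("req-001", "tc-001", "tc-002")]

def boundary_1(vertices, edges):
    """B1 (vertices x edges): edge (u, v) contributes -1 at u and +1 at v."""
    B = [[0] * len(edges) for _ in vertices]
    for j, (u, v) in enumerate(edges):
        B[vertices.index(u)][j] = -1
        B[vertices.index(v)][j] = 1
    return B

def boundary_2(edges, faces):
    """B2 (edges x faces): face (a, b, c) maps to (b, c) - (a, c) + (a, b)."""
    B = [[0] * len(faces) for _ in edges]
    for j, (a, b, c) in enumerate(faces):
        B[edges.index((b, c))][j] = 1
        B[edges.index((a, c))][j] = -1
        B[edges.index((a, b))][j] = 1
    return B

def matmul(A, B):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)] for row in A]

def rank(M):
    """Matrix rank by Gaussian elimination over exact fractions."""
    rows = [[Fraction(x) for x in row] for row in M]
    n_cols = len(rows[0]) if rows else 0
    r = 0
    for col in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r

def betti(vertices, edges, faces):
    """[b0, b1, b2] from the ranks of the two boundary matrices."""
    r1 = rank(boundary_1(vertices, edges))
    r2 = rank(boundary_2(edges, faces))
    return [len(vertices) - r1, len(edges) - r1 - r2, len(faces) - r2]

B1, B2 = boundary_1(vertices, edges), boundary_2(edges, faces)
product = matmul(B1, B2)                 # should be all zeros
filled = betti(vertices, edges, faces)   # triangle with its face
hollow = betti(vertices, edges, [])      # same edges, no face
chi = len(vertices) - len(edges) + len(faces)
```

Removing the face turns the triangle into a hollow cycle, which shows up as β₁ going from 0 to 1.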
| 236 | + |
| 237 | +## 6. Filtrations |
| 238 | + |
| 239 | +A filtration is a nested sequence of valid subcomplexes: C₀ ⊆ C₁ ⊆ ... ⊆ Cₘ. |
| 240 | + |
| 241 | +```python |
| 242 | +from knowledgecomplex import Filtration |
| 243 | + |
| 244 | +filt = Filtration(kc) |
| 245 | +filt.append({"req-001"}) # must be valid subcomplex |
| 246 | +filt.append_closure({"ver-001"}) # auto-closes + unions with previous |
| 247 | +filt.append_closure({"cov-001"}) # adds face + all boundary |
| 248 | + |
| 249 | +filt.birth("cov-001") # index where element first appears |
| 250 | +filt.new_at(2) # elements added at step 2 (Cₚ \ Cₚ₋₁) |
| 251 | +filt[1] # set of element IDs at step 1 |
| 252 | + |
| 253 | +# Build from a scoring function (some_score is a placeholder for your own callable)
| 254 | +filt2 = Filtration.from_function(kc, lambda eid: some_score(eid)) |
| 255 | +``` |
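One common way to turn a scoring function into a filtration is a sublevel construction: at each score threshold, take every element scoring at or below it and close the result downward. Whether `Filtration.from_function` works exactly this way is not stated in this tutorial, so treat the sketch below as an assumption; `BOUNDARY`, `filtration_from_scores`, `birth`, and the scores are hypothetical.

```python
# Toy complex mirroring the tutorial: each element maps to its boundary.
BOUNDARY = {
    "req-001": set(), "tc-001": set(), "tc-002": set(),
    "ver-001": {"req-001", "tc-001"},
    "ver-002": {"req-001", "tc-002"},
    "ver-003": {"tc-001", "tc-002"},
    "cov-001": {"ver-001", "ver-002", "ver-003"},
}

def closure(ids):
    """Smallest subcomplex containing ids."""
    result, stack = set(), list(ids)
    while stack:
        x = stack.pop()
        if x not in result:
            result.add(x)
            stack.extend(BOUNDARY[x])
    return result

def filtration_from_scores(score):
    """One step per distinct threshold; each step is closed, so steps are nested subcomplexes."""
    steps = []
    for threshold in sorted(set(score.values())):
        selected = {x for x, s in score.items() if s <= threshold}
        steps.append(closure(selected))
    return steps

def birth(steps, element_id):
    """Index of the first step that contains element_id."""
    return next(i for i, step in enumerate(steps) if element_id in step)

# Assumed scores, e.g. the order in which elements were reviewed.
score = {"req-001": 0, "tc-001": 1, "ver-001": 1,
         "tc-002": 2, "ver-002": 2, "ver-003": 2, "cov-001": 3}
steps = filtration_from_scores(score)
new_at_2 = steps[2] - steps[1]
```

Closing each step is what guarantees the two filtration properties: every step is a valid subcomplex, and each step contains the previous one.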
| 256 | + |
| 257 | +## 7. Clique inference |
| 258 | + |
| 259 | +Discover higher-order structure hiding in the edge graph: |
| 260 | + |
| 261 | +```python |
| 262 | +from knowledgecomplex import find_cliques, infer_faces |
| 263 | + |
| 264 | +# Pure query: which triangles exist?
| 265 | +triangles = find_cliques(kc, k=3)
| 266 | +
| 267 | +# Preview without modifying the complex
| 268 | +preview = infer_faces(kc, "coverage", dry_run=True)
| 269 | +
| 270 | +# Fill in all triangles as typed faces
| 271 | +added = infer_faces(kc, "coverage")
| 272 | +``` |
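For intuition, triangle discovery on the edge graph reduces to finding, for each edge, the vertices adjacent to both endpoints. The following standalone sketch shows that idea; it is not the library's `find_cliques`, and `find_triangles` and the edge list are hypothetical.

```python
def find_triangles(edges):
    """Return every 3-clique as a frozenset of vertex IDs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    triangles = set()
    for u, v in edges:
        # Any common neighbor w of u and v closes the triangle {u, v, w}.
        for w in adj[u] & adj[v]:
            triangles.add(frozenset((u, v, w)))
    return triangles

# Assumed edge list: the tutorial's triangle plus one dangling edge.
edges = [("req-001", "tc-001"), ("req-001", "tc-002"),
         ("tc-001", "tc-002"), ("tc-002", "tc-003")]
triangles = find_triangles(edges)
```

Storing each triangle as a `frozenset` deduplicates the three times it is found, once per edge.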
| 273 | + |
| 274 | +## 8. Export and load |
| 275 | + |
| 276 | +```python |
| 277 | +# Export schema + instance to a directory |
| 278 | +kc.export("output/my_complex") |
| 279 | +# Creates: ontology.ttl, shapes.ttl, instance.ttl, queries/*.sparql |
| 280 | + |
| 281 | +# Reconstruct from exported files |
| 282 | +kc2 = KnowledgeComplex.load("output/my_complex") |
| 283 | +kc2.audit().conforms # True |
| 284 | +``` |
| 285 | + |
| 286 | +Multi-format serialization: |
| 287 | + |
| 288 | +```python |
| 289 | +from knowledgecomplex import save_graph, load_graph |
| 290 | + |
| 291 | +save_graph(kc, "data.jsonld", format="json-ld") |
| 292 | +load_graph(kc, "data.ttl") # additive loading |
| 293 | +``` |
| 294 | + |
| 295 | +## 9. Verification and audit |
| 296 | + |
| 297 | +```python |
| 298 | +# Throwing verification |
| 299 | +kc.verify() # raises ValidationError on failure |
| 300 | + |
| 301 | +# Non-throwing audit |
| 302 | +report = kc.audit() |
| 303 | +report.conforms # bool |
| 304 | +report.violations # list[AuditViolation] |
| 305 | +print(report) # human-readable summary |
| 306 | + |
| 307 | +# Deferred verification for bulk construction |
| 308 | +with kc.deferred_verification(): |
| 309 | + for item in big_dataset: |
| 310 | + kc.add_vertex(item.id, type=item.type, **item.attrs) |
| 311 | + # ... add edges, faces ... |
| 312 | +# Single SHACL pass runs on exit |
| 313 | + |
| 314 | +# Static file verification (no Python objects needed) |
| 315 | +from knowledgecomplex import audit_file |
| 316 | +report = audit_file("data/instance.ttl", shapes="data/shapes.ttl", |
| 317 | + ontology="data/ontology.ttl") |
| 318 | +``` |
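The deferred-verification pattern itself is easy to picture with a toy class: validate after every write by default, but skip the check inside a context manager and run it once on exit. This is only a sketch of the pattern, with a simple "boundary must exist" check standing in for SHACL; `TinyComplex` and its methods are hypothetical and unrelated to the library's internals.

```python
from contextlib import contextmanager

class TinyComplex:
    def __init__(self):
        self.boundary = {}       # element ID -> set of boundary element IDs
        self._deferred = False

    def _check(self):
        """Stand-in for verification: every referenced boundary element must exist."""
        for element, parts in self.boundary.items():
            missing = parts - set(self.boundary)
            if missing:
                raise ValueError(f"{element} references missing {sorted(missing)}")

    def add(self, element, boundary=()):
        self.boundary[element] = set(boundary)
        if not self._deferred:
            try:
                self._check()
            except ValueError:
                del self.boundary[element]   # roll back the failed write
                raise

    @contextmanager
    def deferred_verification(self):
        self._deferred = True
        try:
            yield self
        finally:
            self._deferred = False
        self._check()                        # single pass once the block finishes

kc_toy = TinyComplex()
with kc_toy.deferred_verification():
    kc_toy.add("e1", {"a", "b"})   # boundary not present yet: allowed while deferred
    kc_toy.add("a")
    kc_toy.add("b")

try:
    kc_toy.add("e2", {"a", "zzz"})  # outside the block: checked immediately
    rejected = False
except ValueError:
    rejected = True
```

Inside the block the edge can be added before its vertices; outside it, the same kind of write is rejected and rolled back.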
| 319 | + |
| 320 | +## 10. Pre-built ontologies |
| 321 | + |
| 322 | +Three ontologies ship with the package: |
| 323 | + |
| 324 | +```python |
| 325 | +from knowledgecomplex.ontologies import operations, brand, research |
| 326 | + |
| 327 | +sb = operations.schema() # actor, activity, resource |
| 328 | +sb = brand.schema() # audience, theme |
| 329 | +sb = research.schema() # paper, concept, note |
| 330 | +``` |
| 331 | + |
| 332 | +## Gotchas |
| 333 | + |
| 334 | +| Issue | Detail | |
| 335 | +|---|---| |
| 336 | +| **Slice rule** | Boundary elements must exist before the element that references them. Add vertices → edges → faces. | |
| 337 | +| **Closed triangle** | A face's 3 edges must span exactly 3 vertices in a cycle. An open fan or 4-vertex path will fail. | |
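| 338 | +| **`remove_element`** | No verification runs after a removal, so deleting an element that is still referenced as a boundary can leave the graph invalid. Remove faces before their edges, and edges before their vertices. |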
| 339 | +| **Schema after `load()`** | `load()` recovers type names, kinds, attributes, and parent relationships from OWL + SHACL. Full `describe_type()` introspection works after loading. | |
| 340 | +| **Deferred verification** | Inside the context manager, intermediate states need not be valid. Verification runs once on exit. | |
| 341 | +| **Face orientation** | Boundary matrix signs are computed internally to guarantee ∂₁∘∂₂ = 0. The orientation is consistent but not guaranteed to match external conventions. | |