colliery-io
diff --git a/.metis/backlog/bugs/GQLITE-T-0112.md b/.metis/backlog/bugs/GQLITE-T-0112.md
diff --git a/.metis/backlog/bugs/GQLITE-T-0119.md b/.metis/backlog/bugs/GQLITE-T-0119.md
diff --git a/.metis/backlog/bugs/GQLITE-T-0120.md b/.metis/backlog/bugs/GQLITE-T-0120.md
diff --git a/.metis/backlog/bugs/GQLITE-T-0121.md b/.metis/backlog/bugs/GQLITE-T-0121.md
diff --git a/.metis/backlog/bugs/GQLITE-T-0122.md b/.metis/backlog/bugs/GQLITE-T-0122.md
@@ -0,0 +1,92 @@
+---
+id: segfault-on-parameterized-bfs-dfs
+level: task
+title: "Segfault on parameterized bfs/dfs traversals"
+short_code: "GQLITE-T-0112"
+created_at: 2026-03-03T02:12:25.655318+00:00
+updated_at: 2026-03-03T02:27:22.507223+00:00
+parent: 
+blocked_by: []
+archived: false
+
+tags:
+  - "#task"
+  - "#bug"
+  - "#phase/active"
+
+
+exit_criteria_met: false
+strategy_id: NULL
+initiative_id: NULL
+---
+
+# Segfault on parameterized bfs/dfs traversals
+
+GitHub Issue: #27 (reported by @kynx)
+Version: 0.3.5, SQLite 3.51.2
+
+## Objective
+
+Fix segmentation fault when using parameterized `bfs()` / `dfs()` traversals on a populated graph.
+
+## Bug Details
+
+- **Priority**: P0 - segfault crashes the sqlite3 process
+- **Reproduction Steps**:
+  1. `select cypher('RETURN bfs($a)', '{"a": "A"}');` — works on empty graph
+  2. `select cypher('CREATE (a:Node {id: ''A''})');` — create a node
+  3. `select cypher('RETURN bfs($a)', '{"a": "A"}');` — segfault
+- **Expected**: Returns traversal result with the created node
+- **Actual**: Segmentation fault (signal 11)
+
+## Acceptance Criteria
+
+- [ ] `RETURN bfs($param)` with parameters works on populated graphs
+- [ ] `RETURN dfs($param)` with parameters works on populated graphs
+- [ ] Functional test covering parameterized traversals added
+- [ ] No segfault under any combination of empty/populated graph + params
+
+## Investigation Findings
+
+**Root cause**: `detect_graph_algorithm()` (`graph_algorithms.c:477-504`) only handles `AST_NODE_LITERAL` arguments and ignores `AST_NODE_PARAMETER`. When `$a` is passed, `source_id` stays NULL and is handed to `execute_bfs()`, where `strcmp(user_id, NULL)` segfaults at `graph_algo_traversal.c:149`.
+
+Empty graphs don't crash because the node loop (n=0) never executes the strcmp.
+
+**Crash path**: `query_dispatch.c:914` → `detect_graph_algorithm()` → `graph_algorithms.c:477` (literal-only check) → `query_dispatch.c:977` (NULL source_id) → `graph_algo_traversal.c:149` (strcmp with NULL)
+
+**All affected functions**: bfs, dfs, dijkstra, astar, nodeSimilarity, knn — all have the same literal-only parameter extraction.
+
+**Fix approach**: Resolve `AST_NODE_PARAMETER` nodes in `detect_graph_algorithm()` using the executor's `params_json` (infrastructure already exists via `get_param_value()` in `executor_helpers.c`).
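
The fix approach above can be modelled in a few lines. This sketch is written in Python for brevity and is not the C implementation: the dict-based AST nodes and the `resolve_string_arg` and `bfs_order` functions here are hypothetical stand-ins. They mirror only the behaviour this ticket describes, namely that literal and parameter arguments both resolve to a string and that an unresolved start id yields an empty result rather than a crash.

```python
import json
from collections import deque


def resolve_string_arg(node, params_json):
    """Return the string behind a literal or parameter node, else None.

    `node` is a hypothetical dict-shaped AST node, not the real C struct.
    """
    if node["kind"] == "literal":
        return node["value"]
    if node["kind"] == "parameter":
        params = json.loads(params_json) if params_json else {}
        value = params.get(node["name"])
        return value if isinstance(value, str) else None
    return None


def bfs_order(adjacency, start_id):
    """Breadth-first visit order; a missing start id gives an empty list."""
    if start_id is None or start_id not in adjacency:
        return []  # guard: never compare node ids against a missing start id
    seen, order, queue = {start_id}, [], deque([start_id])
    while queue:
        current = queue.popleft()
        order.append(current)
        for neighbour in adjacency.get(current, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return order


graph = {"A": ["B", "C"], "B": ["D"], "C": [], "D": []}
param_node = {"kind": "parameter", "name": "a"}

# Parameter resolved from the JSON payload, as in `cypher('RETURN bfs($a)', '{"a": "A"}')`.
print(bfs_order(graph, resolve_string_arg(param_node, '{"a": "A"}')))
# No parameters supplied: the guard returns an empty result.
print(bfs_order(graph, resolve_string_arg(param_node, None)))
```

Keeping the guard in the traversal as well as resolving parameters up front means a future extraction gap degrades to an empty result instead of a crash.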
+
+## Implementation Plan
+
+1. **NULL guards** in `graph_algo_traversal.c`: `execute_bfs()` and `execute_dfs()` return empty result if `start_id` is NULL (safety net)
+2. **Signature change**: Add `const char *params_json` to `detect_graph_algorithm()` in `graph_algorithms.h` and update call site in `query_dispatch.c:914`
+3. **Parameter resolution**: Add `resolve_string_arg()` helper in `graph_algorithms.c` that handles both `AST_NODE_LITERAL` and `AST_NODE_PARAMETER` via existing `get_param_value()`. Apply to all affected algos: bfs, dfs, dijkstra, astar, nodeSimilarity, knn
+4. **Tests**: `tests/functional/36_parameterized_algorithms.sql` (already written)
+
+### Files to modify
+- `src/backend/executor/graph_algo_traversal.c` — NULL guards
+- `src/include/executor/graph_algorithms.h` — signature change
+- `src/backend/executor/query_dispatch.c` — pass `executor->params_json`
+- `src/backend/executor/graph_algorithms.c` — helper + all extraction sites
+
+## Status Updates
+
+### Implementation Complete
+All 4 changes implemented:
+
+1. **NULL guards** — `execute_bfs()` and `execute_dfs()` now return `[]` if `start_id` is NULL
+2. **Signature change** — `detect_graph_algorithm()` now accepts `const char *params_json`; call site in `query_dispatch.c` updated to pass `executor->params_json`
+3. **Parameter resolution** — Added `resolve_string_arg()` static helper that handles both `AST_NODE_LITERAL` and `AST_NODE_PARAMETER` via `get_param_value()`. Applied to all 6 affected algorithms: bfs, dfs, dijkstra, astar, nodeSimilarity, knn
+4. **Include** — Added `#include "executor/executor_internal.h"` for `get_param_value()` and `property_type`
+
+### Verification
+- `angreal build extension` — builds clean, no warnings
+- `tests/functional/36_parameterized_algorithms.sql` — all 20 tests pass, no segfault
+- `angreal test functional` — all existing functional tests pass
+- `angreal test unit` — all 770+ unit tests pass
@@ -0,0 +1,74 @@
+---
+id: bfs-dfs-return-empty-results-in
+level: task
+title: "BFS/DFS return empty results in Python bindings"
+short_code: "GQLITE-T-0119"
+created_at: 2026-03-17T02:45:26.902672+00:00
+updated_at: 2026-03-17T02:51:50.036225+00:00
+parent: 
+blocked_by: []
+archived: false
+
+tags:
+  - "#task"
+  - "#bug"
+  - "#phase/completed"
+
+
+exit_criteria_met: false
+initiative_id: NULL
+---
+
+# BFS/DFS return empty results in Python bindings
+
+## Objective
+
+Fix `bfs()` and `dfs()` in Python bindings to return actual traversal results instead of empty lists.
+
+## Backlog Item Details
+
+### Type
+- [x] Bug - Production issue that needs fixing
+
+### Priority
+- [x] P1 - High (important for user experience)
+
+### Impact Assessment
+- **Affected Users**: All Python binding users calling BFS/DFS
+- **Reproduction Steps**: 
+  1. Create a graph with nodes and edges
+  2. Call `g.bfs("start_node")` or `g.dfs("start_node")`
+  3. Get empty list `[]` instead of traversal results
+- **Expected vs Actual**: Expected a list of dicts with `user_id`, `depth`, `order`. Actual result is `[]`.
+
+## Root Cause
+
+The C extension returns algorithm results wrapped as `[{"column_0": [...array...]}]`. Other algorithm mixins (centrality, community, components) call `extract_algo_array()` to unwrap this `column_0` wrapper. The `TraversalMixin` in `bindings/python/src/graphqlite/algorithms/traversal.py` iterates over the `result` rows directly, looking for `user_id`, but each raw row has only a single `column_0` key that holds the actual array. No row matches, so the data is silently dropped.
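
The wrapper shape described above can be illustrated with a minimal sketch. `unwrap_column_0` is a hypothetical stand-in for `extract_algo_array()`, whose real signature is not shown in this ticket, and the sample rows are invented for illustration. The sketch also assumes the wrapped array may arrive either as a list or as a JSON string.

```python
import json


def unwrap_column_0(rows):
    """Return the array stored under `column_0` in the first row, or []."""
    if not rows:
        return []
    payload = rows[0].get("column_0")
    if isinstance(payload, str):
        # Assumption: the array may be delivered as JSON text.
        payload = json.loads(payload)
    return payload if isinstance(payload, list) else []


# Invented sample in the wrapped shape `[{"column_0": [...]}]`.
raw = [{"column_0": [
    {"user_id": "A", "depth": 0, "order": 0},
    {"user_id": "B", "depth": 1, "order": 1},
]}]

# Buggy pattern: looks for `user_id` on the wrapper row itself.
buggy = [row for row in raw if "user_id" in row]

# Fixed pattern: unwrap first, then read the traversal entries.
fixed = [item for item in unwrap_column_0(raw) if "user_id" in item]

print(buggy)
print([item["user_id"] for item in fixed])
```

The buggy list comes back empty without raising, which matches the silent `[]` reported in this ticket.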
+
+## Acceptance Criteria
+
+- [ ] `bfs()` returns non-empty list with correct traversal results
+- [ ] `dfs()` returns non-empty list with correct traversal results
+- [ ] Results contain `user_id`, `depth`, `order` fields
+- [ ] `max_depth` parameter works correctly
+- [ ] Existing Python tests pass
+
+## Implementation Notes
+
+### Technical Approach
+Add `extract_algo_array()` call in `traversal.py` to unwrap `column_0`, matching the pattern used in centrality/community/components mixins.
+
+## Status Updates
+
+### Implementation Complete
+- **Fix**: Added `extract_algo_array()` and `parse_traversal_result()` calls in `traversal.py`
+- **Root cause**: `result` rows had `column_0` wrapping — needed unwrapping like other algo mixins
+- **Added**: `bfs()`, `dfs()`, `apsp()` to `ALGO_COLUMN_NAMES` in `_parsing.py`
+- **Verified**: BFS returns `[{user_id, depth, order}]` correctly, max_depth works
+- **Tests**: 226 Python tests pass
@@ -0,0 +1,70 @@
+---
+id: apsp-returns-empty-results-in
+level: task
+title: "APSP returns empty results in Python bindings"
+short_code: "GQLITE-T-0120"
+created_at: 2026-03-17T02:45:27.787638+00:00
+updated_at: 2026-03-17T02:54:10.922249+00:00
+parent: 
+blocked_by: []
+archived: false
+
+tags:
+  - "#task"
+  - "#bug"
+  - "#phase/completed"
+
+
+exit_criteria_met: false
+initiative_id: NULL
+---
+
+# APSP returns empty results in Python bindings
+
+## Objective
+
+Fix `all_pairs_shortest_path()` / `apsp()` in Python bindings to return actual results instead of empty list.
+
+## Backlog Item Details
+
+### Type
+- [x] Bug - Production issue that needs fixing
+
+### Priority
+- [x] P1 - High (important for user experience)
+
+### Impact Assessment
+- **Affected Users**: All Python binding users calling APSP
+- **Reproduction Steps**: 
+  1. Create a graph with nodes and edges
+  2. Call `g.apsp()`
+  3. Get empty list `[]` instead of path results
+- **Expected vs Actual**: Expected a list of dicts with `source`, `target`, `distance`. Actual result is `[]`.
+
+## Root Cause
+
+Same as GQLITE-T-0119. `all_pairs_shortest_path()` in `bindings/python/src/graphqlite/algorithms/paths.py` iterates over the `result` rows directly without calling `extract_algo_array()` to unwrap the `column_0` wrapper from the C extension.
+
+## Acceptance Criteria
+
+- [ ] `apsp()` returns non-empty list with correct path results
+- [ ] Results contain `source`, `target`, `distance` fields
+- [ ] Existing Python tests pass
+
+## Implementation Notes
+
+### Technical Approach
+Add `extract_algo_array()` call in `paths.py` for `all_pairs_shortest_path`, matching the pattern used in other algorithm mixins.
+
+## Status Updates
+
+### Implementation Complete
+- **Fix**: Added `extract_algo_array()` call in `paths.py` for `all_pairs_shortest_path`
+- **Verified**: APSP returns `[{source, target, distance}]` correctly
+- **Tests**: 226 Python tests pass
@@ -0,0 +1,74 @@
+---
+id: leading-zero-strings-coerced-to
+level: task
+title: "Leading-zero strings coerced to integers in Cypher path"
+short_code: "GQLITE-T-0121"
+created_at: 2026-03-17T02:45:28.751626+00:00
+updated_at: 2026-03-17T12:57:12.594512+00:00
+parent: 
+blocked_by: []
+archived: false
+
+tags:
+  - "#task"
+  - "#bug"
+  - "#phase/completed"
+
+
+exit_criteria_met: false
+initiative_id: NULL
+---
+
+# Leading-zero strings coerced to integers in Cypher path
+
+## Objective
+
+Fix string values with leading zeros (e.g., `"02134"`) being coerced to integers when passed through the Cypher path in `upsert_node`/`upsert_edge`.
+
+## Backlog Item Details
+
+### Type
+- [x] Bug - Production issue that needs fixing
+
+### Priority
+- [x] P2 - Medium (nice to have)
+
+### Impact Assessment
+- **Affected Users**: Anyone storing zip codes, phone numbers, or other zero-prefixed string identifiers
+- **Reproduction Steps**: 
+  1. `g.upsert_node("n1", {"zipcode": "02134"}, "Place")`
+  2. `g.get_node("n1")` — zipcode property is integer `2134`, not string `"02134"`
+- **Expected vs Actual**: `"02134"` should stay as string. Becomes integer `2134`.
+
+## Root Cause
+
+**Python path** (initial hypothesis, ruled out under Status Updates): `format_props()` in `utils.py` calls `_format_value()`, which detects numeric-looking strings and passes them through unquoted. The C extension then parses `02134` as an integer.
+
+**Rust path** (now mitigated): The new `PropertyValue` type from GQLITE-T-0114 lets users explicitly pass `PropertyValue::Text("02134".into())`. But the `From<&str>` auto-detection still has this issue.
+
+**Fix needed in Python**: `_format_value()` should NOT strip leading zeros — if a string starts with `0` and is longer than 1 char, it should be treated as text, not a number.
+
+## Acceptance Criteria
+
+- [ ] `upsert_node` with `{"zipcode": "02134"}` preserves string type
+- [ ] Pure numeric strings like `"42"` still auto-detect as integer
+- [ ] `"0"` still auto-detects as integer (single zero is valid)
+- [ ] `"0.5"` still auto-detects as float
+- [ ] Rust `From<&str> for PropertyValue` also fixed for leading-zero strings
+- [ ] Tests added for edge cases
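
The acceptance cases above reduce to one rule: a string with a leading zero followed by another digit stays text, and everything else goes through normal numeric detection. The sketch below expresses that rule in Python. It is not the C or Rust code from this change: `detect_value` is a hypothetical name, `has_leading_zero` borrows only its name from the Rust helper mentioned in this ticket, and the handling of a leading minus sign is an assumption.

```python
import re

_INT = re.compile(r"-?\d+")
_FLOAT = re.compile(r"-?\d+\.\d+")


def has_leading_zero(text):
    """True for strings like "02134": a zero followed by another digit."""
    digits = text.lstrip("-")  # assumption: "-007" is also treated as text
    return len(digits) > 1 and digits[0] == "0" and digits[1].isdigit()


def detect_value(text):
    """Auto-detect int or float, but keep leading-zero strings as text."""
    if has_leading_zero(text):
        return text
    if _INT.fullmatch(text):
        return int(text)
    if _FLOAT.fullmatch(text):
        return float(text)
    return text


for sample in ("02134", "42", "0", "0.5", "abc"):
    print(repr(sample), "->", repr(detect_value(sample)))
```

Checking for the leading zero before any numeric parse is what keeps `"0"` and `"0.5"` numeric while zip codes and similar identifiers survive as strings.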
+
+## Status Updates
+
+### Implementation Complete
+- **Root cause found**: `create_property_agtype_value()` in `executor_match.c` used `strtoll()` to parse `"02134"` as integer `2134` — no leading-zero check
+- **C fix**: Added leading-zero check in `executor_match.c:create_property_agtype_value()` — strings starting with `0` followed by digits skip numeric parsing
+- **Rust fix**: Added `has_leading_zero()` check in `utils.rs` for `format_value()` and `From<&str> for PropertyValue`
+- **Python**: Already handled correctly — `format_props()` wraps all `str` values in quotes
+- **Verified**: `"02134"` now returns as `"02134"` string, `42` still integer, `0.5` still float, `"0"` still integer, comparisons still work
+- **Tests**: 849 unit, 226 Python, 213 Rust all pass
@@ -0,0 +1,69 @@
+---
+id: graph-cache-functions-not
+level: task
+title: "Graph cache functions not registered in debug extension build"
+short_code: "GQLITE-T-0122"
+created_at: 2026-03-17T02:45:29.424871+00:00
+updated_at: 2026-03-17T13:02:01.161106+00:00
+parent: 
+blocked_by: []
+archived: false
+
+tags:
+  - "#task"
+  - "#bug"
+  - "#phase/completed"
+
+
+exit_criteria_met: false
+initiative_id: NULL
+---
+
+# Graph cache functions not registered in debug extension build
+
+## Objective
+
+Register `gql_load_graph`, `gql_unload_graph`, `gql_reload_graph`, `gql_graph_loaded` SQL functions in debug extension builds so graph caching works outside release builds.
+
+## Backlog Item Details
+
+### Type
+- [x] Bug - Production issue that needs fixing
+
+### Priority
+- [x] P2 - Medium (nice to have)
+
+### Impact Assessment
+- **Affected Users**: All debug-build users (developers, CI)
+- **Reproduction Steps**: 
+  1. Build extension in debug mode: `angreal build extension`
+  2. Load extension and call `SELECT gql_load_graph()`
+  3. Error: `no such function: gql_load_graph`
+- **Expected vs Actual**: Cache functions should be available in all builds. Only available in release builds.
+
+## Acceptance Criteria
+
+- [ ] `gql_load_graph()` works in debug builds
+- [ ] `gql_unload_graph()` works in debug builds
+- [ ] `gql_reload_graph()` works in debug builds
+- [ ] `gql_graph_loaded()` works in debug builds
+- [ ] Python cache tests pass (currently skipped/failing)
+
+## Implementation Notes
+
+### Technical Approach
+Check the extension entry point (`extension.c` or equivalent) for `#ifdef` guards that gate the cache function registration. Remove or adjust the conditional compilation so these functions are always registered.
+
+## Status Updates
+
+### No Fix Needed
+- **Investigation result**: Cache functions (`gql_load_graph` etc.) are registered unconditionally in `extension.c`, lines 566-574, with no `#ifdef` guards
+- **Verified via sqlite3 CLI**: All 4 functions work in debug builds
+- **Verified via angreal test python**: All 8 cache tests pass (test_load_graph, test_load_graph_already_loaded, test_unload_graph, test_unload_graph_not_loaded, test_reload_graph, test_reload_graph_not_loaded, test_cache_with_pagerank, test_cache_empty_graph)
+- **Root cause of original failure**: The ad-hoc Python test script used during the original investigation likely loaded the extension incorrectly (wrong path, or the extension was never loaded)
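
A script that never loads the extension produces the same `no such function` error as a missing registration, so it helps to check which functions a connection actually exposes. The sketch below is one way to do that with Python's standard `sqlite3` module. `missing_functions` is a hypothetical helper, the extension path is left as a placeholder because it is not given in this ticket, and `pragma_function_list` assumes a reasonably recent SQLite build.

```python
import sqlite3

CACHE_FUNCTIONS = (
    "gql_load_graph",
    "gql_unload_graph",
    "gql_reload_graph",
    "gql_graph_loaded",
)


def missing_functions(conn, names=CACHE_FUNCTIONS):
    """Return the names from `names` that this connection does not expose."""
    rows = conn.execute("SELECT name FROM pragma_function_list")
    registered = {row[0] for row in rows}
    return [name for name in names if name not in registered]


conn = sqlite3.connect(":memory:")
# A real check would load the built extension first, for example:
#   conn.enable_load_extension(True)
#   conn.load_extension("<path to the built extension>")  # placeholder path
print(missing_functions(conn))
```

On a connection without the extension loaded, all four names are reported missing, which points at the loading step rather than at the build.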