Skip to content

Commit f2baa24

Browse files
authored
feat: address downstream API feedback and fix binding bugs (#30)
* feat: address downstream API feedback and fix binding bugs Implements 6 feature requests from downstream consumer (Clotho) and fixes 3 bugs discovered during deep investigation of Rust/Python binding parity. Parser & Transform: - Add bare undirected match syntax: (a)--(b), -->, <-- - Fix undirected match to query both edge directions (was forward-only) - Remove `-` from scanner operator regex so `--` tokenizes correctly - Fix leading-zero string coercion in create_property_agtype_value() Rust bindings: - Add PropertyValue enum for typed upsert (backward-compatible with &str) - Add Value::get(key), get_index(i), Index<&str> for ergonomic access - Add get_edges_from/to/by_type/node_edges edge query helpers - Add Graph::query_params() one-liner convenience method - Rename GraphStats fields: nodes->node_count, edges->edge_count Python bindings: - Add get_edges_from/to/by_type edge query helpers - Fix BFS/DFS returning empty (missing extract_algo_array unwrap) - Fix APSP returning empty (same column_0 unwrap bug) - Rename stats() keys to node_count/edge_count - Remap cache status keys for consistency All tests pass: 849 C unit, 226 Python, 213 Rust.
1 parent 43e8b56 commit f2baa24

39 files changed

+2597
-137
lines changed
Lines changed: 92 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,92 @@
1+
---
2+
id: segfault-on-parameterized-bfs-dfs
3+
level: task
4+
title: "Segfault on parameterized bfs/dfs traversals"
5+
short_code: "GQLITE-T-0112"
6+
created_at: 2026-03-03T02:12:25.655318+00:00
7+
updated_at: 2026-03-03T02:27:22.507223+00:00
8+
parent:
9+
blocked_by: []
10+
archived: false
11+
12+
tags:
13+
- "#task"
14+
- "#bug"
15+
- "#phase/active"
16+
17+
18+
exit_criteria_met: false
19+
strategy_id: NULL
20+
initiative_id: NULL
21+
---
22+
23+
# Segfault on parameterized bfs/dfs traversals
24+
25+
GitHub Issue: #27 (reported by @kynx)
26+
Version: 0.3.5, SQLite 3.51.2
27+
28+
## Objective
29+
30+
Fix segmentation fault when using parameterized `bfs()` / `dfs()` traversals on a populated graph.
31+
32+
## Bug Details
33+
34+
- **Priority**: P0 - segfault crashes the sqlite3 process
35+
- **Reproduction Steps**:
36+
1. `select cypher('RETURN bfs($a)', '{"a": "A"}');` — works on empty graph
37+
2. `select cypher('CREATE (a:Node {id: ''A''})');` — create a node
38+
3. `select cypher('RETURN bfs($a)', '{"a": "A"}');` — segfault
39+
- **Expected**: Returns traversal result with the created node
40+
- **Actual**: Segmentation fault (signal 11)
41+
42+
## Acceptance Criteria
43+
44+
## Acceptance Criteria
45+
46+
## Acceptance Criteria
47+
48+
- [ ] `RETURN bfs($param)` with parameters works on populated graphs
49+
- [ ] `RETURN dfs($param)` with parameters works on populated graphs
50+
- [ ] Functional test covering parameterized traversals added
51+
- [ ] No segfault under any combination of empty/populated graph + params
52+
53+
## Investigation Findings
54+
55+
**Root cause**: `detect_graph_algorithm()` (`graph_algorithms.c:477-504`) only handles `AST_NODE_LITERAL` args, ignoring `AST_NODE_PARAMETER`. When `$a` is passed, `source_id` stays NULL → passed to `execute_bfs()``strcmp(user_id, NULL)` segfaults at `graph_algo_traversal.c:149`.
56+
57+
Empty graphs don't crash because the node loop (n=0) never executes the strcmp.
58+
59+
**Crash path**: `query_dispatch.c:914``detect_graph_algorithm()``graph_algorithms.c:477` (literal-only check) → `query_dispatch.c:977` (NULL source_id) → `graph_algo_traversal.c:149` (strcmp with NULL)
60+
61+
**All affected functions**: bfs, dfs, dijkstra, astar, nodeSimilarity, knn — all have the same literal-only parameter extraction.
62+
63+
**Fix approach**: Resolve `AST_NODE_PARAMETER` nodes in `detect_graph_algorithm()` using the executor's `params_json` (infrastructure already exists via `get_param_value()` in `executor_helpers.c`).
64+
65+
## Implementation Plan
66+
67+
1. **NULL guards** in `graph_algo_traversal.c`: `execute_bfs()` and `execute_dfs()` return empty result if `start_id` is NULL (safety net)
68+
2. **Signature change**: Add `const char *params_json` to `detect_graph_algorithm()` in `graph_algorithms.h` and update call site in `query_dispatch.c:914`
69+
3. **Parameter resolution**: Add `resolve_string_arg()` helper in `graph_algorithms.c` that handles both `AST_NODE_LITERAL` and `AST_NODE_PARAMETER` via existing `get_param_value()`. Apply to all affected algos: bfs, dfs, dijkstra, astar, nodeSimilarity, knn
70+
4. **Tests**: `tests/functional/36_parameterized_algorithms.sql` (already written)
71+
72+
### Files to modify
73+
- `src/backend/executor/graph_algo_traversal.c` — NULL guards
74+
- `src/include/executor/graph_algorithms.h` — signature change
75+
- `src/backend/executor/query_dispatch.c` — pass `executor->params_json`
76+
- `src/backend/executor/graph_algorithms.c` — helper + all extraction sites
77+
78+
## Status Updates
79+
80+
### Implementation Complete
81+
All 4 changes implemented:
82+
83+
1. **NULL guards**`execute_bfs()` and `execute_dfs()` now return `[]` if `start_id` is NULL
84+
2. **Signature change**`detect_graph_algorithm()` now accepts `const char *params_json`; call site in `query_dispatch.c` updated to pass `executor->params_json`
85+
3. **Parameter resolution** — Added `resolve_string_arg()` static helper that handles both `AST_NODE_LITERAL` and `AST_NODE_PARAMETER` via `get_param_value()`. Applied to all 6 affected algorithms: bfs, dfs, dijkstra, astar, nodeSimilarity, knn
86+
4. **Include** — Added `#include "executor/executor_internal.h"` for `get_param_value()` and `property_type`
87+
88+
### Verification
89+
- `angreal build extension` — builds clean, no warnings
90+
- `tests/functional/36_parameterized_algorithms.sql` — all 20 tests pass, no segfault
91+
- `angreal test functional` — all existing functional tests pass
92+
- `angreal test unit` — all 770+ unit tests pass
Lines changed: 74 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,74 @@
1+
---
2+
id: bfs-dfs-return-empty-results-in
3+
level: task
4+
title: "BFS/DFS return empty results in Python bindings"
5+
short_code: "GQLITE-T-0119"
6+
created_at: 2026-03-17T02:45:26.902672+00:00
7+
updated_at: 2026-03-17T02:51:50.036225+00:00
8+
parent:
9+
blocked_by: []
10+
archived: false
11+
12+
tags:
13+
- "#task"
14+
- "#bug"
15+
- "#phase/completed"
16+
17+
18+
exit_criteria_met: false
19+
initiative_id: NULL
20+
---
21+
22+
# BFS/DFS return empty results in Python bindings
23+
24+
## Objective
25+
26+
Fix `bfs()` and `dfs()` in Python bindings to return actual traversal results instead of empty lists.
27+
28+
## Backlog Item Details
29+
30+
### Type
31+
- [x] Bug - Production issue that needs fixing
32+
33+
### Priority
34+
- [x] P1 - High (important for user experience)
35+
36+
### Impact Assessment
37+
- **Affected Users**: All Python binding users calling BFS/DFS
38+
- **Reproduction Steps**:
39+
1. Create a graph with nodes and edges
40+
2. Call `g.bfs("start_node")` or `g.dfs("start_node")`
41+
3. Get empty list `[]` instead of traversal results
42+
- **Expected vs Actual**: Should return list of dicts with `user_id`, `depth`, `order`. Returns `[]`.
43+
44+
## Root Cause
45+
46+
The C extension returns algorithm results wrapped as `[{"column_0": [...array...]}]`. Other algorithm mixins (centrality, community, components) call `extract_algo_array()` to unwrap this `column_0` wrapper. The `TraversalMixin` in `bindings/python/src/graphqlite/algorithms/traversal.py` iterates over `result` rows directly looking for `user_id`, but the raw rows have a single `column_0` key containing the actual array. The data is silently dropped.
47+
48+
## Acceptance Criteria
49+
50+
## Acceptance Criteria
51+
52+
## Acceptance Criteria
53+
54+
## Acceptance Criteria
55+
56+
- [ ] `bfs()` returns non-empty list with correct traversal results
57+
- [ ] `dfs()` returns non-empty list with correct traversal results
58+
- [ ] Results contain `user_id`, `depth`, `order` fields
59+
- [ ] `max_depth` parameter works correctly
60+
- [ ] Existing Python tests pass
61+
62+
## Implementation Notes
63+
64+
### Technical Approach
65+
Add `extract_algo_array()` call in `traversal.py` to unwrap `column_0`, matching the pattern used in centrality/community/components mixins.
66+
67+
## Status Updates
68+
69+
### Implementation Complete
70+
- **Fix**: Added `extract_algo_array()` and `parse_traversal_result()` calls in `traversal.py`
71+
- **Root cause**: `result` rows had `column_0` wrapping — needed unwrapping like other algo mixins
72+
- **Added**: `bfs()`, `dfs()`, `apsp()` to `ALGO_COLUMN_NAMES` in `_parsing.py`
73+
- **Verified**: BFS returns `[{user_id, depth, order}]` correctly, max_depth works
74+
- **Tests**: 226 Python tests pass
Lines changed: 70 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,70 @@
1+
---
2+
id: apsp-returns-empty-results-in
3+
level: task
4+
title: "APSP returns empty results in Python bindings"
5+
short_code: "GQLITE-T-0120"
6+
created_at: 2026-03-17T02:45:27.787638+00:00
7+
updated_at: 2026-03-17T02:54:10.922249+00:00
8+
parent:
9+
blocked_by: []
10+
archived: false
11+
12+
tags:
13+
- "#task"
14+
- "#bug"
15+
- "#phase/completed"
16+
17+
18+
exit_criteria_met: false
19+
initiative_id: NULL
20+
---
21+
22+
# APSP returns empty results in Python bindings
23+
24+
## Objective
25+
26+
Fix `all_pairs_shortest_path()` / `apsp()` in Python bindings to return actual results instead of empty list.
27+
28+
## Backlog Item Details
29+
30+
### Type
31+
- [x] Bug - Production issue that needs fixing
32+
33+
### Priority
34+
- [x] P1 - High (important for user experience)
35+
36+
### Impact Assessment
37+
- **Affected Users**: All Python binding users calling APSP
38+
- **Reproduction Steps**:
39+
1. Create a graph with nodes and edges
40+
2. Call `g.apsp()`
41+
3. Get empty list `[]` instead of path results
42+
- **Expected vs Actual**: Should return list of dicts with `source`, `target`, `distance`. Returns `[]`.
43+
44+
## Root Cause
45+
46+
Same as GQLITE-T-0119. The `all_pairs_shortest_path()` in `bindings/python/src/graphqlite/algorithms/paths.py` iterates `result` rows directly without calling `extract_algo_array()` to unwrap the `column_0` wrapper from the C extension.
47+
48+
## Acceptance Criteria
49+
50+
## Acceptance Criteria
51+
52+
## Acceptance Criteria
53+
54+
## Acceptance Criteria
55+
56+
- [ ] `apsp()` returns non-empty list with correct path results
57+
- [ ] Results contain `source`, `target`, `distance` fields
58+
- [ ] Existing Python tests pass
59+
60+
## Implementation Notes
61+
62+
### Technical Approach
63+
Add `extract_algo_array()` call in `paths.py` for `all_pairs_shortest_path`, matching the pattern used in other algorithm mixins.
64+
65+
## Status Updates
66+
67+
### Implementation Complete
68+
- **Fix**: Added `extract_algo_array()` call in `paths.py` for `all_pairs_shortest_path`
69+
- **Verified**: APSP returns `[{source, target, distance}]` correctly
70+
- **Tests**: 226 Python tests pass
Lines changed: 74 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,74 @@
1+
---
2+
id: leading-zero-strings-coerced-to
3+
level: task
4+
title: "Leading-zero strings coerced to integers in Cypher path"
5+
short_code: "GQLITE-T-0121"
6+
created_at: 2026-03-17T02:45:28.751626+00:00
7+
updated_at: 2026-03-17T12:57:12.594512+00:00
8+
parent:
9+
blocked_by: []
10+
archived: false
11+
12+
tags:
13+
- "#task"
14+
- "#bug"
15+
- "#phase/completed"
16+
17+
18+
exit_criteria_met: false
19+
initiative_id: NULL
20+
---
21+
22+
# Leading-zero strings coerced to integers in Cypher path
23+
24+
## Objective
25+
26+
Fix string values with leading zeros (e.g., `"02134"`) being coerced to integers when passed through the Cypher path in `upsert_node`/`upsert_edge`.
27+
28+
## Backlog Item Details
29+
30+
### Type
31+
- [x] Bug - Production issue that needs fixing
32+
33+
### Priority
34+
- [x] P2 - Medium (nice to have)
35+
36+
### Impact Assessment
37+
- **Affected Users**: Anyone storing zip codes, phone numbers, or other zero-prefixed string identifiers
38+
- **Reproduction Steps**:
39+
1. `g.upsert_node("n1", {"zipcode": "02134"}, "Place")`
40+
2. `g.get_node("n1")` — zipcode property is integer `2134`, not string `"02134"`
41+
- **Expected vs Actual**: `"02134"` should stay as string. Becomes integer `2134`.
42+
43+
## Root Cause
44+
45+
**Python path**: `format_props()` in `utils.py` calls `_format_value()` which detects numeric-looking strings and passes them through unquoted. The C extension then parses `02134` as integer.
46+
47+
**Rust path** (now mitigated): The new `PropertyValue` type from GQLITE-T-0114 lets users explicitly pass `PropertyValue::Text("02134".into())`. But the `From<&str>` auto-detection still has this issue.
48+
49+
**Fix needed in Python**: `_format_value()` should NOT strip leading zeros — if a string starts with `0` and is longer than 1 char, it should be treated as text, not a number.
50+
51+
## Acceptance Criteria
52+
53+
## Acceptance Criteria
54+
55+
## Acceptance Criteria
56+
57+
## Acceptance Criteria
58+
59+
- [ ] `upsert_node` with `{"zipcode": "02134"}` preserves string type
60+
- [ ] Pure numeric strings like `"42"` still auto-detect as integer
61+
- [ ] `"0"` still auto-detects as integer (single zero is valid)
62+
- [ ] `"0.5"` still auto-detects as float
63+
- [ ] Rust `From<&str> for PropertyValue` also fixed for leading-zero strings
64+
- [ ] Tests added for edge cases
65+
66+
## Status Updates
67+
68+
### Implementation Complete
69+
- **Root cause found**: `create_property_agtype_value()` in `executor_match.c` used `strtoll()` to parse `"02134"` as integer `2134` — no leading-zero check
70+
- **C fix**: Added leading-zero check in `executor_match.c:create_property_agtype_value()` — strings starting with `0` followed by digits skip numeric parsing
71+
- **Rust fix**: Added `has_leading_zero()` check in `utils.rs` for `format_value()` and `From<&str> for PropertyValue`
72+
- **Python**: Already handled correctly — `format_props()` wraps all `str` values in quotes
73+
- **Verified**: `"02134"` now returns as `"02134"` string, `42` still integer, `0.5` still float, `"0"` still integer, comparisons still work
74+
- **Tests**: 849 unit, 226 Python, 213 Rust all pass
Lines changed: 69 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,69 @@
1+
---
2+
id: graph-cache-functions-not
3+
level: task
4+
title: "Graph cache functions not registered in debug extension build"
5+
short_code: "GQLITE-T-0122"
6+
created_at: 2026-03-17T02:45:29.424871+00:00
7+
updated_at: 2026-03-17T13:02:01.161106+00:00
8+
parent:
9+
blocked_by: []
10+
archived: false
11+
12+
tags:
13+
- "#task"
14+
- "#bug"
15+
- "#phase/completed"
16+
17+
18+
exit_criteria_met: false
19+
initiative_id: NULL
20+
---
21+
22+
# Graph cache functions not registered in debug extension build
23+
24+
## Objective
25+
26+
Register `gql_load_graph`, `gql_unload_graph`, `gql_reload_graph`, `gql_graph_loaded` SQL functions in debug extension builds so graph caching works outside release builds.
27+
28+
## Backlog Item Details
29+
30+
### Type
31+
- [x] Bug - Production issue that needs fixing
32+
33+
### Priority
34+
- [x] P2 - Medium (nice to have)
35+
36+
### Impact Assessment
37+
- **Affected Users**: All debug-build users (developers, CI)
38+
- **Reproduction Steps**:
39+
1. Build extension in debug mode: `angreal build extension`
40+
2. Load extension and call `SELECT gql_load_graph()`
41+
3. Error: `no such function: gql_load_graph`
42+
- **Expected vs Actual**: Cache functions should be available in all builds. Only available in release builds.
43+
44+
## Acceptance Criteria
45+
46+
## Acceptance Criteria
47+
48+
## Acceptance Criteria
49+
50+
## Acceptance Criteria
51+
52+
- [ ] `gql_load_graph()` works in debug builds
53+
- [ ] `gql_unload_graph()` works in debug builds
54+
- [ ] `gql_reload_graph()` works in debug builds
55+
- [ ] `gql_graph_loaded()` works in debug builds
56+
- [ ] Python cache tests pass (currently skipped/failing)
57+
58+
## Implementation Notes
59+
60+
### Technical Approach
61+
Check the extension entry point (`extension.c` or equivalent) for `#ifdef` guards that gate the cache function registration. Remove or adjust the conditional compilation so these functions are always registered.
62+
63+
## Status Updates
64+
65+
### No Fix Needed
66+
- **Investigation result**: Cache functions (`gql_load_graph` etc.) ARE registered unconditionally in `extension.c` line 566-574 — no `#ifdef` guards
67+
- **Verified via sqlite3 CLI**: All 4 functions work in debug builds
68+
- **Verified via angreal test python**: All 8 cache tests pass (test_load_graph, test_load_graph_already_loaded, test_unload_graph, test_unload_graph_not_loaded, test_reload_graph, test_reload_graph_not_loaded, test_cache_with_pagerank, test_cache_empty_graph)
69+
- **Root cause of original failure**: The investigation's ad-hoc Python test script likely loaded the extension incorrectly (wrong path or missing extension loading)

0 commit comments

Comments
 (0)