51 commits
d37ca93
feat: scaffold cudf executor skeleton
lmeyerov Nov 19, 2025
051a936
feat: wire same-path plan into cudf executor
lmeyerov Nov 19, 2025
7ade745
feat: add gfql where metadata and planner
lmeyerov Nov 19, 2025
ce75193
feat: implement cudf executor forward pass
lmeyerov Nov 19, 2025
896c08a
test: add cudf forward parity cases
lmeyerov Nov 19, 2025
770103d
docs: copy issue 837 plan into impl folder
lmeyerov Nov 19, 2025
35224b2
chore: remove tracked cudf executor plan
lmeyerov Nov 20, 2025
1f4d18c
feat: add oracle fallback for cudf same-path executor
lmeyerov Nov 20, 2025
fa49df7
chore: gate cudf same-path executor and add strict-mode test
lmeyerov Nov 20, 2025
da10935
chore: document cuDF same-path fallback gating
lmeyerov Nov 20, 2025
31a67d5
feat: add same-path pruning for cudf executor
lmeyerov Nov 20, 2025
ff701e8
feat: route cudf chains with WHERE to same-path executor
lmeyerov Nov 20, 2025
e787bde
feat: enforce same-path summaries in cudf executor
lmeyerov Nov 20, 2025
94a63ba
fix(gfql): preserve edge filters in cudf same-path
lmeyerov Nov 22, 2025
352a418
chore(gfql): fix same-path typing and mypy config
lmeyerov Nov 22, 2025
297d4ff
chore(gfql): clean chain typing imports
lmeyerov Nov 22, 2025
22c3276
chore(gfql): silence dtype comparisons for mypy 3.8
lmeyerov Nov 22, 2025
59f2909
test(gfql): cover same-path cycles, branches, edge filters, cudf
lmeyerov Nov 23, 2025
0ae2c1d
test(gfql): compress same-path topology coverage
lmeyerov Nov 23, 2025
d2cdbfb
chore(gfql): tighten inequality mask
lmeyerov Nov 23, 2025
27d48fe
test(gfql): add dispatch same-path dict case
lmeyerov Nov 23, 2025
4174258
test(gfql): add chain/list dispatch same-path parity
lmeyerov Nov 23, 2025
2976cff
fix(gfql): import same_path_types from gfql
lmeyerov Dec 24, 2025
7d40694
fix(gfql): add package init and clean mypy config
lmeyerov Dec 24, 2025
564edf1
fix(gfql): add ref package init
lmeyerov Dec 24, 2025
ed24318
fix: align same-path hop slicing with oracle
lmeyerov Dec 24, 2025
ba5be94
test(gfql): add 8 feature composition tests for hop ranges + WHERE
lmeyerov Dec 26, 2025
198ad04
fix(gfql): support WHERE clauses for multi-hop edges in same-path exe…
lmeyerov Dec 26, 2025
7b9c327
refactor(gfql): rename CuDFSamePathExecutor to DFSamePathExecutor
lmeyerov Dec 26, 2025
cd57936
fix(gfql): comprehensive WHERE + multi-hop bug fixes and test amplifi…
lmeyerov Dec 27, 2025
d04dc3c
refactor(gfql): vectorize df_executor for GPU compatibility
lmeyerov Dec 28, 2025
b3de2a5
test(gfql): add df_executor profiling script
lmeyerov Dec 28, 2025
54d5d0a
test(gfql): add cProfile analysis and extended profiling
lmeyerov Dec 28, 2025
71cda41
fix(gfql): multiple bug fixes for native vectorized path
lmeyerov Dec 28, 2025
b6b5449
fix(gfql): resolve flake8 lint errors (F841, W504)
lmeyerov Dec 28, 2025
520eaa2
docs(plan): add Session 9 summary for CI fixes and verification update
lmeyerov Dec 28, 2025
df156d4
chore: remove plan.md from repo
lmeyerov Dec 28, 2025
6bc8a46
fix(gfql): resolve mypy type errors
lmeyerov Dec 28, 2025
baefa76
fix(gfql): correct mypy ignore codes for iterrows
lmeyerov Dec 28, 2025
3d0fe0c
fix(gfql): use pd.Index for column assignment to satisfy py38 mypy
lmeyerov Dec 28, 2025
9d48714
chore(gfql): add initial alloy f/b/f where model
lmeyerov Nov 23, 2025
ec2d516
chore(gfql): refine alloy model where lowering
lmeyerov Nov 23, 2025
00f082b
ci(alloy): add scenario checks and coverage
lmeyerov Nov 25, 2025
064df32
ci(alloy): add optional multi-chain full-scope
lmeyerov Nov 25, 2025
4ab7310
ci(alloy): pull/push ghcr cache for checks
lmeyerov Nov 26, 2025
7a291f1
ci(alloy): gate full scopes and document mapping
lmeyerov Nov 26, 2025
baf9d76
docs(alloy): add README and mapping notes
lmeyerov Nov 26, 2025
8332b30
docs(alloy): note hop range modeling limits
lmeyerov Dec 24, 2025
e1a7cec
docs(alloy): add scope/limitations section and feature composition plan
lmeyerov Dec 26, 2025
a4f0968
feat(alloy): add contradictory WHERE scenario and document bug findings
lmeyerov Dec 29, 2025
949bd70
docs(alloy): document contradictory WHERE limitation
lmeyerov Dec 29, 2025
39 changes: 39 additions & 0 deletions .github/workflows/ci.yml
@@ -25,6 +25,7 @@ jobs:
docs: ${{ steps.filter.outputs.docs }}
infra: ${{ steps.filter.outputs.infra }}
docs_only_latest: ${{ steps.docs_only_latest.outputs.docs_only_latest }}
alloy: ${{ steps.filter.outputs.alloy }}
steps:
- uses: actions/checkout@v3
- uses: dorny/paths-filter@v3
@@ -58,6 +59,8 @@ jobs:
- '**.rst'
- 'demos/**'
- 'notebooks/**'
alloy:
- 'alloy/**'

- name: Detect docs-only change on tip
id: docs_only_latest
@@ -123,6 +126,42 @@ jobs:
source pygraphistry/bin/activate
./bin/typecheck.sh

  alloy-check:
    needs: changes
    if: ${{ needs.changes.outputs.alloy == 'true' || needs.changes.outputs.python == 'true' || needs.changes.outputs.infra == 'true' || github.event_name == 'workflow_dispatch' || github.event_name == 'schedule' }}
    runs-on: ubuntu-latest
    timeout-minutes: 10

    steps:
      - name: Checkout repo
        uses: actions/checkout@v3
        with:
          lfs: true

      - name: Login to GitHub Container Registry
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Pre-pull Alloy image cache
        run: |
          docker pull ghcr.io/graphistry/alloy6:6.2.0 || true

      - name: Run Alloy checks (scoped on PR/push, full on schedule/dispatch)
        env:
          EVENT_NAME: ${{ github.event_name }}
        run: |
          if [[ "$EVENT_NAME" == "schedule" || "$EVENT_NAME" == "workflow_dispatch" ]]; then
            FULL=1
            MULTI=1
          else
            FULL=0
            MULTI=0
          fi
          ALLOY_PUSH=1 FULL=$FULL MULTI=$MULTI bash alloy/check_fbf_where.sh

  test-minimal-python:
    needs: [changes, python-lint-types]
    # Run if Python files changed OR infrastructure changed OR manual/scheduled run
9 changes: 9 additions & 0 deletions CHANGELOG.md
@@ -12,13 +12,20 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm
- **Compute / hop**: `hop()` supports `min_hops`/`max_hops` traversal bounds plus optional hop labels for nodes, edges, and seeds, and post-traversal slicing via `output_min_hops`/`output_max_hops` to keep outputs compact while traversing wider ranges.
- **Docs / hop**: Added bounded-hop walkthrough notebook (`docs/source/gfql/hop_bounds.ipynb`), cheatsheet and GFQL spec updates, and examples showing how to combine hop ranges, labels, and output slicing.
- **GFQL / reference**: Extended the pandas reference enumerator and parity tests to cover hop ranges, labeling, and slicing so GFQL correctness checks include the new traversal shapes.
- **GFQL / Oracle**: Introduced `graphistry.gfql.ref.enumerator`, a pandas-only reference implementation that enumerates fixed-length chains, enforces local + same-path predicates, applies strict null semantics, enforces safety caps, and emits alias tags/optional path bindings for use as a correctness oracle.
- **GFQL / cuDF same-path**: Added execution-mode gate `GRAPHISTRY_CUDF_SAME_PATH_MODE` (auto/oracle/strict) for GFQL cuDF same-path executor. Auto falls back to oracle when GPU unavailable; strict requires cuDF or raises. Oracle path retains safety caps and alias-tag propagation.
- **GFQL / cuDF executor**: Implemented same-path pruning path (wavefront backward filtering, min/max summaries for inequalities, value-aware equality filters) with oracle fallback. CUDF chains with WHERE now dispatch through the same-path executor.

### Fixed
- **Compute / hop**: Exact-hop traversals now prune branches that do not reach `min_hops`, avoid reapplying min-hop pruning in reverse passes, keep seeds in wavefront outputs, and reuse forward wavefronts when recomputing labels so edge/node hop labels stay aligned (fixes 3-hop branch inclusion issues and mislabeled slices).

### Tests
- **GFQL / hop**: Expanded `test_compute_hops.py` and GFQL parity suites to assert branch pruning, bounded outputs, label collision handling, and forward/reverse slice behavior.
- **Reference enumerator**: Added oracle parity tests for hop ranges and output slices to guard GFQL integrations.
- **GFQL**: Added deterministic + property-based oracle tests (triangles, alias reuse, cuDF conversions, Hypothesis) plus parity checks ensuring pandas GFQL chains match the oracle outputs.
- **GFQL / cuDF same-path**: Added strict/auto mode coverage for cuDF executor fallback behavior to keep CI stable while GPU kernels are wired up.
- **GFQL / cuDF same-path**: Added GPU-path parity tests (equality/inequality) over CPU data to guard semantics while GPU CI remains unavailable.
- **Layouts**: Added comprehensive test coverage for `circle_layout()` and `group_in_a_box_layout()` with partition support (CPU/GPU)

### Infra
- **Tooling**: `bin/flake8.sh` / `bin/mypy.sh` now require installed tools (no auto-install), honor `FLAKE8_CMD` / `MYPY_CMD` and optional `MYPY_EXTRA_ARGS`; `bin/lint.sh` / `bin/typecheck.sh` resolve via uvx → python -m → bare.
@@ -107,6 +114,8 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm
### Tests
- **CI / Python**: Expand GitHub Actions coverage to Python 3.13 + 3.13/3.14 for CPU lint/type/test jobs, while pinning RAPIDS-dependent CPU/GPU suites to <=3.13 until NVIDIA publishes 3.14 wheels (ensures lint/mypy/pytest signal on the latest interpreter without breaking RAPIDS installs).
- **GFQL**: Added deterministic + property-based oracle tests (triangles, alias reuse, cuDF conversions, Hypothesis) plus parity checks ensuring pandas GFQL chains match the oracle outputs.
- **GFQL / cuDF same-path**: Added strict/auto mode coverage for cuDF executor fallback behavior to keep CI stable while GPU kernels are wired up.
- **GFQL / cuDF same-path**: Added GPU-path parity tests (equality/inequality) over CPU data to guard semantics while GPU CI remains unavailable.
- **Layouts**: Added comprehensive test coverage for `circle_layout()` and `group_in_a_box_layout()` with partition support (CPU/GPU)

### Infra
274 changes: 274 additions & 0 deletions PLAN-846-852-feature-composition.md
@@ -0,0 +1,274 @@
# Feature Composition Testing Plan: PR #846 + #852

## Status Summary

| Item | Status | Notes |
|------|--------|-------|
| P0/P1 Tests for #846 | ✅ DONE | 8 tests added; 6 xfail (bugs found), 2 passing |
| Multi-hop bugs filed | ✅ DONE | Issue #872 created |
| Alloy README update | ✅ DONE | Scope/limitations documented |
| Meta-issue roadmap | ✅ DONE | Issue #871 created |

## Issues Created

- **#871**: Meta: GFQL Testing & Verification Roadmap
- **#872**: Fix multi-hop + WHERE backward prune bugs in cuDF executor

## Branch Structure

```
master (includes PR #851 hop ranges - MERGED)
 └── PR #846: feat/issue-837-cudf-hop-executor (same-path executor)
      └── PR #852: feat/issue-838-alloy-fbf-where (alloy proof) ← CURRENT
```

## Execution Order

### Phase 1: PR #846 Tests (on branch `feat/issue-837-cudf-hop-executor`)

**Status: ✅ COMPLETE**

Tests added to `tests/gfql/ref/test_cudf_executor_inputs.py`:

| # | Test | Status | Notes |
|---|------|--------|-------|
| 1 | WHERE respected after min_hops backtracking | xfail | Bug #872 |
| 2 | Reverse direction + hop range + WHERE | xfail | Bug #872 |
| 3 | Non-adjacent alias WHERE | xfail | Bug #872 |
| 4 | Oracle vs cuDF parity comprehensive | xfail | Bug #872 |
| 5 | Multi-hop edge WHERE filtering | xfail | Bug #872 |
| 6 | Output slicing + WHERE | ✅ PASS | Works correctly |
| 7 | label_seeds + output_min_hops | ✅ PASS | Works correctly |
| 8 | Multiple WHERE + mixed hop ranges | xfail | Bug #872 |

**Key Finding**: The cuDF executor has architectural limitations with multi-hop edges + WHERE:
- Backward prune doesn't trace through intermediate edges
- `_is_single_hop()` gates WHERE filtering
- Non-adjacent alias WHERE not applied

These limitations are documented in issue #872 for a future fix.
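
To make the first limitation concrete, here is a minimal pure-Python sketch of a backward prune that does trace through intermediate edges. It is an illustration under stated assumptions, not the executor's code: `layered_backward_prune` and its arguments are hypothetical names, the hop count is fixed, and `valid_ends` stands in for "endpoints that already passed WHERE".

```python
from typing import List, Set, Tuple

Edge = Tuple[str, str]


def layered_backward_prune(
    edges: List[Edge], seeds: Set[str], valid_ends: Set[str], hops: int
) -> List[Set[Edge]]:
    """Per hop, return the edges lying on some seed -> valid_end path of exactly `hops` edges.

    Hypothetical helper for illustration only; not part of graphistry.
    """
    # Forward pass: forward[i] holds nodes reachable from a seed in exactly i hops.
    forward: List[Set[str]] = [set(seeds)]
    for _ in range(hops):
        forward.append({dst for src, dst in edges if src in forward[-1]})

    # Backward pass: start from surviving endpoints and walk back one layer at a
    # time, so every intermediate edge is checked rather than only the last one.
    keep = forward[hops] & valid_ends
    kept_edges: List[Set[Edge]] = [set() for _ in range(hops)]
    for i in range(hops - 1, -1, -1):
        kept_edges[i] = {(s, d) for s, d in edges if s in forward[i] and d in keep}
        keep = {s for s, _ in kept_edges[i]}
    return kept_edges


# The branch graph used in Test 1 of this plan: a->b->c->d plus a->x->y.
EDGES: List[Edge] = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "x"), ("x", "y")]
pruned = layered_backward_prune(EDGES, seeds={"a"}, valid_ends={"d"}, hops=3)
```

With `hops=3` the `x`/`y` branch drops out at every layer. A per-endpoint prune like this does not by itself enforce a per-path comparison between start and end when there are several seeds, so a path-level WHERE check would still be needed.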

---

### Phase 2: Rebase PR #852 onto master

```bash
git checkout feat/issue-838-alloy-fbf-where
git fetch origin
git rebase origin/master
# Resolve any conflicts
git push origin feat/issue-838-alloy-fbf-where --force-with-lease
```

---

### Phase 3: PR #852 Verification Updates (on branch `feat/issue-838-alloy-fbf-where`)

**Status: ✅ COMPLETE**

| # | Change | File | Status |
|---|--------|------|--------|
| 1 | Clarify hop ranges NOT formally verified | `alloy/README.md` | ✅ DONE |
| 2 | Note reliance on Python parity tests | `alloy/README.md` | ✅ DONE |
| 3 | State verified fragment precisely | `alloy/README.md` | ✅ DONE |

**P1 - Add scenario checks (optional, strengthens claims)** - Deferred to future work.

**Next steps:**
```bash
git checkout feat/issue-837-cudf-hop-executor
git stash pop # Apply the test changes
git add -A && git commit
git push origin feat/issue-837-cudf-hop-executor
# Wait for CI green, then merge PR #846 to master
```

---

## Test Implementation Details

### Test 1: WHERE after min_hops backtracking

```python
def test_where_respected_after_backtracking():
    """
    Graph: a -> b -> c -> d (3 hops)
           a -> x -> y      (2 hops, dead end for min_hops=3)

    WHERE: a.value < d.value

    Backtracking for min_hops=3 should:
    1. Prune x,y branch (doesn't reach 3 hops)
    2. Keep a,b,c,d path
    3. THEN apply WHERE to filter paths where a.value < d.value

    If WHERE not re-applied after backtracking, invalid paths may remain.
    """
```
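
The expected outcome of Test 1 can be worked out by brute force. The sketch below enumerates exact-length paths in plain Python and then applies the WHERE comparison. The helper names and the node values are assumptions chosen for illustration; the real test would go through the GFQL chain API instead.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]

# Graph from the Test 1 docstring.
EDGES: List[Edge] = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "x"), ("x", "y")]


def paths_of_length(edges: List[Edge], start: str, hops: int) -> List[List[str]]:
    """Enumerate every path of exactly `hops` edges starting at `start`."""
    paths = [[start]]
    for _ in range(hops):
        # Extend each partial path by one edge; dead ends simply vanish.
        paths = [p + [dst] for p in paths for src, dst in edges if src == p[-1]]
    return paths


def where_start_lt_end(paths: List[List[str]], values: Dict[str, int]) -> List[List[str]]:
    """Keep paths whose first node value is strictly less than the last node value."""
    return [p for p in paths if values[p[0]] < values[p[-1]]]


# Hypothetical node values: the first satisfies a.value < d.value, the second violates it.
passing_values = {"a": 1, "b": 5, "c": 5, "d": 9, "x": 5, "y": 9}
failing_values = {"a": 10, "b": 5, "c": 5, "d": 9, "x": 5, "y": 9}

three_hop_paths = paths_of_length(EDGES, "a", 3)
kept = where_start_lt_end(three_hop_paths, passing_values)
dropped = where_start_lt_end(three_hop_paths, failing_values)
```

Only `a -> b -> c -> d` survives the hop bound; it is kept under the first value assignment and removed under the second, which is the behavior the test is meant to pin down.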

### Test 2: Reverse direction + WHERE

```python
def test_reverse_direction_where_semantics():
    """
    Graph: a -> b -> c -> d (forward edges)

    Chain: [n(name='start'), e_reverse(min_hops=2), n(name='end')]
    WHERE: start.value > end.value

    Starting at 'd', reverse traversal reaches:
    - c at hop 1, b at hop 2, a at hop 3

    With min_hops=2, valid endpoints are b (hop 2) and a (hop 3).
    WHERE compares start (d) vs end (b or a).

    Verify WHERE semantics are consistent regardless of traversal direction.
    """
```
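
The hop numbers quoted in the Test 2 docstring can be checked with a small reverse breadth-first walk. This is a sketch under assumptions: `reverse_hops` is a hypothetical helper, it records only the first hop at which a node is reached (adequate for this linear graph, not for cycles), and the node values are invented.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]

# Linear graph from the Test 2 docstring.
EDGES: List[Edge] = [("a", "b"), ("b", "c"), ("c", "d")]


def reverse_hops(edges: List[Edge], start: str) -> Dict[str, int]:
    """Map each node reached by following edges backwards to its first hop number."""
    hops: Dict[str, int] = {}
    frontier = {start}
    visited = {start}
    depth = 0
    while frontier:
        depth += 1
        # Step against edge direction: from dst back to src.
        frontier = {src for src, dst in edges if dst in frontier} - visited
        for node in frontier:
            hops[node] = depth
        visited |= frontier
    return hops


hop_of = reverse_hops(EDGES, "d")
endpoints = {node for node, hop in hop_of.items() if hop >= 2}  # min_hops=2

# Hypothetical values; WHERE start.value > end.value with start = 'd'.
values = {"a": 1, "b": 2, "c": 3, "d": 4}
matching_ends = {node for node in endpoints if values["d"] > values[node]}
```

Here `start` is always the seed `d` and `end` is the node reached by walking backwards, so the comparison reads the same way as in a forward chain even though the edges are followed in reverse.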

### Test 3: Non-adjacent alias WHERE

```python
def test_non_adjacent_alias_where():
    """
    Chain: [n(name='a'), e_forward(), n(name='b'), e_forward(), n(name='c')]
    WHERE: a.id == c.id (aliases 2 edges apart)

    This WHERE clause should filter to paths where the first and last
    nodes have the same id (e.g., cycles back to start).

    Risk: cuDF backward prune only applies WHERE to adjacent aliases.
    """
```
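
A non-adjacent comparison has to be evaluated on whole path bindings, not on edge endpoints. The following sketch shows that on a tiny hypothetical graph; the function names and the edge list are illustrative assumptions, and node ids double as the `id` attribute.

```python
from typing import List, Tuple

Edge = Tuple[str, str]
Binding = Tuple[str, str, str]  # (a, b, c) alias bindings for a two-edge chain

# Hypothetical graph: n1 <-> n2 forms a 2-cycle, n2 -> n3 is a way out of it.
EDGES: List[Edge] = [("n1", "n2"), ("n2", "n1"), ("n2", "n3")]


def two_hop_bindings(edges: List[Edge]) -> List[Binding]:
    """Join the edge list with itself on the shared middle node b."""
    return [(a, b, c) for a, b in edges for b2, c in edges if b == b2]


def where_a_equals_c(bindings: List[Binding]) -> List[Binding]:
    """Apply WHERE a.id == c.id, which spans two edges."""
    return [t for t in bindings if t[0] == t[2]]


all_bindings = two_hop_bindings(EDGES)
cyclic_only = where_a_equals_c(all_bindings)
```

The binding `("n1", "n2", "n3")` is valid edge by edge, and only the comparison between the first and third alias rejects it. A prune that looks at adjacent aliases alone would let its edges through.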

### Test 4: Oracle vs cuDF parity (parametrized)

```python
@pytest.mark.parametrize("scenario", COMPOSITION_SCENARIOS)
def test_oracle_cudf_parity(scenario):
    """
    Run same query with Oracle and cuDF executor.
    Verify identical results.

    Scenarios cover all combinations of:
    - Directions: forward, reverse, undirected
    - Hop ranges: min_hops, max_hops, output slicing
    - WHERE operators: ==, !=, <, <=, >, >=
    - Topologies: linear, branch, cycle, disconnected
    """
```
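
`COMPOSITION_SCENARIOS` is not defined in this plan. One possible way to build it, assuming each scenario is a plain dict and the four dimensions are fully crossed, is sketched below; the dict keys and `scenario_id` are hypothetical.

```python
import itertools
from typing import Dict, List

# The four dimensions listed in the Test 4 docstring.
DIRECTIONS = ("forward", "reverse", "undirected")
HOP_RANGES = ("min_hops", "max_hops", "output_slicing")
WHERE_OPS = ("==", "!=", "<", "<=", ">", ">=")
TOPOLOGIES = ("linear", "branch", "cycle", "disconnected")

# Full cross product: 3 * 3 * 6 * 4 = 216 scenarios.
COMPOSITION_SCENARIOS: List[Dict[str, str]] = [
    {"direction": d, "hop_range": h, "where_op": op, "topology": t}
    for d, h, op, t in itertools.product(DIRECTIONS, HOP_RANGES, WHERE_OPS, TOPOLOGIES)
]


def scenario_id(scenario: Dict[str, str]) -> str:
    """Readable label for a scenario, e.g. 'forward-min_hops-==-linear'."""
    return "-".join(
        scenario[key] for key in ("direction", "hop_range", "where_op", "topology")
    )
```

A callable such as `scenario_id` can be passed as the `ids` argument of `pytest.mark.parametrize` so failures name the combination that broke. If 216 cases prove too slow, a hand-picked subset of the product would be a reasonable alternative.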

---

## README Update for PR #852

```markdown
## Scope and Limitations

### What IS Formally Verified

- WHERE clause lowering to per-alias value summaries
- Equality (==, !=) via bitset filtering
- Inequality (<, <=, >, >=) via min/max summaries
- Multi-step chains with cross-alias comparisons
- Graph topologies: fan-out, fan-in, cycles, parallel edges, disconnected

### What is NOT Formally Verified

- **Hop ranges** (`min_hops`, `max_hops`): Approximated by unrolling to fixed-length chains
- **Output slicing** (`output_min_hops`, `output_max_hops`): Treated as post-filter
- **Hop labeling** (`label_node_hops`, `label_edge_hops`, `label_seeds`): Not modeled
- **Null/NaN semantics**: Verified in Python tests

### Test Coverage for Unverified Features

Hop ranges and output slicing are covered by Python parity tests:
- `tests/gfql/ref/test_enumerator_parity.py`: 11+ hop range scenarios
- `tests/gfql/ref/test_cudf_executor_inputs.py`: 8+ WHERE + hop range scenarios

These tests verify the cuDF executor matches the reference oracle implementation.
```
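
The summary-based lowering listed in the README draft above can be illustrated with a few lines of plain Python. This is a sketch of the general idea only, assuming each alias is a mapping from node to value; the function names are hypothetical, and a Python `set` stands in for the bitset.

```python
from typing import Dict, Tuple

Values = Dict[str, int]  # node -> value for one alias


def prune_less_than(left: Values, right: Values) -> Tuple[Values, Values]:
    """Prune for WHERE left.value < right.value using only min/max summaries."""
    if not left or not right:
        return {}, {}
    right_max = max(right.values())
    left_min = min(left.values())
    # A left node can only match if it is below the largest right value,
    # and a right node only if it is above the smallest left value.
    kept_left = {n: v for n, v in left.items() if v < right_max}
    kept_right = {n: v for n, v in right.items() if v > left_min}
    return kept_left, kept_right


def prune_equal(left: Values, right: Values) -> Tuple[Values, Values]:
    """Prune for WHERE left.value == right.value using the set of shared values."""
    shared = set(left.values()) & set(right.values())
    kept_left = {n: v for n, v in left.items() if v in shared}
    kept_right = {n: v for n, v in right.items() if v in shared}
    return kept_left, kept_right


lt_left, lt_right = prune_less_than({"a1": 1, "a2": 10}, {"b1": 5, "b2": 0})
eq_left, eq_right = prune_equal({"a1": 1, "a2": 2}, {"b1": 2, "b2": 3})
```

Both prunes are conservative: they remove nodes that cannot satisfy the predicate with any partner, but a surviving pair is not guaranteed to lie on the same path, so a per-path check is still required afterwards.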

---

## Priority Summary

| Priority | Branch | Items | Blocks |
|----------|--------|-------|--------|
| **P0** | #846 | 4 tests | Merge of #846 |
| **P1** | #846 | 4 tests | - |
| **P0** | #852 | README scope update | Merge of #852 |
| **P1** | #852 | Alloy scenario checks | - |

---

## Success Criteria

### PR #846 Ready to Merge When:
- [ ] All 8 new tests pass
- [ ] Existing tests still pass
- [ ] CI green

### PR #852 Ready to Merge When:
- [ ] README accurately describes verified scope
- [ ] Alloy checks pass (existing + any new scenarios)
- [ ] CI green

---

## Resume Context

### Current State (as of session end)
- **Current branch**: `feat/issue-838-alloy-fbf-where` (PR #852)
- **Stash**: Test changes stashed on `feat/issue-837-cudf-hop-executor` (stash@{0})
- **Uncommitted**: `alloy/README.md` changes (scope/limitations section added)

### Git State Summary
```
feat/issue-838-alloy-fbf-where:
  - Modified: alloy/README.md (scope/limitations section)
  - Untracked: PLAN-846-852-feature-composition.md (this file)

feat/issue-837-cudf-hop-executor (stash@{0}):
  - 8 new tests in tests/gfql/ref/test_cudf_executor_inputs.py
  - TestP0FeatureComposition class (4 tests, 3 xfail + 1 passing)
  - TestP1FeatureComposition class (4 tests, 3 xfail + 1 passing)
```

### Key Files Modified
1. `tests/gfql/ref/test_cudf_executor_inputs.py` - Added 8 feature composition tests
2. `alloy/README.md` - Added scope/limitations section
3. `PLAN-846-852-feature-composition.md` - This tracking document

### Bug Details (Issue #872)
Root cause in `graphistry/compute/gfql/cudf_executor.py`:
- `_backward_prune()` lines 312-393: Assumes single-hop edges
- `_is_single_hop()` gates WHERE filtering
- Multi-hop edges break backward prune path tracing

### To Resume Work
```bash
# 1. Commit alloy README changes on current branch
git add alloy/README.md
git commit -m "docs(alloy): add scope and limitations section"
git push origin feat/issue-838-alloy-fbf-where

# 2. Switch to #846 branch and apply stashed tests
git checkout feat/issue-837-cudf-hop-executor
git stash pop

# 3. Commit and push test changes
git add tests/gfql/ref/test_cudf_executor_inputs.py
git commit -m "test(gfql): add 8 feature composition tests for hop ranges + WHERE

Adds P0/P1 tests for PR #846 same-path executor with hop ranges.
6 tests xfail documenting known bugs (see issue #872).
2 tests pass verifying output slicing and label_seeds work correctly."
git push origin feat/issue-837-cudf-hop-executor

# 4. Wait for CI, then merge PRs in order: #846 first, then rebase/merge #852
```

### Related Issues
- **#871**: Meta: GFQL Testing & Verification Roadmap (future work)
- **#872**: Fix multi-hop + WHERE backward prune bugs in cuDF executor
7 changes: 7 additions & 0 deletions alloy/Dockerfile
@@ -0,0 +1,7 @@
FROM eclipse-temurin:17-jre
WORKDIR /work

# Use published Alloy dist jar (6.2.0)
ADD https://github.com/AlloyTools/org.alloytools.alloy/releases/download/v6.2.0/org.alloytools.alloy.dist.jar /opt/alloy/alloy.jar

ENTRYPOINT ["java", "-jar", "/opt/alloy/alloy.jar"]