graphistry · lmeyerov · Jan 9, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -8,7 +8,38 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm
 ## [Development]
 <!-- Do Not Erase This Section - Used for tracking unreleased changes -->
 
+### Added
+- **GFQL / WHERE** (experimental): Added `Chain.where` field for same-path WHERE clause constraints. New modules: `same_path_types.py`, `same_path_plan.py`, `df_executor.py` implementing Yannakakis-style semijoin reduction for efficient WHERE filtering. Supports equality, inequality, and comparison operators on named alias columns.
+- **GFQL / cuDF same-path**: Added execution-mode gate `GRAPHISTRY_CUDF_SAME_PATH_MODE` (auto/oracle/strict) for GFQL cuDF same-path executor. Auto falls back to oracle when GPU unavailable; strict requires cuDF or raises.
+- **Compute / hop**: Added `GRAPHISTRY_HOP_FAST_PATH` (set to `0`/`false`/`off`) to disable fast-path traversal for benchmarking or compatibility checks.
+- **GFQL / WHERE**: Added opt-in `GRAPHISTRY_NON_ADJ_WHERE_MULTI_EQ_SEMIJOIN` for multi-equality semijoin pruning (2-hop, experimental).
+- **GFQL / WHERE**: Added opt-in `GRAPHISTRY_NON_ADJ_WHERE_INEQ_AGG` for aggregated inequality pruning on 2-hop non-adj clauses (experimental).
+
+### Performance
+- **Compute / hop**: Refactored hop traversal to precompute node predicate domains and unify direction handling; synthetic CPU benchmarks show modest median improvements with some regressions on undirected/range scenarios.
+- **GFQL / WHERE**: Use DF-native forward pruning for cuDF equality constraints to avoid host syncs (pandas path unchanged).
+- **GFQL / WHERE**: Default non-adjacent WHERE mode is now `auto`: value mode plus automatic domain semijoin for non-adjacent clauses, and automatic edge semijoin for edge clauses (opt out via environment variables).
+- **GFQL / WHERE**: Auto mode skips value-mode on multi-clause non-adjacent WHERE when pair estimates exceed the semijoin threshold (guardrail against blowups).
+- **GFQL / WHERE**: Avoid building semijoin pair tables when AUTO semijoin stays inactive; uses cheap pair estimates to gate work.
+- **GFQL / WHERE**: Reduce semijoin dedup overhead and reuse cached edge pairs per edge when `allowed_edges` is unset.
+- **Compute / hop**: Undirected traversal skips oriented-pair expansion when there are no destination filters; modest CPU gains in undirected benchmarks.
+- **Compute / hop**: Fast-path traversal uses domain-based visited/frontier tracking to avoid per-hop concat+dedupe overhead; modest CPU improvements in synthetic benchmarks.
+
+### Fixed
+- **GFQL / chain**: Fixed `from_json` to validate `where` field type before casting, preventing type errors on malformed input.
+- **GFQL / WHERE**: Fixed undirected edge handling in WHERE clause filtering to check both src→dst and dst→src directions.
+- **GFQL / WHERE**: Fixed multi-hop path edge retention to keep all edges in valid paths, not just terminal edges.
+- **GFQL / WHERE**: Fixed unfiltered start node handling with multi-hop edges in native path executor.
+- **GFQL / WHERE**: Fixed vector-strategy guard to initialize start/end domains before pair-estimate gating (prevents `UnboundLocalError`).
+
+### Infra
+- **GFQL / same_path**: Modular architecture for WHERE execution: `same_path_types.py` (types), `same_path_plan.py` (planning), `df_executor.py` (execution), plus `same_path/` submodules for BFS, edge semantics, multihop, post-pruning, and WHERE filtering.
+- **Benchmarks**: Added manual hop microbench + frontier sweep scripts under `benchmarks/` (not wired into CI).
+- **GFQL / WHERE**: Added OTel detail counters for semijoin pair sizes and mid-intersection sizes to help diagnose dense multi-clause blowups.
+
 ### Tests
+- **GFQL / df_executor**: Added comprehensive test suite (core, amplify, patterns, dimension) with 200+ tests covering Yannakakis semijoin, WHERE clause filtering, multi-hop paths, and pandas/cuDF parity.
+- **GFQL / cuDF same-path**: Added strict/auto mode coverage for cuDF executor fallback behavior.
 - **Temporal**: Added datetime unit parity coverage (ms/us/ns) for ring layouts, GFQL time ring layouts, and temporal comparison predicates; relaxed honeypot hypergraph datetime unit expectations.
 
 ## [0.50.5 - 2026-01-25]
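
The `GFQL / WHERE` entry above mentions Yannakakis-style semijoin reduction. As a rough illustration only (this is not the code in `df_executor.py`; the function name and the tuple-based edge representation are assumptions made for this sketch), a two-hop path `a -> b -> c` can be pruned with one forward and one backward semijoin pass before any full path is materialized:

```python
from typing import List, Set, Tuple

Edge = Tuple[str, str]  # (src, dst); hypothetical representation for this sketch


def semijoin_reduce(hop1: List[Edge], hop2: List[Edge]) -> Tuple[List[Edge], List[Edge]]:
    """Keep only edges that can take part in some a -> b -> c path."""
    # Forward pass: a hop2 edge must start at a node reached by some hop1 edge.
    reached: Set[str] = {dst for _, dst in hop1}
    hop2_kept = [(b, c) for b, c in hop2 if b in reached]
    # Backward pass: a hop1 edge must end at a node that still has a hop2 edge.
    continuing: Set[str] = {b for b, _ in hop2_kept}
    hop1_kept = [(a, b) for a, b in hop1 if b in continuing]
    return hop1_kept, hop2_kept


hop1 = [("a1", "b1"), ("a2", "b2"), ("a3", "b3")]
hop2 = [("b1", "c1"), ("b2", "c2"), ("b9", "c9")]
reduced1, reduced2 = semijoin_reduce(hop1, hop2)
print(reduced1)  # edges ending at b3 are dropped: no hop2 edge continues from b3
print(reduced2)  # edges starting at b9 are dropped: no hop1 edge reaches b9
```

Both passes touch each edge list once, so dangling edges are removed without enumerating paths. A same-path WHERE constraint would presumably add further pruning on top of this; that part is not sketched here.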

diff --git a/ai/README.md b/ai/README.md
@@ -184,19 +184,38 @@ WITH_BUILD=0 WITH_TEST=0 ./test-cpu-local.sh
 
 ### GPU Testing - Fast (Reuse Base Image)
 
-Docker containers include: **pytest, mypy, ruff** (preinstalled)
+Docker containers include: **pytest, mypy, ruff, cudf** (preinstalled)
 
 ```bash
-# Reuse existing graphistry image (no rebuild)
-IMAGE="graphistry/graphistry-nvidia:${APP_BUILD_TAG:-latest}-${CUDA_SHORT_VERSION:-12.8}"
-
+# Container with cuDF available (cudf 25.10)
+IMAGE="graphistry/graphistry-nvidia:v2.50.0-13.0"
+
+# Run compute + GFQL tests with cuDF fallback (491 tests)
+# Uses CUDA_VISIBLE_DEVICES="" to avoid GPU driver issues
+docker run --rm -v "$(pwd):/app" -w /app \
+  -e CUDA_VISIBLE_DEVICES="" \
+  $IMAGE \
+  python -m pytest graphistry/tests/test_compute*.py tests/gfql/ref/ -q \
+    --ignore=tests/gfql/ref/test_ref_enumerator.py \
+    -k "not cudf_gpu_path"
+
+# Run GFQL ref tests only (372 tests)
+docker run --rm -v "$(pwd):/app" -w /app \
+  -e CUDA_VISIBLE_DEVICES="" \
+  $IMAGE \
+  python -m pytest tests/gfql/ref/ -q \
+    --ignore=tests/gfql/ref/test_ref_enumerator.py
+
+# With full GPU access (requires nvidia-container-toolkit)
 docker run --rm --gpus all \
-    -v "$(pwd):/workspace:ro" \
-    -w /workspace -e PYTHONPATH=/workspace \
-    $IMAGE pytest graphistry/tests/test_file.py -v
+    -v "$(pwd):/app" -w /app \
+    $IMAGE python -m pytest graphistry/tests/compute/ -q
 ```
 
-**Fast iteration**: Use this during development
+**Note**: Tests in `graphistry/tests/compute/predicates/` require real GPU access.
+Use `CUDA_VISIBLE_DEVICES=""` for cuDF import-path testing without GPU.
+
+**Fast iteration**: Use cuDF container during development
 **Full rebuild**: Use `./docker/test-gpu-local.sh` before merge
 
 ### Environment Control
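
The commands above run the suite with `CUDA_VISIBLE_DEVICES=""` to exercise the cuDF fallback path. The changelog in this PR describes the gate `GRAPHISTRY_CUDF_SAME_PATH_MODE` (auto/oracle/strict) and the `GRAPHISTRY_HOP_FAST_PATH` switch. A minimal sketch of how such gates could be resolved follows; the function names, the `"cudf"`/`"oracle"` return labels, the `auto` default, and the single `cudf_available` flag (which stands in for both "cuDF importable" and "GPU present") are assumptions, not the library's actual implementation:

```python
from typing import Mapping

VALID_MODES = ("auto", "oracle", "strict")
OFF_VALUES = ("0", "false", "off")


def resolve_same_path_mode(env: Mapping[str, str], cudf_available: bool) -> str:
    """Pick an executor label from the env gate (hypothetical helper)."""
    # Assumption: an unset variable behaves like "auto".
    mode = env.get("GRAPHISTRY_CUDF_SAME_PATH_MODE", "auto").strip().lower()
    if mode not in VALID_MODES:
        raise ValueError(f"unknown same-path mode: {mode!r}")
    if mode == "oracle":
        return "oracle"
    if cudf_available:
        return "cudf"
    if mode == "strict":
        # strict: cuDF is required, so fail instead of falling back
        raise RuntimeError("strict mode requires cuDF")
    return "oracle"  # auto: fall back when cuDF/GPU is unavailable


def hop_fast_path_enabled(env: Mapping[str, str]) -> bool:
    """Fast path stays on unless the variable is 0/false/off (hypothetical helper)."""
    return env.get("GRAPHISTRY_HOP_FAST_PATH", "1").strip().lower() not in OFF_VALUES


print(resolve_same_path_mode({}, cudf_available=False))
print(hop_fast_path_enabled({"GRAPHISTRY_HOP_FAST_PATH": "off"}))
```

Passing the environment as a mapping keeps the helpers easy to test; real code would typically pass `os.environ`.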

diff --git a/benchmarks/README.md b/benchmarks/README.md
@@ -0,0 +1,234 @@
+# Benchmarks
+
+Manual-only scripts for local performance checks. Not wired into CI.
+
+Summary results go into `benchmarks/RESULTS.md` (raw outputs stay in `plans/`).
+
+## Hop microbench
+
+Run a small set of hop() scenarios across synthetic graphs.
+
+```bash
+uv run python benchmarks/run_hop_microbench.py --runs 5 --output /tmp/hop-microbench.md
+```
+
+## Frontier sweep
+
+Sweep seed sizes on a fixed linear graph.
+
+```bash
+uv run python benchmarks/run_hop_frontier_sweep.py --runs 5 --nodes 100000 --edges 200000 --output /tmp/hop-frontier.md
+```
+
+Notes:
+- Use `--engine cudf` for GPU runs when cuDF is available.
+- Scripts print a table to stdout; `--output` writes Markdown results.
+
+## Chain vs Yannakakis
+
+Compare regular `chain()` against the Yannakakis same-path executor on synthetic graphs.
+
+```bash
+uv run python benchmarks/run_chain_vs_samepath.py --runs 7 --warmup 1 --output /tmp/chain-vs-samepath.md
+```
+
+By default, WHERE uses auto mode: value mode plus automatic domain semijoin for non-adjacent clauses, and automatic edge semijoin for edge clauses.
+To compare against baseline behavior, set `--non-adj-mode baseline`.
+Use `--max-scenario-seconds 20` to fail fast on synthetic timeouts (best-effort).
+
+To focus on dense multi-clause scenarios:
+
+```bash
+uv run python benchmarks/run_chain_vs_samepath.py \
+  --graph-filter medium_dense,large_dense \
+  --scenario-filter nonadj_multi \
+  --runs 5 --warmup 1
+```
+
+Use `--seed` to make synthetic graph generation repeatable across runs.
+
+To toggle non-adjacent WHERE experiments on synthetic scenarios:
+
+```bash
+uv run python benchmarks/run_chain_vs_samepath.py \
+  --non-adj-mode value_prefilter \
+  --non-adj-value-card-max 500 \
+  --non-adj-order selectivity \
+  --non-adj-bounds \
+  --runs 7 --warmup 1
+```
+
+## Real-data GFQL
+
+Run GFQL chain scenarios on demo datasets plus WHERE scenarios (df_executor), with separate sections and a per-section score.
+
+```bash
+uv run python benchmarks/run_realdata_benchmarks.py --runs 7 --warmup 1 --output /tmp/realdata-gfql.md
+```
+
+To force baseline WHERE behavior for comparisons:
+
+```bash
+uv run python benchmarks/run_realdata_benchmarks.py \
+  --non-adj-mode baseline \
+  --runs 7 --warmup 1 --output /tmp/realdata-baseline.md
+```
+
+To test categorical domains for redteam:
+
+```bash
+uv run python benchmarks/run_realdata_benchmarks.py --datasets redteam50k --redteam-domain-categorical --runs 9 --warmup 2
+```
+
+To experiment with non-adjacent WHERE modes:
+
+```bash
+uv run python benchmarks/run_realdata_benchmarks.py \
+  --datasets redteam50k \
+  --non-adj-mode value_prefilter \
+  --non-adj-value-card-max 500 \
+  --non-adj-order selectivity \
+  --non-adj-bounds \
+  --runs 7 --warmup 1
+```
+
+Auto mode (value mode for low-NDV columns, domain semijoin for the rest):
+
+```bash
+GRAPHISTRY_NON_ADJ_WHERE_DOMAIN_SEMIJOIN_AUTO=1 \
+uv run python benchmarks/run_realdata_benchmarks.py \
+  --datasets redteam50k,transactions \
+  --non-adj-mode auto \
+  --non-adj-value-ops "==,!=" \
+  --non-adj-value-card-max 10 \
+  --runs 3 --warmup 1 --opt-max-call-ms 0
+```
+
+To experiment with aggregated inequality pruning for 2-hop non-adj clauses:
+
+```bash
+GRAPHISTRY_NON_ADJ_WHERE_INEQ_AGG=1 \
+uv run python benchmarks/run_realdata_benchmarks.py --datasets redteam50k --runs 3 --warmup 1
+```
+
+When `--non-adj-value-ops` and `--non-adj-value-card-max` are not provided, auto mode defaults to the `==,!=` operators with a value-cardinality cap of 300.
+
+To add NDV probe columns (high/low cardinality) and extra WHERE scenarios:
+
+```bash
+uv run python benchmarks/run_realdata_benchmarks.py \
+  --datasets redteam50k,transactions \
+  --ndv-probes --ndv-probe-buckets 3 --ndv-log \
+  --runs 3 --warmup 1
+```
+
+To enable OpenTelemetry spans for df_executor:
+
+```bash
+GRAPHISTRY_OTEL=1 \
+GRAPHISTRY_OTEL_DETAIL=1 \
+uv run --with opentelemetry-api --with opentelemetry-sdk \
+  python benchmarks/run_realdata_benchmarks.py --datasets redteam50k --runs 3 --warmup 1
+```
+
+To export spans to OTLP (optional):
+
+```bash
+GRAPHISTRY_OTEL=1 \
+GRAPHISTRY_OTEL_EXPORTER=otlp \
+OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 \
+uv run --with opentelemetry-api --with opentelemetry-sdk --with opentelemetry-exporter-otlp \
+  python benchmarks/run_realdata_benchmarks.py --datasets redteam50k --runs 3 --warmup 1
+```
+
+To limit datasets:
+
+```bash
+uv run python benchmarks/run_realdata_benchmarks.py --datasets redteam50k,transactions --runs 7 --warmup 1
+```
+
+To focus on a subset of scenarios:
+
+```bash
+uv run python benchmarks/run_realdata_benchmarks.py \
+  --datasets transactions,redteam50k \
+  --skip-chain --where-filter ndv_ \
+  --ndv-probes --ndv-probe-buckets 3 --ndv-log \
+  --runs 3 --warmup 1 --max-scenario-seconds 5 --opt-max-call-ms 0
+```
+
+Available datasets: `redteam50k`, `transactions`, `facebook_combined`, `honeypot`, `twitter_demo`, `lesmiserables`, `twitter_congress`, `all`.
+
+## Optional Kuzu comparisons
+
+If the `kuzu` Python package is installed, you can run optional Kuzu comparisons (currently redteam-only):
+
+```bash
+uv run python benchmarks/run_realdata_benchmarks.py \
+  --datasets redteam50k \
+  --kuzu --kuzu-db-root /tmp/kuzu_bench \
+  --runs 3 --warmup 1
+```
+
+Use `--kuzu-rebuild` to recreate the Kuzu database from CSVs when needed.
+
+## Graph-benchmark q1-q9
+
+Replay the q1-q9 queries from https://github.com/prrao87/graph-benchmark against Graphistry.
+See `benchmarks/graph_benchmark.md` for setup details.
+
+```bash
+uv run python benchmarks/graph_benchmark_q1_q9.py \
+  --graph-benchmark-root /home/lmeyerov/Work/graph-benchmark \
+  --runs 5 --warmup 1 \
+  --output-json /tmp/graph-benchmark-q1-q9.json
+```
+
+Preindexed variant (relation/type split per query):
+
+```bash
+uv run python benchmarks/graph_benchmark_q1_q9.py \
+  --graph-benchmark-root /home/lmeyerov/Work/graph-benchmark \
+  --mode preindexed \
+  --runs 5 --warmup 1 \
+  --output-json /tmp/graph-benchmark-q1-q9-preindexed.json
+```
+
+Include preindex build time in per-query medians (adds `preindex_ms` and `median_ms_with_preindex`):
+
+```bash
+uv run python benchmarks/graph_benchmark_q1_q9.py \
+  --graph-benchmark-root /home/lmeyerov/Work/graph-benchmark \
+  --mode preindexed \
+  --include-preindex \
+  --runs 5 --warmup 1 \
+  --output-json /tmp/graph-benchmark-q1-q9-preindexed-with-preindex.json
+```
+
+Presorted variant (global sort by rel/src/dst and node_type/node_id):
+
+```bash
+uv run python benchmarks/graph_benchmark_q1_q9.py \
+  --graph-benchmark-root /home/lmeyerov/Work/graph-benchmark \
+  --mode presorted \
+  --runs 5 --warmup 1 \
+  --output-json /tmp/graph-benchmark-q1-q9-presorted.json
+```
+
+## WHERE opt matrix (comparative)
+
+Run a focused matrix of WHERE scenarios across opt profiles (value mode, domain semijoin, auto, edge semijoin, etc.).
+Outputs are grouped by profile + scenario group, with defaults targeting dense multi-clause and real-data stress cases.
+
+```bash
+uv run python benchmarks/run_where_opt_matrix.py --runs 3 --warmup 1
+```
+
+To target only dense multi-clause synthetic cases:
+
+```bash
+uv run python benchmarks/run_where_opt_matrix.py \
+  --groups synthetic_multi_clause \
+  --profiles baseline,auto,vector \
+  --runs 5 --warmup 1
+```
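
Most of the benchmark commands in this README take `--runs` and `--warmup`. The diff does not show how the scripts aggregate timings, so the following is only a sketch of the common pattern those flags suggest: untimed warmup calls followed by a median over the timed runs. The `median_ms` helper and its injectable `timer` parameter are hypothetical and are not taken from the benchmark scripts:

```python
import statistics
import time
from typing import Callable, List


def median_ms(
    fn: Callable[[], object],
    runs: int = 7,
    warmup: int = 1,
    timer: Callable[[], float] = time.perf_counter,
) -> float:
    """Return the median wall time of `fn` in milliseconds (hypothetical helper)."""
    for _ in range(warmup):
        fn()  # warmup calls are executed but not timed
    samples: List[float] = []
    for _ in range(runs):
        start = timer()
        fn()
        samples.append((timer() - start) * 1000.0)
    return statistics.median(samples)


calls: List[int] = []
result = median_ms(lambda: calls.append(1), runs=7, warmup=1)
print(f"{len(calls)} calls, median {result:.3f} ms")
```

The median is less sensitive than the mean to a single slow run, which is one reason to prefer an odd `--runs` value such as 5 or 7.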