
Add H3 and facet optimization CLI commands #19

Open
rdhyee wants to merge 5 commits into isamplesorg:main from rdhyee:experiments/optimization-tasks

rdhyee (Contributor) commented Jan 30, 2026

Summary

Adds two new CLI commands implementing the optimizations benchmarked in #17 and #18:

pqg add-h3 - H3 Geospatial Indexing

pqg add-h3 wide.parquet -o wide_h3.parquet
pqg add-h3 wide.parquet -o wide_h3.parquet -r 4,6  # custom resolutions
  • Adds h3_res4, h3_res6, h3_res8 columns by default (resolutions configurable with -r)
  • ~5x speedup for bounding-box queries (170ms → 35ms)
  • Only 3.7% file size increase (282MB → 292MB)
  • Uses DuckDB H3 community extension
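As a rough illustration of what a command like this might do internally, the sketch below parses a `-r`-style resolution list and builds a DuckDB statement that appends one H3 column per resolution. It is not the PR's implementation: the helper names `parse_resolutions` and `build_add_h3_sql`, and the default column names `latitude` and `longitude`, are assumptions made for the example. It relies on the `h3_latlng_to_cell` function provided by the DuckDB H3 community extension.

```python
def parse_resolutions(spec: str) -> list[int]:
    """Parse a comma-separated list such as "4,6" into sorted, unique H3 resolutions."""
    resolutions = sorted({int(part) for part in spec.split(",") if part.strip()})
    if not resolutions:
        raise ValueError("at least one H3 resolution is required")
    for res in resolutions:
        # H3 defines resolutions 0 (coarsest) through 15 (finest).
        if not 0 <= res <= 15:
            raise ValueError(f"H3 resolution out of range: {res}")
    return resolutions


def build_add_h3_sql(input_path, output_path, resolutions=(4, 6, 8),
                     lat_col="latitude", lon_col="longitude"):
    """Build a DuckDB COPY statement that adds one h3_res<N> column per resolution.

    lat_col and lon_col are hypothetical defaults; rows without coordinates
    get NULL in the new columns.
    """
    h3_cols = ",\n".join(
        f"    CASE WHEN {lat_col} IS NOT NULL AND {lon_col} IS NOT NULL "
        f"THEN h3_latlng_to_cell({lat_col}, {lon_col}, {res}) END AS h3_res{res}"
        for res in resolutions
    )
    return (
        f"COPY (\n  SELECT *,\n{h3_cols}\n  FROM read_parquet('{input_path}')\n) "
        f"TO '{output_path}' (FORMAT PARQUET)"
    )


if __name__ == "__main__":
    print(build_add_h3_sql("wide.parquet", "wide_h3.parquet", parse_resolutions("4,6")))
```

To execute the generated statement, one would open a DuckDB connection, run `INSTALL h3 FROM community` and `LOAD h3`, and then pass the SQL string to `execute`. The sketch only builds the string so that it can be inspected without DuckDB or network access.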

pqg facet-summaries - Pre-computed Facet Tables

pqg facet-summaries wide.parquet -o summaries/

Generates:

  • facet_summaries_all.parquet - Combined source/material/context/object_type counts
  • facet_source_material_cross.parquet - Source × material cross-tabulation

Results:

  • 140x speedup for material facet queries (490ms → 3.5ms)
  • ~3KB total output size
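The idea behind the pre-computed tables can be sketched as follows: count each facet value once, store the small result, and answer facet queries from it instead of scanning the full file. This is an illustrative sketch only. It uses the standard-library `sqlite3` module as a stand-in for DuckDB over Parquet, and the table name `samples`, the helper names, and the demo rows are all made up for the example.

```python
import sqlite3

FACETS = ("source", "material", "context", "object_type")


def build_facet_summary(conn, table="samples", facets=FACETS):
    """Return (facet_type, facet_value, count) rows for every value of every facet."""
    parts = [
        f"SELECT '{f}' AS facet_type, {f} AS facet_value, COUNT(*) AS n "
        f"FROM {table} WHERE {f} IS NOT NULL GROUP BY {f}"
        for f in facets
    ]
    sql = " UNION ALL ".join(parts) + " ORDER BY facet_type, n DESC, facet_value"
    return conn.execute(sql).fetchall()


def build_cross_summary(conn, table="samples", min_count=1):
    """Return (source, material, count) rows, dropping pairs below min_count."""
    sql = (
        f"SELECT source, material, COUNT(*) AS n FROM {table} "
        "WHERE source IS NOT NULL AND material IS NOT NULL "
        "GROUP BY source, material HAVING COUNT(*) >= ? "
        "ORDER BY n DESC, source, material"
    )
    return conn.execute(sql, (min_count,)).fetchall()


def demo_connection():
    """Create an in-memory table with four made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE samples (source TEXT, material TEXT, context TEXT, object_type TEXT)"
    )
    conn.executemany(
        "INSERT INTO samples VALUES (?, ?, ?, ?)",
        [
            ("A", "rock", "marine", "specimen"),
            ("A", "rock", "terrestrial", "specimen"),
            ("A", "soil", "terrestrial", "specimen"),
            ("B", "tissue", "marine", "organism part"),
        ],
    )
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    for row in build_facet_summary(conn):
        print(row)
    print(build_cross_summary(conn, min_count=2))
```

Because the summary has one row per distinct facet value rather than one per sample, it stays tiny regardless of input size, which is consistent with the ~3KB output reported above.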

Benchmark Results

| Optimization | Query | Baseline | Optimized | Speedup |
|---|---|---|---|---|
| H3 | Bounding box | 170ms | 35ms | 4.85x |
| Facets | Material counts | 490ms | 3.5ms | 140x |
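For readers who want to reproduce this kind of comparison, a minimal timing harness might look like the sketch below. The helper names `median_ms` and `speedup` are hypothetical, and the PR does not say how its timings were actually collected; the harness simply takes the median wall-clock time over a few repeats and divides baseline by optimized.

```python
import statistics
import time


def median_ms(run_query, repeats=5):
    """Run a zero-argument callable several times and return the median time in ms."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)


def speedup(baseline_ms, optimized_ms):
    """Speedup factor: how many times faster the optimized query is."""
    return baseline_ms / optimized_ms


if __name__ == "__main__":
    # Rounded timings taken from the benchmark table above.
    print(f"H3 bounding box: {speedup(170, 35):.2f}x")
    print(f"Facet material counts: {speedup(490, 3.5):.1f}x")
```

With the rounded timings from the table, 170 / 35 is about 4.86 and 490 / 3.5 is exactly 140; the reported 4.85x was presumably computed from unrounded measurements.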

Test Plan

  • pqg add-h3 --help shows usage
  • pqg facet-summaries --help shows usage
  • Both commands work with local files
  • Both commands work with remote URLs
  • Integration tests (TODO)

Related Issues

Closes #17 (H3 geospatial optimization)
Closes #18 (Facet metadata optimization)


🤖 Generated with Claude Code

rdhyee and others added 5 commits January 14, 2026 12:55
Hand-crafted small examples to help understand the iSamples PQG format:

- JSON: 1-sample and 3-sample examples (validated against schema)
- CSV: Flattened entity files (samples, events, locations, sites, agents, edges)
- Parquet: Same data in all 3 formats:
  - Export (3 rows, nested structs)
  - Narrow (21 rows, explicit edge rows)
  - Wide (10 rows, p__* columns)

Includes README with:
- Entity relationship diagram
- Example queries for each format
- Format comparison table

Idea from meeting with Stephen Richard - small examples make format
differences much easier to understand.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…org#18)

Creates self-contained task specifications for:
- H3 geospatial optimization (experiments/h3_optimization/)
- Facet metadata optimization (experiments/facet_optimization/)

These are formatted as Claude Code Web-ready prompts with:
- Pinned data URLs
- Exact column names
- Step-by-step tasks
- Expected output formats

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Results from running all 7 facet optimization tasks:
- Baseline benchmarks: source (34ms), material (490ms), otype (35ms)
- Generated summary parquet files for source, material, context facets
- Combined facet summary with 60 rows
- Cross-facet summary (source × material) with 24 combinations
- Speedup achieved: 8.7x for source, 140.1x for material facets

https://claude.ai/code/session_016aGrEntdNnvpPjUqkpAtdC

- 4.85x speedup for bbox queries (170ms → 35ms)
- 4.87x speedup for faceted geo queries
- Only 3.7% file size increase (282MB → 292MB)
- 5.98M samples with coords out of 6.68M total

Co-Authored-By: Claude <noreply@anthropic.com>

- add-h3: Add H3 index columns at specified resolutions
  - Supports local files and remote URLs
  - Configurable lat/lon columns and resolutions
  - Uses H3 community extension

- facet-summaries: Generate pre-computed facet summary tables
  - Combined summaries for source, material, context, object_type
  - Source × material cross-tabulation
  - Configurable otype filter and minimum cross-count

Implements CLI support for optimizations benchmarked in isamplesorg#17 and isamplesorg#18.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Development

Successfully merging this pull request may close these issues.

Explore faceted metadata query optimizations via pre-computation
Explore geospatial query optimizations via H3 pre-computation

2 participants