Add H3 and facet optimization CLI commands #19
Open
rdhyee wants to merge 5 commits into isamplesorg:main from
Hand-crafted small examples to help understand the iSamples PQG format:
- JSON: 1-sample and 3-sample examples (validated against schema)
- CSV: Flattened entity files (samples, events, locations, sites, agents, edges)
- Parquet: Same data in all 3 formats:
  - Export (3 rows, nested structs)
  - Narrow (21 rows, explicit edge rows)
  - Wide (10 rows, p__* columns)

Includes README with:
- Entity relationship diagram
- Example queries for each format
- Format comparison table

Idea from meeting with Stephen Richard - small examples make format differences much easier to understand.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
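To make the narrow-versus-wide distinction concrete, here is a minimal Python sketch that expands wide-style `p__*` relationship columns into explicit edge rows. The row layout is an assumption for illustration, not the actual pqg schema: each wide row has a `row_id`, an `otype`, and `p__<predicate>` columns holding lists of target row_ids, and edge rows use `otype="_edge_"` with `s`/`p`/`o` fields. The entity type names are likewise only illustrative.

```python
from typing import Any, Dict, List

PREFIX = "p__"         # wide-format relationship column prefix
EDGE_OTYPE = "_edge_"  # assumed otype marker for edge rows in the narrow layout


def wide_to_narrow(wide_rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Split wide rows into entity rows plus one edge row per non-empty p__* column."""
    entities: List[Dict[str, Any]] = []
    edges: List[Dict[str, Any]] = []
    for row in wide_rows:
        # Entity row: every column except the relationship columns.
        entities.append({k: v for k, v in row.items() if not k.startswith(PREFIX)})
        # Edge rows: subject = this row, predicate = column suffix, object = target ids.
        for key, targets in row.items():
            if key.startswith(PREFIX) and targets:
                edges.append({
                    "otype": EDGE_OTYPE,
                    "s": row["row_id"],
                    "p": key[len(PREFIX):],
                    "o": list(targets),
                })
    return entities + edges


# Hypothetical three-entity chain: sample -> event -> location.
wide = [
    {"row_id": 1, "otype": "MaterialSampleRecord", "p__produced_by": [2], "p__keywords": []},
    {"row_id": 2, "otype": "SamplingEvent", "p__sample_location": [3]},
    {"row_id": 3, "otype": "GeospatialCoordLocation"},
]
narrow = wide_to_narrow(wide)
print(len(wide), "wide rows ->", len(narrow), "narrow rows")  # 3 wide rows -> 5 narrow rows
```

Row counts in the narrow form grow with the number of relationships, which is why a narrow file holds more rows than the wide file for the same entities.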
…org#18)

Creates self-contained task specifications for:
- H3 geospatial optimization (experiments/h3_optimization/)
- Facet metadata optimization (experiments/facet_optimization/)

These are formatted as Claude Code Web-ready prompts with:
- Pinned data URLs
- Exact column names
- Step-by-step tasks
- Expected output formats

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Results from running all 7 facet optimization tasks:
- Baseline benchmarks: source (34ms), material (490ms), otype (35ms)
- Generated summary parquet files for source, material, context facets
- Combined facet summary with 60 rows
- Cross-facet summary (source × material) with 24 combinations
- Speedup achieved: 8.7x for source, 140.1x for material facets

https://claude.ai/code/session_016aGrEntdNnvpPjUqkpAtdC
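The reported speedups also imply roughly how fast the summary-table queries are. A quick worked check in Python, dividing each baseline from the commit message by its reported speedup (these are derived estimates, not separate measurements):

```python
# Baseline latencies and speedups as quoted in the commit message above.
baselines_ms = {"source": 34.0, "material": 490.0}
speedups = {"source": 8.7, "material": 140.1}


def implied_latency_ms(baseline_ms: float, speedup: float) -> float:
    """Optimized latency implied by a baseline latency and a speedup factor."""
    return baseline_ms / speedup


for facet, baseline in baselines_ms.items():
    optimized = implied_latency_ms(baseline, speedups[facet])
    print(f"{facet}: {baseline:.0f} ms -> ~{optimized:.1f} ms")
# source: 34 ms -> ~3.9 ms
# material: 490 ms -> ~3.5 ms
```

Both facets land near 4 ms, which suggests the summary lookup cost is roughly constant and the much larger material speedup mostly reflects how slow the baseline material query was.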
- 4.85x speedup for bbox queries (170ms → 35ms)
- 4.87x speedup for faceted geo queries
- Only 3.7% file size increase (282MB → 292MB)
- 5.98M samples with coords out of 6.68M total

Co-Authored-By: Claude <noreply@anthropic.com>
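A worked check of the H3 figures from the rounded values quoted above. Recomputing gives about 4.86x and 3.5% rather than the reported 4.85x and 3.7%; the gap is presumably because the reported figures were computed from unrounded timings and byte counts (an assumption, since the raw measurements are not shown here).

```python
def speedup(before_ms: float, after_ms: float) -> float:
    """How many times faster the new query is (before / after)."""
    return before_ms / after_ms


def pct_increase(before: float, after: float) -> float:
    """Relative growth in percent."""
    return (after - before) / before * 100


print(f"bbox speedup:   {speedup(170, 35):.2f}x")          # 4.86x
print(f"size increase:  {pct_increase(282, 292):.1f}%")    # 3.5%
print(f"coord coverage: {5.98 / 6.68 * 100:.1f}%")         # 89.5%
```

The coverage line shows that roughly 10% of samples have no coordinates, so they cannot be assigned an H3 cell.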
- add-h3: Add H3 index columns at specified resolutions
  - Supports local files and remote URLs
  - Configurable lat/lon columns and resolutions
  - Uses H3 community extension
- facet-summaries: Generate pre-computed facet summary tables
  - Combined summaries for source, material, context, object_type
  - Source × material cross-tabulation
  - Configurable otype filter and minimum cross-count

Implements CLI support for optimizations benchmarked in isamplesorg#17 and isamplesorg#18.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
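One way an `add-h3`-style command could use the H3 community extension is to emit a single DuckDB `COPY ... TO` statement. The Python sketch below only builds that SQL. It assumes DuckDB's `h3` community extension and its `h3_latlng_to_cell(lat, lng, resolution)` function; the `h3_res<N>` column names, the `latitude`/`longitude` defaults, and the helper names are hypothetical, not taken from the PR.

```python
from typing import List

# Hypothetical defaults; the PR only says lat/lon columns are configurable.
DEFAULT_LAT = "latitude"
DEFAULT_LON = "longitude"


def parse_resolutions(spec: str) -> List[int]:
    """Parse a comma-separated spec such as '4,6' into H3 resolutions (valid range 0-15)."""
    resolutions = [int(part) for part in spec.split(",") if part.strip()]
    for res in resolutions:
        if not 0 <= res <= 15:
            raise ValueError(f"H3 resolution must be between 0 and 15, got {res}")
    return resolutions


def sql_literal(text: str) -> str:
    """Quote a path as a SQL string literal, doubling embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"


def build_add_h3_sql(src: str, dst: str, resolutions: List[int],
                     lat: str = DEFAULT_LAT, lon: str = DEFAULT_LON) -> str:
    """Build DuckDB SQL that copies src to dst with one h3_res<N> column per resolution."""
    cell_cols = ",\n    ".join(
        # CASE without ELSE: rows lacking coordinates keep a NULL cell instead of being dropped.
        f"CASE WHEN {lat} IS NOT NULL AND {lon} IS NOT NULL "
        f"THEN h3_latlng_to_cell({lat}, {lon}, {res}) END AS h3_res{res}"
        for res in resolutions
    )
    return (
        "INSTALL h3 FROM community;\n"
        "LOAD h3;\n"
        f"COPY (\n  SELECT *,\n    {cell_cols}\n  FROM read_parquet({sql_literal(src)})\n)"
        f" TO {sql_literal(dst)} (FORMAT PARQUET);"
    )


print(build_add_h3_sql("wide.parquet", "wide_h3.parquet", parse_resolutions("4,6")))
```

The call at the end corresponds to the PR's example invocation `pqg add-h3 wide.parquet -o wide_h3.parquet -r 4,6`.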
Summary
Adds two new CLI commands implementing the optimizations benchmarked in #17 and #18:
pqg add-h3 - H3 Geospatial Indexing

    pqg add-h3 wide.parquet -o wide_h3.parquet
    pqg add-h3 wide.parquet -o wide_h3.parquet -r 4,6  # custom resolutions

pqg facet-summaries - Pre-computed Facet Tables

Generates:
- facet_summaries_all.parquet - Combined source/material/context/object_type counts
- facet_source_material_cross.parquet - Source × material cross-tabulation

- 140x speedup for material facet queries
- ~3KB total output size
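Both generated tables boil down to grouped counts. A minimal pure-Python sketch of the idea, assuming each record is a dict with `source`, `material`, `context` and `object_type` keys plus an `otype` key used by the filter; the real command works on parquet, and its column names, sort order, and output schema may differ. The record values below are made up.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple

FACETS = ("source", "material", "context", "object_type")
Record = Dict[str, str]


def _keep(rec: Record, otype: Optional[str]) -> bool:
    """Apply the optional otype filter (e.g. keep only sample rows)."""
    return otype is None or rec.get("otype") == otype


def facet_summaries(records: Iterable[Record],
                    otype: Optional[str] = None) -> List[Tuple[str, str, int]]:
    """Combined (facet, value, count) rows, grouped by facet with largest counts first."""
    counts: Counter = Counter()
    for rec in records:
        if not _keep(rec, otype):
            continue
        for facet in FACETS:
            value = rec.get(facet)
            if value is not None:
                counts[(facet, value)] += 1
    return sorted(((f, v, n) for (f, v), n in counts.items()),
                  key=lambda t: (t[0], -t[2], t[1]))


def source_material_cross(records: Iterable[Record], min_count: int = 1,
                          otype: Optional[str] = None) -> List[Tuple[str, str, int]]:
    """Source x material cross-tabulation, dropping pairs seen fewer than min_count times."""
    counts = Counter(
        (rec["source"], rec["material"])
        for rec in records
        if _keep(rec, otype) and rec.get("source") and rec.get("material")
    )
    pairs = [(s, m, n) for (s, m), n in counts.items() if n >= min_count]
    return sorted(pairs, key=lambda t: (-t[2], t[0], t[1]))


# Made-up records; the SamplingEvent row has no facet values and is skipped.
records = [
    {"otype": "MaterialSampleRecord", "source": "SESAR", "material": "rock",
     "context": "earth", "object_type": "core"},
    {"otype": "MaterialSampleRecord", "source": "SESAR", "material": "rock",
     "context": "earth", "object_type": "specimen"},
    {"otype": "MaterialSampleRecord", "source": "OPENCONTEXT", "material": "biogenic",
     "context": "site", "object_type": "artifact"},
    {"otype": "SamplingEvent"},
]
print(facet_summaries(records, otype="MaterialSampleRecord"))
print(source_material_cross(records, min_count=2))  # [('SESAR', 'rock', 2)]
```

Because the output is a small table of (facet, value, count) rows, a facet query becomes a lookup on a tiny table rather than a scan of millions of samples, consistent with the few-KB output size reported above.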
Benchmark Results
Test Plan
- pqg add-h3 --help shows usage
- pqg facet-summaries --help shows usage

Related Issues
Closes #17 (H3 geospatial optimization)
Closes #18 (Facet metadata optimization)
🤖 Generated with Claude Code