Replies: 8 comments
-
@edespino, thank you very much for your efforts. This is truly outstanding work. However, I would like to offer a suggestion, building on one of your points about AO tables above. The tests above focus on query scenarios; I suggest adding test scenarios for data loading, update, and deletion operations. Based on feedback from some customers, they face a dilemma when choosing between AO tables and AOCS tables. For scenarios requiring high-speed data ingestion (not one-time bulk loading, but continuous batch loading, such as loading 100,000 records once per second), as in telecommunications or IoT: if they choose AOCS tables, the ingestion speed cannot keep up with the data generation rate, but subsequent query performance requirements can be met; if they choose AO tables, the ingestion speed is satisfied, but subsequent query performance is relatively poor. According to previous community discussions, in addition to the PAX advantages you mentioned, another very important design goal of PAX tables is to achieve the ingestion speed of AO tables together with the query speed of AOCS tables.
-
@edespino, thank you for your feedback. We will carefully review each of the issues identified in the test results and respond as soon as possible.
-
The bloom filter and min/max info are stored both in the auxiliary table, per micro-partition file, and in the micro-partition file itself, per group, so the total bloom filter storage scales with both. The default estimated size of bloom filter storage for a column is 8KB.
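Taking the figures in this comment at face value, the total bloom filter footprint can be sketched with simple arithmetic. The file and group counts below are illustrative assumptions, and the helper is not PAX code:

```python
# Back-of-envelope estimate of total bloom filter storage, assuming (per the
# comment above) one copy per micro-partition file in the auxiliary table plus
# one copy per group inside each file, at the stated ~8 KB default per column.

BLOOM_BYTES_PER_COLUMN = 8 * 1024  # default estimated size per column (8 KB)

def bloom_storage_bytes(n_files: int, groups_per_file: int, n_bloom_columns: int) -> int:
    """Total bloom filter bytes across both storage locations."""
    per_file_copies = n_files                      # auxiliary-table entries
    per_group_copies = n_files * groups_per_file   # in-file group entries
    return (per_file_copies + per_group_copies) * n_bloom_columns * BLOOM_BYTES_PER_COLUMN

# Example: 1,000 micro-partition files, 10 groups each, 3 bloom-filtered columns.
total = bloom_storage_bytes(1_000, 10, 3)
print(f"{total / (1024 * 1024):.1f} MiB")  # → 257.8 MiB
```

Even modest file and group counts multiply quickly, which is consistent with the clustering-overhead numbers reported further down in the thread.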
-
Issue 1
Yes, bloom filter storage grows linearly with the number of columns that use bloom filters.
No, bloom filters are stored the same way for all types; there is no difference between them.
No; once the memory size is determined, the number of bits does not change. They are not resized dynamically.
No, it depends only on how many bloom filter meta structures are stored.
No.
No, the input to a bloom filter is treated as a byte stream.
Not yet.
No hard limit for now. We'll add this to our action items.
Not planned right now.
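The fixed-bit-size behaviour described above has a direct consequence: once a filter absorbs many more distinct values than it was sized for, its false positive rate climbs toward 1 and it stops pruning anything. A sketch using the classic bloom filter approximation (the 8 KB filter size and hash count are illustrative assumptions, not PAX parameters):

```python
import math

def false_positive_rate(m_bits: int, k_hashes: int, n_items: int) -> float:
    """Classic approximation: p = (1 - e^(-k*n/m))^k."""
    return (1.0 - math.exp(-k_hashes * n_items / m_bits)) ** k_hashes

M = 8 * 1024 * 8   # an 8 KB filter = 65,536 bits
K = 7              # a typical hash count for a ~1% target FPR

# FPR stays tiny while n is small, then saturates as n outgrows the filter.
for n in (1_000, 10_000, 100_000):
    print(f"n={n:>7}: fpr={false_positive_rate(M, K, n):.4f}")
```

This is one plausible mechanism for the report's observation that some bloom filter configurations add cost without adding selectivity.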
-
Issue 2
Updating the bits of a bloom filter is synchronous when inserting tuples.
No. The storage size of a bloom filter grows linearly with the number of
No, we can't know the cardinality of a column in advance.
Good suggestion. We can see the filtered files/groups from the log if
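The synchronous insert-path update mentioned above can be sketched with a toy bloom filter that treats its input as a byte stream, matching the earlier answer. This is illustrative Python, not PAX's C++ implementation, and the sizes are arbitrary:

```python
import hashlib

class TinyBloom:
    """Toy bloom filter: bits are set synchronously as each tuple is inserted."""

    def __init__(self, m_bits: int = 1024, k: int = 3):
        self.m, self.k, self.bits = m_bits, k, bytearray(m_bits // 8)

    def _positions(self, value: bytes):
        # Derive k bit positions from the raw byte stream (salted hashes).
        for i in range(self.k):
            h = hashlib.blake2b(value, digest_size=8, salt=bytes([i])).digest()
            yield int.from_bytes(h, "big") % self.m

    def insert(self, value: bytes) -> None:   # called per tuple on INSERT
        for p in self._positions(value):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, value: bytes) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(value))

bf = TinyBloom()
bf.insert(b"session-42")
print(bf.might_contain(b"session-42"))  # True (no false negatives)
print(bf.might_contain(b"session-99"))  # almost certainly False
```

Because every insert touches the filter, insert throughput and bloom filter count are directly coupled, which is relevant to the ingestion-speed discussion at the top of this thread.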
-
Issue 3
I think there may be some mistake: the file-based filter is not disabled. Could you paste the queries that reproduce the issue?
-
Issue 4
No, the hash function is the same for all column types for now.
The GUC pax.bloom_filter_work_memory_bytes may control the size of
For string types, especially low-cardinality ones, the storage overhead of bloom
-
Issue 5
The bloom filter is calculated not only per micro-partition file, but also per group within a micro-partition file. Bloom filters should be used judiciously: only for high-cardinality columns that appear in query filter conditions.
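The two-level (file, then group) pruning described here can be sketched as follows. The data structures are hypothetical stand-ins for PAX metadata, with plain sets standing in for bloom filter membership tests:

```python
# Two-level pruning sketch: consult a per-file filter first, and only for
# surviving files consult the per-group filters inside that file.

def prune(files, predicate_value):
    """files: list of dicts like
       {"name": ..., "bloom": set_of_values, "groups": [set_of_values, ...]}.
    Returns (file, surviving group indexes) pairs worth reading."""
    survivors = []
    for f in files:
        if predicate_value not in f["bloom"]:      # file-level filter
            continue                               # skip the whole file
        groups = [i for i, g in enumerate(f["groups"])
                  if predicate_value in g]         # group-level filter
        if groups:
            survivors.append((f["name"], groups))
    return survivors

files = [
    {"name": "f1", "bloom": {"a", "b"}, "groups": [{"a"}, {"b"}]},
    {"name": "f2", "bloom": {"c"},      "groups": [{"c"}]},
]
print(prune(files, "b"))  # [('f1', [1])]
```

The payoff only materializes when the filtered column is high-cardinality and actually appears in query predicates, as the comment advises.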
-
I've been working to understand PAX storage functionality by building a comprehensive benchmark suite. After testing 5 different workloads, I've identified several areas where I'd appreciate the development team's feedback.
PAX Storage Development Team - Technical Review Report
Date: October 30, 2025
Prepared By: Ed Espino
Repository: https://github.com/edespino/pax_benchmark
Purpose: Comprehensive findings across 5 benchmarks for PAX development team review
Executive Summary
I've completed 5 comprehensive benchmarks testing PAX storage across diverse workloads (50M+ total rows tested). Results show PAX is production-ready for specific use cases with validated configuration, but critical issues require development team attention.
Key Findings
✅ Successes:
❌ Critical Issues:
Benchmark Summary Matrix
Cross-Benchmark Insights
PAX No-Cluster: Consistently excellent (2-5% overhead) ✅
Z-order Clustering: Wildly inconsistent (0.002% to 58%) ❌
Issue #1: Clustering Overhead Inconsistency (CRITICAL)
The Problem
Z-order clustering overhead varies by 290x across benchmarks with no clear pattern:
Analysis
Hypothesis 1: Bloom Filter Count
Contradiction: Clickstream (3 blooms) has lower overhead than Retail Sales (1 bloom)
Hypothesis 2: Bloom Filter Data Types
Observation: TEXT and UUID bloom filters appear problematic
Hypothesis 3: Data Distribution
Observation: Data locality may affect clustering efficiency
Hypothesis 4: Cardinality Interaction
Observation: Cardinality alone doesn't explain variance
Questions for Development Team
Is there a bloom filter metadata structure that grows non-linearly with:
Does Z-order clustering store bloom filter data differently than no-cluster variant?
Are TEXT/UUID bloom filters implemented differently from VARCHAR?
What is the recommended bloom filter limit?
Issue #2: Bloom Filter Misconfiguration Disaster (PRODUCTION INCIDENT)
What Happened
During retail_sales benchmark development, I configured bloom filters on low-cardinality columns:
Cardinality Analysis:
Result:
Root Cause
Low-cardinality bloom filters cause massive storage bloat:
The Fix
Result After Fix:
Why This Is Critical
This happened during controlled testing with validation framework. In production without validation:
Time Cost: Hours/days of debugging + data reload
Storage Cost: 80%+ wasted space
Performance Cost: 2-3x slower queries
Detection: Silent failure (no error messages)
Questions for Development Team
Can PAX detect low-cardinality bloom filters at table creation?
Can bloom filter size be estimated before creation?
Should there be a cardinality validation gate in PAX code?
Can EXPLAIN ANALYZE show bloom filter effectiveness?
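As a sketch of what the cardinality validation gate asked about above could look like: the 1% distinct-ratio threshold, the function name, and the stats shape are all my assumptions, not existing PAX behaviour.

```python
# Hypothetical pre-creation gate: reject bloom filters on low-cardinality
# columns, given distinct counts sampled beforehand (e.g. via
# SELECT COUNT(DISTINCT col)). Threshold is an illustrative assumption.

MIN_DISTINCT_RATIO = 0.01  # require >1% distinct values before allowing a bloom

def validate_bloom_columns(stats: dict, n_rows: int, bloom_columns: list) -> list:
    """stats maps column name -> observed distinct count.
    Returns the columns that should NOT get bloom filters."""
    return [c for c in bloom_columns
            if stats[c] / n_rows < MIN_DISTINCT_RATIO]

# Numbers modelled loosely on the retail_sales incident described above.
stats = {"region": 8, "status": 3, "session_id": 3_900_000}
bad = validate_bloom_columns(stats, 10_000_000, ["region", "status", "session_id"])
print(bad)  # ['region', 'status']
```

A gate like this, run at CREATE TABLE time or in a validation script, would have turned the silent 81% bloat described above into an immediate, actionable warning.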
Issue #3: File-Level Predicate Pushdown Disabled (KNOWN CODE ISSUE)
Current Status
File-level predicate pushdown is disabled in PAX scanner code:
Location:
pax_scanner.cc:366
Impact on Query Performance
Without predicate pushdown, PAX reads files before filtering:
Clickstream Q1 (date range filter):
Expected Behavior (with pushdown):
Actual Behavior (without pushdown):
Performance Impact:
Why This Matters
PAX's key advantage is file-level pruning via:
Without predicate pushdown, these features provide minimal benefit:
Observed Performance
Observation: PAX still achieves 43x speedup on time-based queries (Q7/Q8) despite disabled pushdown, suggesting Z-order clustering alone is highly effective.
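What file-level pushdown would buy can be illustrated with min/max metadata alone: files whose range cannot match a date-range predicate are skipped without being read. The file names and ranges below are invented for illustration; this is not PAX scanner code.

```python
# File-level pruning sketch: keep a file only if its [min, max] range for the
# filter column overlaps the query's [lo, hi] range. Without pushdown, every
# file is read and rows are filtered afterwards.

def files_to_read(file_minmax, lo, hi):
    """file_minmax: {filename: (min_value, max_value)} for the filter column."""
    return [name for name, (fmin, fmax) in file_minmax.items()
            if fmax >= lo and fmin <= hi]

meta = {
    "p1": ("2025-01-01", "2025-01-31"),
    "p2": ("2025-02-01", "2025-02-28"),
    "p3": ("2025-03-01", "2025-03-31"),
}
print(files_to_read(meta, "2025-02-10", "2025-02-20"))  # ['p2']
```

With pushdown enabled, a date-range query like Clickstream Q1 would touch one file out of three here; without it, all three are opened and the min/max and bloom metadata go unused at the file level.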
Questions for Development Team
Is there a GUC to enable it (pax.enable_file_level_pushdown)?
Issue #4: TEXT Bloom Filter Overhead (45-58% Bloat)
The Problem
Bloom filters on TEXT and UUID columns cause extreme clustering overhead:
Log Analytics Benchmark (2 bloom filters):
Financial Trading Benchmark (12 bloom filters):
Comparison to VARCHAR Bloom Filters
Clickstream Benchmark (3 VARCHAR bloom filters):
Analysis
Bloom filter overhead by data type:
Hypothesis: TEXT/UUID bloom filters may:
Questions for Development Team
Are TEXT bloom filters implemented differently from VARCHAR?
Why is VARCHAR(8) overhead so high (197 MB for 1 column)?
Can UUID bloom filters be optimized?
Is there a bloom filter size configuration?
Should documentation recommend avoiding TEXT/UUID blooms?
Issue #5: Bloom Filter Count Threshold
Observed Pattern
The "3 Bloom Filter Rule"
Hypothesis: 3 validated VARCHAR bloom filters is the sweet spot.
Evidence:
But this contradicts:
Questions for Development Team
Is there a recommended bloom filter limit?
Why does 1 bloom perform worse than 3 in some cases?
Can PAX warn when too many bloom filters are configured?
Configuration Best Practices (Based on 5 Benchmarks)
Bloom Filter Selection (CRITICAL)
Rule 1: Cardinality Validation
Rule 2: Limit Bloom Filter Count
Rule 3: Prefer VARCHAR over TEXT/UUID
Rule 4: Use Validation Framework
MinMax Columns (Always Safe)
Rule: Include ALL filterable columns
Z-order Clustering
Rule 1: Choose 2-3 Correlated Dimensions
Rule 2: Validate Memory Before Clustering
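The dimension-interleaving behind Z-order clustering can be sketched as Morton encoding: interleaving the bits of two correlated keys (say, a date bucket and a device bucket) so rows close in both dimensions land close in the sort order. The 16-bit key width is an illustrative assumption.

```python
# Morton (Z-order) encoding sketch for two dimensions.

def zorder2(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y (x in even positions,
    y in odd positions)."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z

# Neighbouring (x, y) pairs map to nearby Z-values, which is what lets
# clustering on 2-3 correlated dimensions prune effectively; adding weakly
# correlated dimensions dilutes that locality.
print(zorder2(3, 5))  # → 39
```

This also hints at why 2-3 dimensions is the sweet spot: each extra interleaved dimension spreads a given key's bits further apart, weakening locality per dimension.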
GUC Settings
When to Use PAX (Production Recommendations)
✅ PAX No-Cluster (Always Recommended)
Use for:
Configuration:
Expected Results:
Use ONLY when:
Avoid if:
Expected Results:
❌ Never Use
AO (row-oriented) for analytical workloads:
Testing Methodology
Benchmark Suite Overview
5 Benchmarks Completed:
Retail Sales - Original comprehensive benchmark
IoT Time-Series - Validation-first framework
Financial Trading - High-cardinality testing
Log Analytics - Sparse column testing
E-commerce Clickstream - Multi-dimensional analysis
Test Environment
Validation Framework
All benchmarks (except Retail Sales) use validation-first approach:
Phase 1: Cardinality Analysis
Phase 2: Configuration Generation
Phase 3: Safety Gates
Result: Zero misconfiguration incidents across 4 validation-first benchmarks
Detailed Benchmark Results
1. IoT Time-Series
Dataset: 10M sensor readings, 100K devices, 7 days
PAX Config: 0 bloom filters, 8 minmax columns, Z-order (date + device_id)
Key Finding: Zero bloom filters = near-zero clustering overhead
2. E-commerce Clickstream
Dataset: 10M events, 3.9M sessions, 464K users, 100K products
PAX Config: 3 VARCHAR bloom filters (validated), 8 minmax columns, Z-order (date + session)
Key Finding: 3 validated VARCHAR bloom filters = minimal overhead (0.2%)
3. Retail Sales
Dataset: 10M transactions, 10M customers, 100K products
PAX Config: 1 VARCHAR bloom filter (after fix), 6 minmax columns, Z-order (date + region)
Before Fix (3 bloom filters, 2 low-cardinality):
After Fix (1 validated bloom filter):
Key Finding: Bloom filter misconfiguration causes catastrophic failure (81% bloat)
4. Log Analytics
Dataset: 10M log entries, 10M request IDs, 10M trace IDs
PAX Config: 2 bloom filters (request_id UUID, trace_id UUID), 8 minmax columns, Z-order (timestamp + trace_id)
Key Finding: TEXT/UUID bloom filters cause 45.8% clustering overhead despite excellent cardinality
5. Financial Trading
Dataset: 10M trading ticks, 5K symbols, 1K traders
PAX Config: 12 bloom filters (mixed types), 9 minmax columns, Z-order (timestamp + symbol)
Key Finding: 12 bloom filters cause 58.1% clustering overhead (excessive)
Code-Level Observations
1. PAX Scanner (pax_scanner.cc:366)
Issue: File-level predicate pushdown disabled
Evidence: EXPLAIN ANALYZE shows:
Expected (with pushdown):
Impact: PAX performance limited to 1-2x faster instead of 5-10x faster
2. Bloom Filter Implementation
Observations:
Questions:
Is there a bloom filter size configuration (pax.bloom_filter_size_bytes)?
3. Z-order Clustering Metadata
Observations:
Questions:
Recommended Next Steps
For Development Team
Priority 1: Investigate Clustering Overhead
Priority 2: Enable File-Level Predicate Pushdown
Priority 3: Bloom Filter Validation
Priority 4: TEXT/UUID Bloom Filter Optimization
Priority 5: Documentation Updates
For Benchmark Testing
Completed:
Pending:
Appendix: Configuration Examples
Example 1: Time-Series (IoT, Metrics, Telemetry)
Expected: +3-5% storage overhead, 0-1% clustering overhead, 2-5x query speedup
Example 2: E-commerce Clickstream
Expected: +2-3% storage overhead, 0.2-1% clustering overhead, 40-75x speedup on time-based queries
Example 3: Log Analytics (Avoid TEXT Blooms)
Expected (no blooms): +3-5% storage overhead, 0-1% clustering overhead
Expected (with TEXT/UUID blooms): +3-5% storage overhead, 45-58% clustering overhead ❌
Contact & Feedback
This report synthesizes findings from 5 comprehensive benchmarks (50M+ rows tested). I've identified several critical issues requiring development team attention:
Questions for Development Team: See 20+ specific questions throughout this report.
Benchmark Code: All 5 benchmarks available in
/benchmarks directory with:
Repository: https://github.com/edespino/pax_benchmark
Next Benchmarks: 2 remaining per PAX_BENCHMARK_SUITE_PLAN.md (Telecom CDR, Geospatial)
Report Generated: October 30, 2025
Benchmarks Completed: 5 of 7 planned
Total Rows Tested: 50M+
Total Test Time: ~10 hours (including debugging and fixes)
Lines of Code: ~15,000 (SQL + shell + documentation)