| 1 | +--- |
| 2 | +title: "Parquet Mode" |
| 3 | +linkTitle: "Parquet Mode" |
| 4 | +weight: 11 |
| 5 | +slug: parquet-mode |
| 6 | +--- |
| 7 | + |
| 8 | +## Overview |
| 9 | + |
| 10 | +Parquet mode is an experimental Cortex feature that converts TSDB blocks to Parquet format to improve query performance and storage efficiency for older data. It is particularly beneficial for long-term storage, where data is accessed less frequently but still needs to be queried efficiently. |
| 11 | + |
| 12 | +The parquet mode consists of two main components: |
| 13 | +- **Parquet Converter**: Converts TSDB blocks to Parquet format |
| 14 | +- **Parquet Queryable**: Enables querying of Parquet files with fallback to TSDB blocks |
| 15 | + |
| 16 | +## Why Parquet Mode? |
| 17 | + |
| 18 | +Traditional TSDB format and Store Gateway architecture face significant challenges when dealing with long-term data storage on object storage: |
| 19 | + |
| 20 | +### TSDB Format Limitations |
| 21 | +- **Random Read Intensive**: TSDB index relies heavily on random reads, where each read becomes a separate request to object storage |
| 22 | +- **Overfetching**: To reduce the number of object storage requests, reads are merged into larger ranges, which fetches data the query does not need and increases bandwidth usage |
| 23 | +- **High Cardinality Bottlenecks**: Index postings can become a major bottleneck for high cardinality data |
| 24 | + |
| 25 | +### Store Gateway Operational Challenges |
| 26 | +- **Resource Intensive**: Requires significant local disk space for index headers and high memory utilization |
| 27 | +- **Complex State Management**: Needs complex data sharding when scaling, often causing consistency and availability issues |
| 28 | +- **Query Inefficiencies**: Single-threaded block processing leads to high latency for large blocks |
| 29 | + |
| 30 | +### Parquet Advantages |
| 31 | +[Apache Parquet](https://parquet.apache.org/) addresses these challenges through: |
| 32 | +- **Columnar Storage**: Data organized by columns reduces object storage requests as only specific columns need to be fetched |
| 33 | +- **Stateless Design**: Rich file metadata eliminates the need for local state like index headers |
| 34 | +- **Advanced Compression**: Reduces storage costs and improves query performance |
| 35 | +- **Parallel Processing**: Row groups enable parallel processing for better scalability |
| 36 | + |
| 37 | +For more details on the design rationale, see the [Parquet Storage Proposal](../proposals/parquet-storage.md). |
| 38 | + |
| 39 | +## Architecture |
| 40 | + |
| 41 | +Parquet mode works as follows: |
| 42 | + |
| 43 | +1. **Block Conversion**: The parquet converter runs periodically to identify TSDB blocks that should be converted to Parquet format |
| 44 | +2. **Storage**: Parquet files are stored alongside TSDB blocks in object storage |
| 45 | +3. **Querying**: The parquet queryable attempts to query Parquet files first, falling back to TSDB blocks when necessary |
| 46 | +4. **Marker System**: Conversion status is tracked using marker files to avoid duplicate conversions |
| 47 | + |
| 48 | +## Configuration |
| 49 | + |
| 50 | +### Enabling Parquet Converter |
| 51 | + |
| 52 | +To enable the parquet converter service, add it to your target list: |
| 53 | + |
| 54 | +```yaml |
| 55 | +target: parquet-converter |
| 56 | +``` |
| 57 | + |
| 58 | +Or include it in a multi-target deployment: |
| 59 | + |
| 60 | +```yaml |
| 61 | +target: all,parquet-converter |
| 62 | +``` |
| 63 | + |
| 64 | +### Parquet Converter Configuration |
| 65 | + |
| 66 | +Configure the parquet converter in your Cortex configuration: |
| 67 | + |
| 68 | +```yaml |
| 69 | +parquet_converter: |
| 70 | +  # Data directory for caching blocks during conversion |
| 71 | +  data_dir: "./data" |
| 72 | + |
| 73 | +  # Frequency of conversion job execution |
| 74 | +  conversion_interval: 1m |
| 75 | + |
| 76 | +  # Maximum rows per parquet row group |
| 77 | +  max_rows_per_row_group: 1000000 |
| 78 | + |
| 79 | +  # Number of concurrent meta file sync operations |
| 80 | +  meta_sync_concurrency: 20 |
| 81 | + |
| 82 | +  # Enable file buffering to reduce memory usage |
| 83 | +  file_buffer_enabled: true |
| 84 | + |
| 85 | +  # Ring configuration for distributed conversion |
| 86 | +  ring: |
| 87 | +    kvstore: |
| 88 | +      store: consul |
| 89 | +      consul: |
| 90 | +        host: localhost:8500 |
| 91 | +    heartbeat_period: 5s |
| 92 | +    heartbeat_timeout: 1m |
| 93 | +    instance_addr: 127.0.0.1 |
| 94 | +    instance_port: 9095 |
| 95 | +``` |
| 96 | + |
| 97 | +### Per-Tenant Parquet Settings |
| 98 | + |
| 99 | +Enable parquet conversion per tenant using limits: |
| 100 | + |
| 101 | +```yaml |
| 102 | +limits: |
| 103 | +  # Enable parquet converter for all tenants |
| 104 | +  parquet_converter_enabled: true |
| 105 | + |
| 106 | +  # Shard size for shuffle sharding (0 = disabled) |
| 107 | +  parquet_converter_tenant_shard_size: 0.8 |
| 108 | +``` |
| 109 | + |
| 110 | +You can also configure per-tenant settings using runtime configuration: |
| 111 | + |
| 112 | +```yaml |
| 113 | +overrides: |
| 114 | +  tenant-1: |
| 115 | +    parquet_converter_enabled: true |
| 116 | +    parquet_converter_tenant_shard_size: 2 |
| 117 | +  tenant-2: |
| 118 | +    parquet_converter_enabled: false |
| 119 | +``` |
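
The examples above use two kinds of shard size values (`0.8` as the default and `2` for `tenant-1`). The Python sketch below illustrates one way such a value could be resolved into a number of converter instances. It takes from the comment above that `0` disables shuffle sharding; the treatment of values between 0 and 1 as a fraction of the available converters is an assumption not confirmed by this document, and the function name `effective_shard_size` is hypothetical.

```python
import math


def effective_shard_size(configured: float, num_converters: int) -> int:
    """Hypothetical helper: resolve a configured shard size to an instance count.

    Assumptions (not confirmed by this document):
      * 0 disables shuffle sharding, so every converter is eligible.
      * A value in (0, 1) is a fraction of the available converters, rounded up.
      * A value >= 1 is an absolute instance count, capped at the ring size.
    """
    if configured < 0 or num_converters < 0:
        raise ValueError("shard size and converter count must not be negative")
    if configured == 0:
        return num_converters
    if configured < 1:
        return math.ceil(num_converters * configured)
    return min(int(configured), num_converters)
```

Under these assumptions, a tenant with a shard size of `2` is handled by at most two converters regardless of how many are in the ring, while a fractional value scales with the size of the ring.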
| 120 | + |
| 121 | +### Enabling Parquet Queryable |
| 122 | + |
| 123 | +To enable querying of Parquet files, configure the querier: |
| 124 | + |
| 125 | +```yaml |
| 126 | +querier: |
| 127 | +  # Enable parquet queryable with fallback (experimental) |
| 128 | +  enable_parquet_queryable: true |
| 129 | + |
| 130 | +  # Cache size for parquet shards |
| 131 | +  parquet_queryable_shard_cache_size: 512 |
| 132 | + |
| 133 | +  # Default block store: "tsdb" or "parquet" |
| 134 | +  parquet_queryable_default_block_store: "parquet" |
| 135 | +``` |
| 136 | + |
| 137 | +### Query Limits for Parquet |
| 138 | + |
| 139 | +Configure query limits specific to parquet operations: |
| 140 | + |
| 141 | +```yaml |
| 142 | +limits: |
| 143 | +  # Maximum number of rows that can be scanned per query |
| 144 | +  parquet_max_fetched_row_count: 1000000 |
| 145 | + |
| 146 | +  # Maximum chunk bytes per query (value in bytes) |
| 147 | +  parquet_max_fetched_chunk_bytes: 100000000 # 100 MB |
| 148 | + |
| 149 | +  # Maximum data bytes per query (value in bytes) |
| 150 | +  parquet_max_fetched_data_bytes: 1000000000 # 1 GB |
| 151 | +``` |
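
To illustrate how per-query limits like these behave, the following Python sketch tracks consumption against a set of named budgets and rejects the query once any budget is exceeded. It is a simplified model, not the Cortex implementation: the `FetchBudget` class and `QueryLimitExceeded` exception are hypothetical names, and exactly when and how Cortex accounts for rows and bytes is not described in this document.

```python
class QueryLimitExceeded(Exception):
    """Raised when a per-query budget is exhausted (hypothetical error type)."""


class FetchBudget:
    """Hypothetical per-query accounting against a set of named limits."""

    def __init__(self, limits: dict[str, int]):
        self.limits = dict(limits)
        self.used = {name: 0 for name in limits}

    def consume(self, name: str, amount: int) -> None:
        """Record `amount` against limit `name`; fail if the limit is exceeded."""
        new_total = self.used[name] + amount
        if new_total > self.limits[name]:
            raise QueryLimitExceeded(
                f"{name}: {new_total} exceeds limit {self.limits[name]}"
            )
        self.used[name] = new_total
```

A query would create one `FetchBudget` from its tenant's limits and call `consume` as rows and bytes are fetched, so that an overly broad query fails early instead of exhausting memory.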
| 152 | + |
| 153 | +### Cache Configuration |
| 154 | + |
| 155 | +Parquet mode supports dedicated caching for both chunks and labels to improve query performance. Configure caching in the blocks storage section: |
| 156 | + |
| 157 | +```yaml |
| 158 | +blocks_storage: |
| 159 | +  bucket_store: |
| 160 | +    # Chunks cache configuration for parquet data |
| 161 | +    chunks_cache: |
| 162 | +      backend: "memcached" # Options: "", "inmemory", "memcached", "redis" |
| 163 | +      subrange_size: 16000 # Size of each subrange for better caching |
| 164 | +      max_get_range_requests: 3 # Max sub-GetRange requests per GetRange call |
| 165 | +      attributes_ttl: 168h # TTL for caching object attributes |
| 166 | +      subrange_ttl: 24h # TTL for caching individual chunk subranges |
| 167 | + |
| 168 | +      # Memcached configuration (if using memcached backend) |
| 169 | +      memcached: |
| 170 | +        addresses: "memcached:11211" |
| 171 | +        timeout: 500ms |
| 172 | +        max_idle_connections: 16 |
| 173 | +        max_async_concurrency: 10 |
| 174 | +        max_async_buffer_size: 10000 |
| 175 | +        max_get_multi_concurrency: 100 |
| 176 | +        max_get_multi_batch_size: 0 |
| 177 | + |
| 178 | +    # Parquet labels cache configuration (experimental) |
| 179 | +    parquet_labels_cache: |
| 180 | +      backend: "memcached" # Options: "", "inmemory", "memcached", "redis" |
| 181 | +      subrange_size: 16000 # Size of each subrange for better caching |
| 182 | +      max_get_range_requests: 3 # Max sub-GetRange requests per GetRange call |
| 183 | +      attributes_ttl: 168h # TTL for caching object attributes |
| 184 | +      subrange_ttl: 24h # TTL for caching individual label subranges |
| 185 | + |
| 186 | +      # Memcached configuration (if using memcached backend) |
| 187 | +      memcached: |
| 188 | +        addresses: "memcached:11211" |
| 189 | +        timeout: 500ms |
| 190 | +        max_idle_connections: 16 |
| 191 | +``` |
| 192 | + |
| 193 | +#### Cache Backend Options |
| 194 | + |
| 195 | +- **Empty string ("")**: Disables caching |
| 196 | +- **inmemory**: Uses in-memory cache (suitable for single-instance deployments) |
| 197 | +- **memcached**: Uses Memcached for distributed caching (recommended for production) |
| 198 | +- **redis**: Uses Redis for distributed caching |
| 199 | +- **Multi-level**: Comma-separated list for multi-tier caching (e.g., "inmemory,memcached") |
| 200 | + |
| 201 | +#### Cache Performance Tuning |
| 202 | + |
| 203 | +- **subrange_size**: Smaller values increase cache hit rates but create more cache entries |
| 204 | +- **max_get_range_requests**: Higher values reduce object storage requests but increase memory usage |
| 205 | +- **TTL values**: Balance between cache freshness and hit rates based on your data patterns |
| 206 | +- **Multi-level caching**: Use "inmemory,memcached" for L1/L2 cache hierarchy |
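
The `subrange_size` trade-off can be made concrete with a small Python sketch. It assumes the cache splits every byte-range request into fixed-size subranges aligned to multiples of `subrange_size`, each cached as its own entry; this is an illustrative model of subrange caching, not a description of the exact caching layer used by Cortex, and the function name `aligned_subranges` is hypothetical.

```python
def aligned_subranges(
    offset: int, length: int, subrange_size: int = 16000
) -> list[tuple[int, int]]:
    """Return the aligned (start, end) byte ranges, end exclusive, that cover
    the request [offset, offset + length).

    Illustrative only: assumes each returned range is one cache entry.
    """
    if offset < 0 or length <= 0 or subrange_size <= 0:
        raise ValueError("offset must be >= 0; length and subrange_size must be > 0")
    first = (offset // subrange_size) * subrange_size  # align start downwards
    end = offset + length
    return [(start, start + subrange_size) for start in range(first, end, subrange_size)]
```

In this model, a 30,000-byte read starting at offset 20,000 touches three 16,000-byte subranges. Halving `subrange_size` roughly doubles the number of entries for the same read, but each entry wastes less space on bytes outside the requested range.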
| 207 | + |
| 208 | +## Block Conversion Logic |
| 209 | + |
| 210 | +The parquet converter determines which blocks to convert based on: |
| 211 | + |
| 212 | +1. **Time Range**: Only blocks with time ranges larger than the base TSDB block duration (typically 2h) are converted |
| 213 | +2. **Conversion Status**: Blocks are only converted once, tracked via marker files |
| 214 | +3. **Tenant Settings**: Conversion must be enabled for the specific tenant |
| 215 | + |
| 216 | +The conversion process: |
| 217 | +- Downloads TSDB blocks from object storage |
| 218 | +- Converts time series data to Parquet format |
| 219 | +- Uploads Parquet files (chunks and labels) to object storage |
| 220 | +- Creates conversion marker files to track completion |
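
The three selection criteria above can be sketched as a single predicate. The Python below is a minimal illustration under stated assumptions: the `Block` type, the `should_convert` function, and the use of in-memory sets for enabled tenants and converted block IDs are all hypothetical stand-ins for the tenant limits and marker files that the converter actually consults.

```python
from dataclasses import dataclass
from datetime import timedelta

# Base TSDB block duration; the text above gives 2h as the typical value.
BASE_BLOCK_DURATION = timedelta(hours=2)


@dataclass(frozen=True)
class Block:
    """Hypothetical, simplified view of a TSDB block's metadata."""
    block_id: str
    tenant: str
    min_time_ms: int
    max_time_ms: int


def should_convert(
    block: Block,
    enabled_tenants: set[str],
    converted_block_ids: set[str],
    base_duration: timedelta = BASE_BLOCK_DURATION,
) -> bool:
    """Apply the three criteria: tenant enabled, not yet converted,
    and time range strictly larger than the base block duration."""
    duration_ms = block.max_time_ms - block.min_time_ms
    base_ms = int(base_duration.total_seconds() * 1000)
    return (
        block.tenant in enabled_tenants
        and block.block_id not in converted_block_ids
        and duration_ms > base_ms
    )
```

Because the time-range check is strict, a freshly uploaded 2h block is skipped in this sketch and only becomes eligible once it has been compacted into a larger block.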
| 221 | + |
| 222 | +## Querying Behavior |
| 223 | + |
| 224 | +When parquet queryable is enabled: |
| 225 | + |
| 226 | +1. **Block Discovery**: The bucket index is used to discover available blocks |
| 227 | +   - The bucket index contains metadata indicating whether parquet files are available for querying |
| 228 | +2. **Query Execution**: Queries prioritize parquet files when available, falling back to TSDB blocks when parquet conversion is incomplete |
| 229 | +3. **Hybrid Queries**: A single query can read from both parquet and TSDB blocks |
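
The routing described above can be sketched as a partition step over the blocks found in the bucket index. This Python example is illustrative only: `IndexedBlock`, its `has_parquet` flag, and `partition_blocks` are hypothetical names standing in for the bucket index metadata and the queryable's internal logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedBlock:
    """Hypothetical bucket index entry with a parquet-availability flag."""
    block_id: str
    has_parquet: bool


def partition_blocks(blocks: list[IndexedBlock]) -> tuple[list[str], list[str]]:
    """Split block IDs into (parquet, tsdb_fallback) for a single query.

    Blocks with parquet files are served from parquet; the rest fall back
    to the TSDB path, so one query may use both.
    """
    parquet = [b.block_id for b in blocks if b.has_parquet]
    tsdb_fallback = [b.block_id for b in blocks if not b.has_parquet]
    return parquet, tsdb_fallback
```

A query whose time range covers both converted and not-yet-converted blocks would then read the first list through the parquet path and the second through the TSDB path, and merge the results.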
| 230 | + |
| 231 | +## Monitoring |
| 232 | + |
| 233 | +### Parquet Converter Metrics |
| 234 | + |
| 235 | +Monitor parquet converter operations: |
| 236 | + |
| 237 | +```promql |
| 238 | +# Blocks converted |
| 239 | +cortex_parquet_converter_blocks_converted_total |
| 240 | + |
| 241 | +# Conversion failures |
| 242 | +cortex_parquet_converter_block_convert_failures_total |
| 243 | + |
| 244 | +# Delay, in minutes, between a TSDB block being uploaded to object storage and its conversion to Parquet |
| 245 | +cortex_parquet_converter_convert_block_delay_minutes |
| 246 | +``` |
| 247 | + |
| 248 | +### Parquet Queryable Metrics |
| 249 | + |
| 250 | +Monitor parquet query performance: |
| 251 | + |
| 252 | +```promql |
| 253 | +# Blocks queried by type |
| 254 | +cortex_parquet_queryable_blocks_queried_total |
| 255 | + |
| 256 | +# Query operations |
| 257 | +cortex_parquet_queryable_operations_total |
| 258 | + |
| 259 | +# Cache metrics |
| 260 | +cortex_parquet_queryable_cache_hits_total |
| 261 | +cortex_parquet_queryable_cache_misses_total |
| 262 | +``` |
| 263 | + |
| 264 | +## Best Practices |
| 265 | + |
| 266 | +### Deployment Recommendations |
| 267 | + |
| 268 | +1. **Dedicated Converters**: Run parquet converters on dedicated instances for better resource isolation |
| 269 | +2. **Ring Configuration**: Use a distributed ring for high availability and load distribution |
| 270 | +3. **Storage Considerations**: Ensure sufficient disk space in `data_dir` for block processing |
| 271 | +4. **Network Bandwidth**: Consider network bandwidth for downloading/uploading blocks |
| 272 | + |
| 273 | +### Performance Tuning |
| 274 | + |
| 275 | +1. **Row Group Size**: Adjust `max_rows_per_row_group` based on your query patterns |
| 276 | +2. **Cache Size**: Tune `parquet_queryable_shard_cache_size` based on available memory |
| 277 | +3. **Concurrency**: Adjust `meta_sync_concurrency` based on object storage performance |
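
When tuning `max_rows_per_row_group`, it can help to estimate how many row groups a file will contain, since row groups are the unit of parallel processing mentioned earlier. The Python sketch below assumes rows are packed into row groups up to the configured maximum, with only the last group partially filled; the function name `row_group_count` is hypothetical.

```python
def row_group_count(total_rows: int, max_rows_per_row_group: int = 1_000_000) -> int:
    """Estimate the number of row groups for a file with `total_rows` rows.

    Assumes every row group except possibly the last is filled to the maximum.
    """
    if total_rows < 0 or max_rows_per_row_group <= 0:
        raise ValueError("total_rows must be >= 0 and the maximum must be > 0")
    # Integer ceiling division avoids floating-point rounding on large counts.
    return -(-total_rows // max_rows_per_row_group)
```

Under this assumption, 2.5 million rows yield 3 row groups at the default of 1,000,000 rows, and 10 row groups at 250,000. Smaller groups allow more parallelism and finer-grained reads at the cost of more metadata and more requests.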
| 278 | + |
| 279 | +## Limitations |
| 280 | + |
| 281 | +1. **Experimental Feature**: Parquet mode is experimental and may have stability issues |
| 282 | +2. **Storage Overhead**: Parquet files are stored in addition to TSDB blocks |
| 283 | +3. **Conversion Latency**: There's a delay between block creation and parquet availability |
| 284 | +4. **Shuffle Sharding Requirement**: Parquet mode supports only shuffle sharding as the sharding strategy |
| 285 | +5. **Bucket Index Dependency**: The bucket index must be enabled and properly configured as it provides essential metadata for parquet file discovery and query routing |
| 286 | + |
| 287 | +## Migration Considerations |
| 288 | + |
| 289 | +When enabling parquet mode: |
| 290 | + |
| 291 | +1. **Gradual Rollout**: Enable for specific tenants first |
| 292 | +2. **Monitor Resources**: Watch CPU, memory, and storage usage |
| 293 | +3. **Backup Strategy**: Ensure TSDB blocks remain available as fallback |
| 294 | +4. **Testing**: Thoroughly test query patterns before production deployment |