Commit e277b85 (parent 138c070), authored by alanprot

Creating Parquet guide doc. (#6919)

1 file changed: docs/guides/parquet-mode.md (+294, −0 lines)

---
title: "Parquet Mode"
linkTitle: "Parquet Mode"
weight: 11
slug: parquet-mode
---

## Overview

Parquet mode is an experimental Cortex feature that converts TSDB blocks to Parquet format to improve query performance and storage efficiency for older data. It is particularly beneficial for long-term storage scenarios where data is accessed less frequently but still needs to be queried efficiently.

Parquet mode consists of two main components:

- **Parquet Converter**: Converts TSDB blocks to Parquet format
- **Parquet Queryable**: Enables querying of Parquet files with fallback to TSDB blocks

## Why Parquet Mode?

The traditional TSDB format and Store Gateway architecture face significant challenges when dealing with long-term data storage on object storage:

### TSDB Format Limitations

- **Random Read Intensive**: The TSDB index relies heavily on random reads, where each read becomes a separate request to object storage
- **Overfetching**: To reduce object storage requests, data needs to be merged, leading to higher bandwidth usage and overfetching
- **High Cardinality Bottlenecks**: Index postings can become a major bottleneck for high cardinality data

### Store Gateway Operational Challenges

- **Resource Intensive**: Requires significant local disk space for index headers and high memory utilization
- **Complex State Management**: Needs complex data sharding when scaling, often causing consistency and availability issues
- **Query Inefficiencies**: Single-threaded block processing leads to high latency for large blocks

### Parquet Advantages

[Apache Parquet](https://parquet.apache.org/) addresses these challenges through:

- **Columnar Storage**: Data organized by columns reduces object storage requests, as only specific columns need to be fetched
- **Stateless Design**: Rich file metadata eliminates the need for local state like index headers
- **Advanced Compression**: Reduces storage costs and improves query performance
- **Parallel Processing**: Row groups enable parallel processing for better scalability

For more details on the design rationale, see the [Parquet Storage Proposal](../proposals/parquet-storage.md).

## Architecture

The parquet system works by:

1. **Block Conversion**: The parquet converter runs periodically to identify TSDB blocks that should be converted to Parquet format
2. **Storage**: Parquet files are stored alongside TSDB blocks in object storage
3. **Querying**: The parquet queryable attempts to query Parquet files first, falling back to TSDB blocks when necessary
4. **Marker System**: Conversion status is tracked using marker files to avoid duplicate conversions

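As an illustration of steps 2 and 4, a converted block's objects sit next to the original TSDB block in the bucket. The tree below is a hypothetical sketch: the Parquet file names and the marker name are illustrative, not Cortex's actual object layout.

```text
<tenant>/
  01HXYZ.../                 # one TSDB block (ULID)
    meta.json
    index
    chunks/000001
    chunks.parquet           # hypothetical: Parquet chunks written by the converter
    labels.parquet           # hypothetical: Parquet labels written by the converter
    parquet-converted-mark   # hypothetical: marker recording conversion completion
```
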
## Configuration

### Enabling Parquet Converter

To enable the parquet converter service, add it to your target list:

```yaml
target: parquet-converter
```

Or include it in a multi-target deployment:

```yaml
target: all,parquet-converter
```

### Parquet Converter Configuration

Configure the parquet converter in your Cortex configuration:

```yaml
parquet_converter:
  # Data directory for caching blocks during conversion
  data_dir: "./data"

  # Frequency of conversion job execution
  conversion_interval: 1m

  # Maximum rows per parquet row group
  max_rows_per_row_group: 1000000

  # Number of concurrent meta file sync operations
  meta_sync_concurrency: 20

  # Enable file buffering to reduce memory usage
  file_buffer_enabled: true

  # Ring configuration for distributed conversion
  ring:
    kvstore:
      store: consul
      consul:
        host: localhost:8500
    heartbeat_period: 5s
    heartbeat_timeout: 1m
    instance_addr: 127.0.0.1
    instance_port: 9095
```

### Per-Tenant Parquet Settings

Enable parquet conversion per tenant using limits:

```yaml
limits:
  # Enable parquet converter for all tenants
  parquet_converter_enabled: true

  # Shard size for shuffle sharding (0 = disabled)
  parquet_converter_tenant_shard_size: 0.8
```

You can also configure per-tenant settings using runtime configuration:

```yaml
overrides:
  tenant-1:
    parquet_converter_enabled: true
    parquet_converter_tenant_shard_size: 2
  tenant-2:
    parquet_converter_enabled: false
```

### Enabling Parquet Queryable

To enable querying of Parquet files, configure the querier:

```yaml
querier:
  # Enable parquet queryable with fallback (experimental)
  enable_parquet_queryable: true

  # Cache size for parquet shards
  parquet_queryable_shard_cache_size: 512

  # Default block store: "tsdb" or "parquet"
  parquet_queryable_default_block_store: "parquet"
```

### Query Limits for Parquet

Configure query limits specific to parquet operations:

```yaml
limits:
  # Maximum number of rows that can be scanned per query
  parquet_max_fetched_row_count: 1000000

  # Maximum chunk bytes per query
  parquet_max_fetched_chunk_bytes: 100MB

  # Maximum data bytes per query
  parquet_max_fetched_data_bytes: 1GB
```

### Cache Configuration

Parquet mode supports dedicated caching for both chunks and labels to improve query performance. Configure caching in the blocks storage section:

```yaml
blocks_storage:
  bucket_store:
    # Chunks cache configuration for parquet data
    chunks_cache:
      backend: "memcached"        # Options: "", "inmemory", "memcached", "redis"
      subrange_size: 16000        # Size of each subrange for better caching
      max_get_range_requests: 3   # Max sub-GetRange requests per GetRange call
      attributes_ttl: 168h        # TTL for caching object attributes
      subrange_ttl: 24h           # TTL for caching individual chunk subranges

      # Memcached configuration (if using memcached backend)
      memcached:
        addresses: "memcached:11211"
        timeout: 500ms
        max_idle_connections: 16
        max_async_concurrency: 10
        max_async_buffer_size: 10000
        max_get_multi_concurrency: 100
        max_get_multi_batch_size: 0

    # Parquet labels cache configuration (experimental)
    parquet_labels_cache:
      backend: "memcached"        # Options: "", "inmemory", "memcached", "redis"
      subrange_size: 16000        # Size of each subrange for better caching
      max_get_range_requests: 3   # Max sub-GetRange requests per GetRange call
      attributes_ttl: 168h        # TTL for caching object attributes
      subrange_ttl: 24h           # TTL for caching individual label subranges

      # Memcached configuration (if using memcached backend)
      memcached:
        addresses: "memcached:11211"
        timeout: 500ms
        max_idle_connections: 16
```

#### Cache Backend Options

- **Empty string ("")**: Disables caching
- **inmemory**: Uses in-memory cache (suitable for single-instance deployments)
- **memcached**: Uses Memcached for distributed caching (recommended for production)
- **redis**: Uses Redis for distributed caching
- **Multi-level**: Comma-separated list for multi-tier caching (e.g., "inmemory,memcached")

#### Cache Performance Tuning

- **subrange_size**: Smaller values increase cache hit rates but create more cache entries
- **max_get_range_requests**: Higher values reduce object storage requests but increase memory usage
- **TTL values**: Balance between cache freshness and hit rates based on your data patterns
- **Multi-level caching**: Use "inmemory,memcached" for an L1/L2 cache hierarchy

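For example, a two-tier setup pairing a small in-memory L1 with a shared memcached L2 could look like the fragment below. This is a sketch of the comma-separated multi-level form described above, not a tuned recommendation:

```yaml
blocks_storage:
  bucket_store:
    chunks_cache:
      backend: "inmemory,memcached"   # L1 in-process, L2 shared memcached
      memcached:
        addresses: "memcached:11211"
```
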
## Block Conversion Logic

The parquet converter determines which blocks to convert based on:

1. **Time Range**: Only blocks with time ranges larger than the base TSDB block duration (typically 2h) are converted
2. **Conversion Status**: Blocks are only converted once, tracked via marker files
3. **Tenant Settings**: Conversion must be enabled for the specific tenant

The conversion process:

- Downloads TSDB blocks from object storage
- Converts time series data to Parquet format
- Uploads Parquet files (chunks and labels) to object storage
- Creates conversion marker files to track completion

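The three gating checks above can be sketched in a few lines of Python. The function, field, and marker names here are illustrative assumptions, not the converter's actual code:

```python
# Illustrative sketch of the conversion-eligibility checks described above.
# All names and the marker convention are hypothetical, not Cortex's code.

BASE_BLOCK_DURATION_MS = 2 * 60 * 60 * 1000  # base TSDB block duration (2h)

def should_convert(block: dict, existing_markers: set, tenant_enabled: bool) -> bool:
    """A block qualifies only if its tenant has conversion enabled, it spans
    more than the base 2h duration, and it has no conversion marker yet."""
    if not tenant_enabled:
        return False  # per-tenant setting gates everything
    duration_ms = block["max_time_ms"] - block["min_time_ms"]
    if duration_ms <= BASE_BLOCK_DURATION_MS:
        return False  # freshly written 2h blocks stay in TSDB format
    return block["id"] not in existing_markers  # convert each block once
```
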
## Querying Behavior

When parquet queryable is enabled:

1. **Block Discovery**: The bucket index is used to discover available blocks
   * The bucket index now contains metadata indicating whether parquet files are available for querying
2. **Query Execution**: Queries prioritize parquet files when available, falling back to TSDB blocks when parquet conversion is incomplete
3. **Hybrid Queries**: Supports querying both parquet and TSDB blocks within the same query operation

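The per-block routing described above can be sketched as follows; the entry structure and field names are hypothetical stand-ins for the metadata the bucket index provides:

```python
# Illustrative sketch of per-block store selection described above.
# The entry dicts are hypothetical stand-ins for bucket-index metadata.

def select_block_stores(bucket_index_entries: list) -> dict:
    """Route each discovered block to parquet when available, else TSDB,
    so a single query may read from both stores (a hybrid query)."""
    routing = {"parquet": [], "tsdb": []}
    for entry in bucket_index_entries:
        if entry.get("parquet_available"):
            routing["parquet"].append(entry["block_id"])
        else:
            routing["tsdb"].append(entry["block_id"])  # fallback path
    return routing
```
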
## Monitoring

### Parquet Converter Metrics

Monitor parquet converter operations:

```promql
# Blocks converted
cortex_parquet_converter_blocks_converted_total

# Conversion failures
cortex_parquet_converter_block_convert_failures_total

# Delay, in minutes, between a TSDB block being uploaded to object storage and its conversion to Parquet
cortex_parquet_converter_convert_block_delay_minutes
```

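Using the counters above, the ratio of failed to successfully converted blocks over the last hour can be computed like this (the window and usage as an alert threshold are illustrative):

```promql
# Ratio of failed to successful conversions over the last hour
sum(rate(cortex_parquet_converter_block_convert_failures_total[1h]))
  /
sum(rate(cortex_parquet_converter_blocks_converted_total[1h]))
```
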
### Parquet Queryable Metrics

Monitor parquet query performance:

```promql
# Blocks queried by type
cortex_parquet_queryable_blocks_queried_total

# Query operations
cortex_parquet_queryable_operations_total

# Cache metrics
cortex_parquet_queryable_cache_hits_total
cortex_parquet_queryable_cache_misses_total
```

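The two cache counters above combine into a hit-ratio expression, e.g.:

```promql
# Parquet queryable cache hit ratio over 5 minutes
sum(rate(cortex_parquet_queryable_cache_hits_total[5m]))
  /
(
  sum(rate(cortex_parquet_queryable_cache_hits_total[5m]))
  +
  sum(rate(cortex_parquet_queryable_cache_misses_total[5m]))
)
```
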
## Best Practices

### Deployment Recommendations

1. **Dedicated Converters**: Run parquet converters on dedicated instances for better resource isolation
2. **Ring Configuration**: Use a distributed ring for high availability and load distribution
3. **Storage Considerations**: Ensure sufficient disk space in `data_dir` for block processing
4. **Network Bandwidth**: Consider network bandwidth for downloading and uploading blocks

### Performance Tuning

1. **Row Group Size**: Adjust `max_rows_per_row_group` based on your query patterns
2. **Cache Size**: Tune `parquet_queryable_shard_cache_size` based on available memory
3. **Concurrency**: Adjust `meta_sync_concurrency` based on object storage performance

## Limitations

1. **Experimental Feature**: Parquet mode is experimental and may have stability issues
2. **Storage Overhead**: Parquet files are stored in addition to TSDB blocks
3. **Conversion Latency**: There is a delay between block creation and parquet availability
4. **Shuffle Sharding Requirement**: Parquet mode only supports shuffle sharding as its sharding strategy
5. **Bucket Index Dependency**: The bucket index must be enabled and properly configured, as it provides essential metadata for parquet file discovery and query routing

## Migration Considerations

When enabling parquet mode:

1. **Gradual Rollout**: Enable for specific tenants first
2. **Monitor Resources**: Watch CPU, memory, and storage usage
3. **Backup Strategy**: Ensure TSDB blocks remain available as fallback
4. **Testing**: Thoroughly test query patterns before production deployment