docs/guides/parquet-mode.md
14 additions & 3 deletions
@@ -19,17 +19,18 @@ Traditional TSDB format and Store Gateway architecture face significant challeng

 ### TSDB Format Limitations
 - **Random Read Intensive**: TSDB index relies heavily on random reads, where each read becomes a separate request to object storage
-- **Overfetching**: To reduce object storage requests, data needs to be merged, leading to higher bandwidth usage and overfetching
+- **Overfetching**: To reduce object storage requests, data that are close together are merged in a single request, leading to higher bandwidth usage and overfetching
 - **High Cardinality Bottlenecks**: Index postings can become a major bottleneck for high cardinality data

 ### Store Gateway Operational Challenges
-- **Resource Intensive**: Requires significant local disk space for index headers and high memory utilization
-- **Complex State Management**: Needs complex data sharding when scaling, often causing consistency and availability issues
+- **Resource Intensive**: Requires significant local disk space for index headers and high memory usage
+- **Complex State Management**: Requires complex data sharding when scaling, which often leads to consistency and availability issues, as well as long startup times
 - **Query Inefficiencies**: Single-threaded block processing leads to high latency for large blocks

 ### Parquet Advantages
 [Apache Parquet](https://parquet.apache.org/) addresses these challenges through:
 - **Columnar Storage**: Data organized by columns reduces object storage requests as only specific columns need to be fetched
+- **Data Locality**: Series that are likely to be queried together are co-located to minimize I/O operations
 - **Stateless Design**: Rich file metadata eliminates the need for local state like index headers
 - **Advanced Compression**: Reduces storage costs and improves query performance
 - **Parallel Processing**: Row groups enable parallel processing for better scalability
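
The overfetching trade-off described in the hunk above can be illustrated with a small Go sketch. It is not taken from Cortex or the TSDB code: the `byteRange` type, the `coalesce` function, and the `maxGap` threshold are hypothetical names chosen for illustration. The sketch merges byte ranges that lie close together so they can be served by one object storage request, and counts the unrequested bytes that end up being fetched as a result.

```go
package main

import (
	"fmt"
	"sort"
)

// byteRange is a hypothetical half-open [Start, End) span within one object.
type byteRange struct{ Start, End int64 }

// coalesce merges ranges whose gap is at most maxGap bytes, so that each
// merged range can be served by a single request. It returns the merged
// ranges and the number of gap bytes that are fetched but were never asked for.
func coalesce(ranges []byteRange, maxGap int64) (merged []byteRange, overfetched int64) {
	if len(ranges) == 0 {
		return nil, 0
	}
	// Work on a sorted copy so the caller's slice is left untouched.
	sorted := append([]byteRange(nil), ranges...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Start < sorted[j].Start })

	cur := sorted[0]
	for _, r := range sorted[1:] {
		gap := r.Start - cur.End
		if gap <= maxGap {
			if gap > 0 {
				overfetched += gap // bytes between the two ranges are read anyway
			}
			if r.End > cur.End {
				cur.End = r.End
			}
			continue
		}
		merged = append(merged, cur)
		cur = r
	}
	merged = append(merged, cur)
	return merged, overfetched
}

func main() {
	wanted := []byteRange{{0, 100}, {150, 200}, {5000, 5100}}
	merged, extra := coalesce(wanted, 64)
	fmt.Println(len(wanted), "reads ->", len(merged), "requests,", extra, "extra bytes")
}
```

With these inputs the first two ranges are merged at the cost of 50 extra bytes, while the distant third range stays a separate request. A larger `maxGap` lowers the request count further but raises the bandwidth spent on data nobody asked for.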
@@ -132,6 +133,9 @@ querier:

   # Default block store: "tsdb" or "parquet"
   parquet_queryable_default_block_store: "parquet"
+
+  # Disable fallback to TSDB blocks when parquet files are not available
+  parquet_queryable_fallback_disabled: false
 ```

 ### Query Limits for Parquet
@@ -227,6 +231,7 @@ When parquet queryable is enabled:
 * The bucket index now contains metadata indicating whether parquet files are available for querying
 1. **Query Execution**: Queries prioritize parquet files when available, falling back to TSDB blocks when parquet conversion is incomplete
 1. **Hybrid Queries**: Supports querying both parquet and TSDB blocks within the same query operation
+1. **Fallback Control**: When `parquet_queryable_fallback_disabled` is set to `true`, queries will fail with a consistency check error if any required blocks are not available as parquet files, ensuring strict parquet-only querying
 2. **Cache Size**: Tune `parquet_queryable_shard_cache_size` based on available memory
 3. **Concurrency**: Adjust `meta_sync_concurrency` based on object storage performance

+### Fallback Configuration
+
+1. **Gradual Migration**: Keep `parquet_queryable_fallback_disabled: false` (default) during initial deployment to allow queries to succeed even when parquet conversion is incomplete
+2. **Strict Parquet Mode**: Set `parquet_queryable_fallback_disabled: true` only after ensuring all required blocks have been converted to parquet format
+3. **Monitoring**: Monitor conversion progress and query failures before enabling strict parquet mode
+
 ## Limitations

 1. **Experimental Feature**: Parquet mode is experimental and may have stability issues
pkg/parquetconverter/converter.go
5 additions & 5 deletions
@@ -104,11 +104,11 @@ type Converter struct {
 func (cfg *Config) RegisterFlags(f *flag.FlagSet) {
 	cfg.Ring.RegisterFlags(f)

-	f.StringVar(&cfg.DataDir, "parquet-converter.data-dir", "./data", "Data directory in which to cache blocks and process conversions.")
-	f.IntVar(&cfg.MetaSyncConcurrency, "parquet-converter.meta-sync-concurrency", 20, "Number of Go routines to use when syncing block meta files from the long term storage.")
-	f.IntVar(&cfg.MaxRowsPerRowGroup, "parquet-converter.max-rows-per-row-group", 1e6, "Max number of rows per parquet row group.")
-	f.DurationVar(&cfg.ConversionInterval, "parquet-converter.conversion-interval", time.Minute, "The frequency at which the conversion job runs.")
-	f.BoolVar(&cfg.FileBufferEnabled, "parquet-converter.file-buffer-enabled", true, "Whether to enable buffering the writes in disk to reduce memory utilization.")
+	f.StringVar(&cfg.DataDir, "parquet-converter.data-dir", "./data", "Local directory path for caching TSDB blocks during parquet conversion.")
+	f.IntVar(&cfg.MetaSyncConcurrency, "parquet-converter.meta-sync-concurrency", 20, "Maximum concurrent goroutines for downloading block metadata from object storage.")
+	f.IntVar(&cfg.MaxRowsPerRowGroup, "parquet-converter.max-rows-per-row-group", 1e6, "Maximum number of time series per parquet row group. Larger values improve compression but may reduce performance during reads.")
+	f.DurationVar(&cfg.ConversionInterval, "parquet-converter.conversion-interval", time.Minute, "How often to check for new TSDB blocks to convert to parquet format.")
+	f.BoolVar(&cfg.FileBufferEnabled, "parquet-converter.file-buffer-enabled", true, "Enable disk-based write buffering to reduce memory consumption during parquet file generation.")

 	f.BoolVar(&cfg.IgnoreMaxQueryLength, "querier.ignore-max-query-length", false, "If enabled, ignore max query length check at Querier select method. Users can choose to ignore it since the validation can be done before Querier evaluation like at Query Frontend or Ruler.")
 	f.BoolVar(&cfg.EnablePromQLExperimentalFunctions, "querier.enable-promql-experimental-functions", false, "[Experimental] If true, experimental promQL functions are enabled.")
 	f.BoolVar(&cfg.EnableParquetQueryable, "querier.enable-parquet-queryable", false, "[Experimental] If true, querier will try to query the parquet files if available.")
-	f.IntVar(&cfg.ParquetQueryableShardCacheSize, "querier.parquet-queryable-shard-cache-size", 512, "[Experimental] [Experimental] Maximum size of the Parquet queryable shard cache. 0 to disable.")
-	f.StringVar(&cfg.ParquetQueryableDefaultBlockStore, "querier.parquet-queryable-default-block-store", string(parquetBlockStore), "Parquet queryable's default block store to query. Valid options are tsdb and parquet. If it is set to tsdb, parquet queryable always fallback to store gateway.")
+	f.IntVar(&cfg.ParquetQueryableShardCacheSize, "querier.parquet-queryable-shard-cache-size", 512, "[Experimental] Maximum size of the Parquet queryable shard cache. 0 to disable.")
+	f.StringVar(&cfg.ParquetQueryableDefaultBlockStore, "querier.parquet-queryable-default-block-store", string(parquetBlockStore), "[Experimental] Parquet queryable's default block store to query. Valid options are tsdb and parquet. If it is set to tsdb, parquet queryable always fallback to store gateway.")
 	f.BoolVar(&cfg.ParquetQueryableFallbackDisabled, "querier.parquet-queryable-fallback-disabled", false, "[Experimental] Disable Parquet queryable to fallback queries to Store Gateway if the block is not available as Parquet files but available in TSDB. Setting this to true will disable the fallback and users can remove Store Gateway. But need to make sure Parquet files are created before it is queryable.")