Commit aff27a7

Merge branch 'db_main' into pranavmishradatabricks/block-expensive-queries
Signed-off-by: pranavmishradatabricks <[email protected]>
2 parents c4bdbc8 + 94eb766 commit aff27a7

96 files changed: +4236 -1366 lines changed
CHANGELOG.md

Lines changed: 54 additions & 4 deletions
@@ -12,28 +12,76 @@ We use *breaking :warning:* to mark changes that are not backward compatible (re

 ### Fixed

+### Added
+
+### Changed
+
+### Removed
+
+## [v0.35.1](https://github.com/thanos-io/thanos/tree/release-0.35) - 28.05.2024
+
+### Fixed
+
+- [#7323](https://github.com/thanos-io/thanos/pull/7323) Sidecar: wait for prometheus on startup
+- [#6948](https://github.com/thanos-io/thanos/pull/6948) Receive: fix goroutines leak during series requests to thanos store api.
+- [#7382](https://github.com/thanos-io/thanos/pull/7382) *: Ensure objstore flag values are masked & disable debug/pprof/cmdline
+- [#7392](https://github.com/thanos-io/thanos/pull/7392) Query: fix broken min, max for pre 0.34.1 sidecars
+- [#7373](https://github.com/thanos-io/thanos/pull/7373) Receive: Fix stats for remote write
+- [#7318](https://github.com/thanos-io/thanos/pull/7318) Compactor: Recover from panic to log block ID
+
+### Added
+
+### Changed
+
+### Removed
+
+## [v0.35.0](https://github.com/thanos-io/thanos/tree/release-0.35) - 02.05.2024
+
+### Fixed
+
 - [#7083](https://github.com/thanos-io/thanos/pull/7083) Store Gateway: Fix lazy expanded postings with 0 length failed to be cached.
 - [#7080](https://github.com/thanos-io/thanos/pull/7080) Receive: race condition in handler Close() when stopped early
 - [#7132](https://github.com/thanos-io/thanos/pull/7132) Documentation: fix broken helm installation instruction
 - [#7134](https://github.com/thanos-io/thanos/pull/7134) Store, Compact: Revert the recursive block listing mechanism introduced in https://github.com/thanos-io/thanos/pull/6474 and use the same strategy as in 0.31. Introduce a `--block-discovery-strategy` flag to control the listing strategy so that a recursive lister can still be used if the tradeoff of slower but cheaper discovery is preferred.
 - [#7122](https://github.com/thanos-io/thanos/pull/7122) Store Gateway: Fix lazy expanded postings estimate base cardinality using posting group with remove keys.
+- [#7166](https://github.com/thanos-io/thanos/pull/7166) Receive/MultiTSDB: Do not delete non-uploaded blocks
+- [#7179](https://github.com/thanos-io/thanos/pull/7179) Query: Fix merging of query analysis
 - [#7224](https://github.com/thanos-io/thanos/pull/7224) Query-frontend: Add Redis username to the client configuration.
 - [#7220](https://github.com/thanos-io/thanos/pull/7220) Store Gateway: Fix lazy expanded postings caching partial expanded postings and bug of estimating remove postings with non existent value. Added `PromQLSmith` based fuzz test to improve correctness.
+- [#7225](https://github.com/thanos-io/thanos/pull/7225) Compact: Don't halt due to overlapping sources when vertical compaction is enabled
+- [#7244](https://github.com/thanos-io/thanos/pull/7244) Query: Fix Internal Server Error unknown targetHealth: "unknown" when trying to open the targets page.
+- [#7248](https://github.com/thanos-io/thanos/pull/7248) Receive: Fix RemoteWriteAsync was sequentially executed causing high latency in the ingestion path.
+- [#7271](https://github.com/thanos-io/thanos/pull/7271) Query: fixing dedup iterator when working on mixed sample types.
+- [#7289](https://github.com/thanos-io/thanos/pull/7289) Query Frontend: show warnings from downstream queries.
+- [#7308](https://github.com/thanos-io/thanos/pull/7308) Store: Batch TSDB Infos for blocks.

 ### Added

+- [#7155](https://github.com/thanos-io/thanos/pull/7155) Receive: Add tenant globbing support to hashring config
+- [#7231](https://github.com/thanos-io/thanos/pull/7231) Tracing: added missing sampler types
 - [#7194](https://github.com/thanos-io/thanos/pull/7194) Downsample: retry objstore related errors
 - [#7105](https://github.com/thanos-io/thanos/pull/7105) Rule: add flag `--query.enable-x-functions` to allow usage of extended promql functions (xrate, xincrease, xdelta) in loaded rules
 - [#6867](https://github.com/thanos-io/thanos/pull/6867) Query UI: Tenant input box added to the Query UI, in order to be able to specify which tenant the query should use.
-- [#7175](https://github.com/thanos-io/thanos/pull/7175): Query: Add `--query.mode=distributed` which enables the new distributed mode of the Thanos query engine.
-- [#7199](https://github.com/thanos-io/thanos/pull/7199): Reloader: Add support for watching and decompressing Prometheus configuration directories
-- [#7200](https://github.com/thanos-io/thanos/pull/7175): Query: Add `--selector.relabel-config` and `--selector.relabel-config-file` flags which allows scoping the Querier to a subset of matched TSDBs.
-- [#7233](https://github.com/thanos-io/thanos/pull/7233): UI: Showing Block Size Stats
+- [#7186](https://github.com/thanos-io/thanos/pull/7186) Query UI: Only show tenant input box when query tenant enforcement is enabled
+- [#7175](https://github.com/thanos-io/thanos/pull/7175) Query: Add `--query.mode=distributed` which enables the new distributed mode of the Thanos query engine.
+- [#7199](https://github.com/thanos-io/thanos/pull/7199) Reloader: Add support for watching and decompressing Prometheus configuration directories
+- [#7200](https://github.com/thanos-io/thanos/pull/7175) Query: Add `--selector.relabel-config` and `--selector.relabel-config-file` flags which allows scoping the Querier to a subset of matched TSDBs.
+- [#7233](https://github.com/thanos-io/thanos/pull/7233) UI: Showing Block Size Stats
+- [#7256](https://github.com/thanos-io/thanos/pull/7256) Receive: Split remote-write HTTP requests via tenant labels of series
+- [#7269](https://github.com/thanos-io/thanos/pull/7269) Query UI: Show peak/total samples in query analysis
+- [#7280](https://github.com/thanos-io/thanos/pull/7281) *: Adding User-Agent to request logs
+- [#7219](https://github.com/thanos-io/thanos/pull/7219) Receive: add `--remote-write.client-tls-secure` and `--remote-write.client-tls-skip-verify` flags to stop relying on grpc server config to determine grpc client secure/skipVerify.
+- [#7297](https://github.com/thanos-io/thanos/pull/7297) *: mark as not queryable if status is not ready
+- [#7302](https://github.com/thanos-io/thanos/pull/7303) Considering the `X-Forwarded-For` header for the remote address in the logs.
+- [#7304](https://github.com/thanos-io/thanos/pull/7304) Store: Use loser trees for merging results

 ### Changed

 - [#7123](https://github.com/thanos-io/thanos/pull/7123) Rule: Change default Alertmanager API version to v2.
+- [#7192](https://github.com/thanos-io/thanos/pull/7192) Rule: Do not turn off ruler even if resolving fails
 - [#7223](https://github.com/thanos-io/thanos/pull/7223) Automatic detection of memory limits and configure GOMEMLIMIT to match.
+- [#7283](https://github.com/thanos-io/thanos/pull/7283) Compact: *breaking :warning:* Replace group with resolution in compact downsample metrics to avoid cardinality explosion with large numbers of groups.
+- [#7305](https://github.com/thanos-io/thanos/pull/7305) Query|Receiver: Do not log full request on ProxyStore by default.

 ### Removed

@@ -44,6 +92,7 @@ We use *breaking :warning:* to mark changes that are not backward compatible (re
 - [#7078](https://github.com/thanos-io/thanos/pull/7078) *: Bump gRPC to 1.57.2

 ### Added
+- [#7105](https://github.com/thanos-io/thanos/pull/7105) Rule: add flag `--query.enable-x-functions` to allow usage of extended promql functions (xrate, xincrease, xdelta) in loaded rules

 ### Changed

@@ -147,6 +196,7 @@ We use *breaking :warning:* to mark changes that are not backward compatible (re
 - [#6692](https://github.com/thanos-io/thanos/pull/6692) Store: Fix matching bug when using empty alternative in regex matcher, for example (a||b).
 - [#6679](https://github.com/thanos-io/thanos/pull/6697) Store: Fix block deduplication
 - [#6706](https://github.com/thanos-io/thanos/pull/6706) Store: Series responses should always be sorted
+- [#7286](https://github.com/thanos-io/thanos/pull/7286) Query: Propagate instant query warnings in distributed execution mode.

 ### Added

VERSION

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-0.35.0-dev
+0.35.1

cmd/thanos/compact.go

Lines changed: 3 additions & 3 deletions
@@ -458,9 +458,9 @@ func runCompact(
 }

 for _, meta := range filteredMetas {
-	groupKey := meta.Thanos.GroupKey()
-	downsampleMetrics.downsamples.WithLabelValues(groupKey)
-	downsampleMetrics.downsampleFailures.WithLabelValues(groupKey)
+	resolutionLabel := meta.Thanos.ResolutionString()
+	downsampleMetrics.downsamples.WithLabelValues(resolutionLabel)
+	downsampleMetrics.downsampleFailures.WithLabelValues(resolutionLabel)
 }

 if err := downsampleBucket(

cmd/thanos/downsample.go

Lines changed: 9 additions & 9 deletions
@@ -50,16 +50,16 @@ func newDownsampleMetrics(reg *prometheus.Registry) *DownsampleMetrics {
 	m.downsamples = promauto.With(reg).NewCounterVec(prometheus.CounterOpts{
 		Name: "thanos_compact_downsample_total",
 		Help: "Total number of downsampling attempts.",
-	}, []string{"group"})
+	}, []string{"resolution"})
 	m.downsampleFailures = promauto.With(reg).NewCounterVec(prometheus.CounterOpts{
 		Name: "thanos_compact_downsample_failures_total",
 		Help: "Total number of failed downsampling attempts.",
-	}, []string{"group"})
+	}, []string{"resolution"})
 	m.downsampleDuration = promauto.With(reg).NewHistogramVec(prometheus.HistogramOpts{
 		Name: "thanos_compact_downsample_duration_seconds",
 		Help: "Duration of downsample runs",
 		Buckets: []float64{60, 300, 900, 1800, 3600, 7200, 14400}, // 1m, 5m, 15m, 30m, 60m, 120m, 240m
-	}, []string{"group"})
+	}, []string{"resolution"})

 	return m
 }
@@ -130,9 +130,9 @@ func RunDownsample(
 }

 for _, meta := range metas {
-	groupKey := meta.Thanos.GroupKey()
-	metrics.downsamples.WithLabelValues(groupKey)
-	metrics.downsampleFailures.WithLabelValues(groupKey)
+	resolutionLabel := meta.Thanos.ResolutionString()
+	metrics.downsamples.WithLabelValues(resolutionLabel)
+	metrics.downsampleFailures.WithLabelValues(resolutionLabel)
 }
 if err := downsampleBucket(ctx, logger, metrics, insBkt, metas, dataDir, downsampleConcurrency, blockFilesConcurrency, hashFunc, false); err != nil {
 	return errors.Wrap(err, "downsampling failed")
@@ -263,11 +263,11 @@ func downsampleBucket(
 				errMsg = "downsampling to 60 min"
 			}
 			if err := processDownsampling(workerCtx, logger, bkt, m, dir, resolution, hashFunc, metrics, acceptMalformedIndex, blockFilesConcurrency); err != nil {
-				metrics.downsampleFailures.WithLabelValues(m.Thanos.GroupKey()).Inc()
+				metrics.downsampleFailures.WithLabelValues(m.Thanos.ResolutionString()).Inc()
 				errCh <- errors.Wrap(err, errMsg)

 			}
-			metrics.downsamples.WithLabelValues(m.Thanos.GroupKey()).Inc()
+			metrics.downsamples.WithLabelValues(m.Thanos.ResolutionString()).Inc()
 		}
 	}()
 }
@@ -391,7 +391,7 @@ func processDownsampling(
 downsampleDuration := time.Since(begin)
 level.Info(logger).Log("msg", "downsampled block",
 	"from", m.ULID, "to", id, "duration", downsampleDuration, "duration_ms", downsampleDuration.Milliseconds())
-metrics.downsampleDuration.WithLabelValues(m.Thanos.GroupKey()).Observe(downsampleDuration.Seconds())
+metrics.downsampleDuration.WithLabelValues(m.Thanos.ResolutionString()).Observe(downsampleDuration.Seconds())

 stats, err := block.GatherIndexHealthStats(ctx, logger, filepath.Join(resdir, block.IndexFilename), m.MinTime, m.MaxTime)
 if err == nil {
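
Reviewer note on the label change in this file: the CHANGELOG entry for #7283 gives "cardinality explosion with large numbers of groups" as the motivation. The effect can be sketched without the Prometheus client by counting distinct label values. The `blockMeta` type and its sample values below are hypothetical stand-ins and do not reflect what `GroupKey()` or `ResolutionString()` actually return.

```go
package main

import (
	"fmt"
	"sort"
)

// blockMeta is a hypothetical, simplified stand-in for block metadata.
// The field values used in main are illustrative, not real Thanos output.
type blockMeta struct {
	GroupKey   string // assumed: one value per (resolution, tenant) pair
	Resolution string // assumed: one value per downsampling resolution
}

// distinctLabelValues returns the sorted set of values that pick yields,
// which equals the number of time series a labelled metric would expose.
func distinctLabelValues(metas []blockMeta, pick func(blockMeta) string) []string {
	seen := make(map[string]struct{})
	for _, m := range metas {
		seen[pick(m)] = struct{}{}
	}
	values := make([]string, 0, len(seen))
	for v := range seen {
		values = append(values, v)
	}
	sort.Strings(values)
	return values
}

func main() {
	metas := []blockMeta{
		{GroupKey: "raw/tenant-a", Resolution: "raw"},
		{GroupKey: "raw/tenant-b", Resolution: "raw"},
		{GroupKey: "5m/tenant-a", Resolution: "5m"},
		{GroupKey: "5m/tenant-b", Resolution: "5m"},
	}
	groups := distinctLabelValues(metas, func(m blockMeta) string { return m.GroupKey })
	resolutions := distinctLabelValues(metas, func(m blockMeta) string { return m.Resolution })
	fmt.Println("series with group label:", len(groups))
	fmt.Println("series with resolution label:", len(resolutions))
}
```

With these four sample blocks the group label yields four distinct values and the resolution label two. The first count grows with the number of tenants, while the second stays bounded by the number of resolutions.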

cmd/thanos/main.go

Lines changed: 5 additions & 0 deletions
@@ -214,6 +214,11 @@ func getFlagsMap(flags []*kingpin.FlagModel) map[string]string {
 	if boilerplateFlags.GetFlag(f.Name) != nil {
 		continue
 	}
+	// Mask inline objstore flag which can have credentials.
+	if f.Name == "objstore.config" || f.Name == "objstore.config-file" {
+		flagsMap[f.Name] = "<REDACTED>"
+		continue
+	}
 	flagsMap[f.Name] = f.Value.String()
 }
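
Reviewer note: the masking added above can be exercised in isolation. The following is a minimal sketch, assuming a plain `flagModel` struct in place of `kingpin.FlagModel`. The type and the `flagsToMap` and `sensitiveFlags` names are hypothetical; only the two flag names and the `<REDACTED>` placeholder come from the diff.

```go
package main

import "fmt"

// flagModel is a hypothetical stand-in for a parsed command-line flag.
type flagModel struct {
	Name  string
	Value string
}

// sensitiveFlags holds the two flag names that the diff masks.
var sensitiveFlags = map[string]bool{
	"objstore.config":      true,
	"objstore.config-file": true,
}

// flagsToMap copies flag values into a map and replaces the values of
// sensitive flags with a fixed placeholder so they never reach logs or UIs.
func flagsToMap(flags []flagModel) map[string]string {
	out := make(map[string]string, len(flags))
	for _, f := range flags {
		if sensitiveFlags[f.Name] {
			out[f.Name] = "<REDACTED>"
			continue
		}
		out[f.Name] = f.Value
	}
	return out
}

func main() {
	flags := []flagModel{
		{Name: "objstore.config", Value: "type: S3\nconfig: {secret_key: hunter2}"},
		{Name: "log.level", Value: "info"},
	}
	for name, value := range flagsToMap(flags) {
		fmt.Printf("%s=%s\n", name, value)
	}
}
```

A lookup table keeps the list of masked names in one place if more credential-bearing flags need masking later. The diff itself uses an inline comparison, which is equivalent for two names.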

cmd/thanos/main_test.go

Lines changed: 3 additions & 3 deletions
@@ -157,7 +157,7 @@ func TestRegression4960_Deadlock(t *testing.T) {
 testutil.Ok(t, err)

 metrics := newDownsampleMetrics(prometheus.NewRegistry())
-testutil.Equals(t, 0.0, promtest.ToFloat64(metrics.downsamples.WithLabelValues(meta.Thanos.GroupKey())))
+testutil.Equals(t, 0.0, promtest.ToFloat64(metrics.downsamples.WithLabelValues(meta.Thanos.ResolutionString())))
 baseBlockIDsFetcher := block.NewConcurrentLister(logger, bkt)
 metaFetcher, err := block.NewMetaFetcher(nil, block.FetcherConcurrency, bkt, baseBlockIDsFetcher, "", nil, nil)
 testutil.Ok(t, err)
@@ -197,15 +197,15 @@ func TestCleanupDownsampleCacheFolder(t *testing.T) {
 testutil.Ok(t, err)

 metrics := newDownsampleMetrics(prometheus.NewRegistry())
-testutil.Equals(t, 0.0, promtest.ToFloat64(metrics.downsamples.WithLabelValues(meta.Thanos.GroupKey())))
+testutil.Equals(t, 0.0, promtest.ToFloat64(metrics.downsamples.WithLabelValues(meta.Thanos.ResolutionString())))
 baseBlockIDsFetcher := block.NewConcurrentLister(logger, bkt)
 metaFetcher, err := block.NewMetaFetcher(nil, block.FetcherConcurrency, bkt, baseBlockIDsFetcher, "", nil, nil)
 testutil.Ok(t, err)

 metas, _, err := metaFetcher.Fetch(ctx)
 testutil.Ok(t, err)
 testutil.Ok(t, downsampleBucket(ctx, logger, metrics, bkt, metas, dir, 1, 1, metadata.NoneFunc, false))
-testutil.Equals(t, 1.0, promtest.ToFloat64(metrics.downsamples.WithLabelValues(meta.Thanos.GroupKey())))
+testutil.Equals(t, 1.0, promtest.ToFloat64(metrics.downsamples.WithLabelValues(meta.Thanos.ResolutionString())))

 _, err = os.Stat(dir)
 testutil.Assert(t, os.IsNotExist(err), "index cache dir should not exist at the end of execution")

cmd/thanos/query.go

Lines changed: 21 additions & 21 deletions
@@ -127,6 +127,9 @@ func registerQuery(app *extkingpin.App) {
 queryReplicaLabels := cmd.Flag("query.replica-label", "Labels to treat as a replica indicator along which data is deduplicated. Still you will be able to query without deduplication using 'dedup=false' parameter. Data includes time series, recording rules, and alerting rules.").
 	Strings()

+enableDedupMerge := cmd.Flag("query.dedup-merge", "Enable deduplication merge of multiple time series with the same labels.").
+	Default("false").Bool()
+
 instantDefaultMaxSourceResolution := extkingpin.ModelDuration(cmd.Flag("query.instant.default.max_source_resolution", "default value for max_source_resolution for instant queries. If not set, defaults to 0s only taking raw resolution into account. 1h can be a good value if you use instant queries over time ranges that incorporate times outside of your raw-retention.").Default("0s").Hidden())

 defaultMetadataTimeRange := cmd.Flag("query.metadata.default-time-range", "The default metadata time range duration for retrieving labels through Labels and Series API when the range parameters are not specified. The zero value means range covers the time since the beginning.").Default("0s").Duration()
@@ -218,7 +221,7 @@ func registerQuery(app *extkingpin.App) {
 storeSelectorRelabelConf := *extflag.RegisterPathOrContent(
 	cmd,
 	"selector.relabel-config",
-	"YAML with relabeling configuration that allows the Querier to select specific TSDBs by their external label. It follows native Prometheus relabel-config syntax. See format details: https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config ",
+	"YAML file with relabeling configuration that allows selecting blocks to query based on their external labels. It follows the Thanos sharding relabel-config syntax. For format details see: https://thanos.io/tip/thanos/sharding.md/#relabelling ",
 	extflag.WithEnvSubstitution(),
 )

@@ -374,6 +377,7 @@ func registerQuery(app *extkingpin.App) {
 			*enforceTenancy,
 			*tenantLabel,
 			*enableGroupReplicaPartialStrategy,
+			*enableDedupMerge,
 		)
 	})
 }
@@ -457,6 +461,7 @@ func runQuery(
 	enforceTenancy bool,
 	tenantLabel string,
 	groupReplicaPartialResponseStrategy bool,
+	enableDedupMerge bool,
 ) error {
 	if alertQueryURL == "" {
 		lastColon := strings.LastIndex(httpBindAddr, ":")
@@ -566,24 +571,19 @@ func runQuery(
 	exemplarsProxy = exemplars.NewProxy(logger, endpoints.GetExemplarsStores, selectorLset)
 	queryableCreator query.QueryableCreator
 )
-if groupReplicaPartialResponseStrategy {
-	level.Info(logger).Log("msg", "Enabled group-replica partial response strategy")
-	queryableCreator = query.NewQueryableCreatorWithGroupReplicaPartialResponseStrategy(
-		logger,
-		extprom.WrapRegistererWithPrefix("thanos_query_", reg),
-		proxy,
-		maxConcurrentSelects,
-		queryTimeout,
-	)
-} else {
-	queryableCreator = query.NewQueryableCreator(
-		logger,
-		extprom.WrapRegistererWithPrefix("thanos_query_", reg),
-		proxy,
-		maxConcurrentSelects,
-		queryTimeout,
-	)
+opts := query.Options{
+	GroupReplicaPartialResponseStrategy: groupReplicaPartialResponseStrategy,
+	EnableDedupMerge: enableDedupMerge,
 }
+level.Info(logger).Log("msg", "databricks querier features", "opts", opts)
+queryableCreator = query.NewQueryableCreatorWithOptions(
+	logger,
+	extprom.WrapRegistererWithPrefix("thanos_query_", reg),
+	proxy,
+	maxConcurrentSelects,
+	queryTimeout,
+	opts,
+)

 // Run File Service Discovery and update the store set when the files are modified.
 if fileSD != nil {
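
Reviewer note on the hunk above: it replaces an if/else over two constructors with a single constructor that takes an options struct. A minimal sketch of that pattern follows. Only the two `Options` field names are taken from the diff. The `queryableCreator` type, the lower-case constructor, and `enabledFeatures` are hypothetical and do not reproduce the `query` package.

```go
package main

import "fmt"

// Options mirrors the two fields visible in the diff; the rest of this
// sketch is hypothetical.
type Options struct {
	GroupReplicaPartialResponseStrategy bool
	EnableDedupMerge                    bool
}

// queryableCreator is a hypothetical stand-in for what the real constructor builds.
type queryableCreator struct {
	maxConcurrentSelects int
	opts                 Options
}

// newQueryableCreatorWithOptions takes every feature toggle in one struct, so
// adding a toggle does not require another constructor or another branch.
func newQueryableCreatorWithOptions(maxConcurrentSelects int, opts Options) queryableCreator {
	return queryableCreator{maxConcurrentSelects: maxConcurrentSelects, opts: opts}
}

// enabledFeatures lists the names of the toggles that are switched on.
func (c queryableCreator) enabledFeatures() []string {
	var features []string
	if c.opts.GroupReplicaPartialResponseStrategy {
		features = append(features, "group-replica-partial-response")
	}
	if c.opts.EnableDedupMerge {
		features = append(features, "dedup-merge")
	}
	return features
}

func main() {
	c := newQueryableCreatorWithOptions(4, Options{EnableDedupMerge: true})
	fmt.Println(c.enabledFeatures())
}
```

The zero value of `Options` leaves both features off, which matches the `Default("false")` on the new `query.dedup-merge` flag.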
@@ -803,7 +803,7 @@ func runQuery(
 infoSrv := info.NewInfoServer(
 	component.Query.String(),
 	info.WithLabelSetFunc(func() []labelpb.ZLabelSet { return proxy.LabelSet() }),
-	info.WithStoreInfoFunc(func() *infopb.StoreInfo {
+	info.WithStoreInfoFunc(func() (*infopb.StoreInfo, error) {
 		if httpProbe.IsReady() {
 			mint, maxt := proxy.TimeRange()
 			return &infopb.StoreInfo{
@@ -812,9 +812,9 @@ func runQuery(
 			SupportsSharding: true,
 			SupportsWithoutReplicaLabels: true,
 			TsdbInfos: proxy.TSDBInfos(),
-		}
+		}, nil
 	}
-	return nil
+	return nil, errors.New("Not ready")
 }),
 info.WithExemplarsInfoFunc(),
 info.WithRulesInfoFunc(),
