Releases: cortexproject/cortex
1.1.0 / 2020-05-21
This release brings the usual mix of bugfixes and improvements. The biggest change is that WAL support for chunks is now considered to be production-ready!
Please make sure to review renamed metrics, and update your dashboards and alerts accordingly.
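Dashboards and alerts that query the renamed metrics can be updated mechanically. The following is a minimal sketch, not a Cortex tool. `rewriteExpr` and `metricRename` are hypothetical names. The sketch covers only the plain name-for-name renames from #2463 and #2375. The Thanos objstore renames (#2568) also change labels, so they are left out.

Note that the chunk-store counters are now counted after deduplication. A rewritten query is therefore not numerically comparable with historical data from the old metric.

```go
package main

import (
	"fmt"
	"regexp"
)

// metricRename pairs an old metric name with its replacement and a compiled
// pattern that matches the old name as a whole token.
type metricRename struct {
	from, to string
	re       *regexp.Regexp
}

func newRename(from, to string) metricRename {
	// Match the whole metric name, optionally followed by a histogram suffix,
	// so that e.g. ..._duration_seconds_bucket is rewritten as well.
	re := regexp.MustCompile(`\b` + regexp.QuoteMeta(from) + `(_bucket|_sum|_count)?\b`)
	return metricRename{from: from, to: to, re: re}
}

// renames lists the plain renames from #2463 and #2375 (hypothetical helper data).
var renames = []metricRename{
	newRename("cortex_ingester_chunks_stored_total", "cortex_chunk_store_stored_chunks_total"),
	newRename("cortex_ingester_chunk_stored_bytes_total", "cortex_chunk_store_stored_chunk_bytes_total"),
	newRename("cortex_querier_bucket_store_blocks_meta_syncs_total", "cortex_querier_blocks_meta_syncs_total"),
	newRename("cortex_querier_bucket_store_blocks_meta_sync_failures_total", "cortex_querier_blocks_meta_sync_failures_total"),
	newRename("cortex_querier_bucket_store_blocks_meta_sync_duration_seconds", "cortex_querier_blocks_meta_sync_duration_seconds"),
	newRename("cortex_querier_bucket_store_blocks_meta_sync_consistency_delay_seconds", "cortex_querier_blocks_meta_sync_consistency_delay_seconds"),
}

// rewriteExpr replaces every old metric name in a PromQL expression,
// keeping any histogram suffix that followed it.
func rewriteExpr(expr string) string {
	for _, r := range renames {
		expr = r.re.ReplaceAllString(expr, r.to+"${1}")
	}
	return expr
}

func main() {
	fmt.Println(rewriteExpr(`sum(rate(cortex_ingester_chunks_stored_total[5m]))`))
}
```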
- [CHANGE] Added v1 API routes documented in #2327. #2372
  - Added `-http.alertmanager-http-prefix` flag which allows the configuration of the path where the Alertmanager API and UI can be reached. The default is set to `/alertmanager`.
  - Added `-http.prometheus-http-prefix` flag which allows the configuration of the path where the Prometheus API and UI can be reached. The default is set to `/prometheus`.
  - Updated the index hosted at the root prefix to point to the updated routes.
  - Legacy routes hardcoded with the `/api/prom` prefix now respect the `-http.prefix` flag.
- [CHANGE] The metrics `cortex_distributor_ingester_appends_total` and `cortex_distributor_ingester_append_failures_total` now include a `type` label to differentiate between `samples` and `metadata`. #2336
- [CHANGE] The metrics for number of chunks and bytes flushed to the chunk store are renamed. Note that previous metrics were counted pre-deduplication, while new metrics are counted after deduplication. #2463
  - `cortex_ingester_chunks_stored_total` > `cortex_chunk_store_stored_chunks_total`
  - `cortex_ingester_chunk_stored_bytes_total` > `cortex_chunk_store_stored_chunk_bytes_total`
- [CHANGE] Experimental TSDB: renamed blocks meta fetcher metrics: #2375
  - `cortex_querier_bucket_store_blocks_meta_syncs_total` > `cortex_querier_blocks_meta_syncs_total`
  - `cortex_querier_bucket_store_blocks_meta_sync_failures_total` > `cortex_querier_blocks_meta_sync_failures_total`
  - `cortex_querier_bucket_store_blocks_meta_sync_duration_seconds` > `cortex_querier_blocks_meta_sync_duration_seconds`
  - `cortex_querier_bucket_store_blocks_meta_sync_consistency_delay_seconds` > `cortex_querier_blocks_meta_sync_consistency_delay_seconds`
- [CHANGE] Experimental TSDB: Modified default values for the `-compactor.deletion-delay` option from 48h to 12h and for `-experimental.tsdb.bucket-store.ignore-deletion-marks-delay` from 24h to 6h. #2414
- [CHANGE] WAL: Default value of `-ingester.checkpoint-enabled` changed to `true`. #2416
- [CHANGE] `trace_id` field in log files has been renamed to `traceID`. #2518
- [CHANGE] Slow query log has a different output now. The previously used `url` field has been replaced with `host` and `path`, and query parameters are logged as individual log fields with the `qs_` prefix. #2520
- [CHANGE] WAL: WAL and checkpoint compression is now disabled. #2436
- [CHANGE] Updated dependency `go-kit/kit` from `v0.9.0` to `v0.10.0`. HTML escaping is disabled in the JSON logger. #2535
- [CHANGE] Experimental TSDB: Removed the `cortex_<service>_` prefix from Thanos objstore metrics and added a `component` label to distinguish which Cortex component is doing API calls to the object storage when running in single-binary mode: #2568
  - `cortex_<service>_thanos_objstore_bucket_operations_total` renamed to `thanos_objstore_bucket_operations_total{component="<name>"}`
  - `cortex_<service>_thanos_objstore_bucket_operation_failures_total` renamed to `thanos_objstore_bucket_operation_failures_total{component="<name>"}`
  - `cortex_<service>_thanos_objstore_bucket_operation_duration_seconds` renamed to `thanos_objstore_bucket_operation_duration_seconds{component="<name>"}`
  - `cortex_<service>_thanos_objstore_bucket_last_successful_upload_time` renamed to `thanos_objstore_bucket_last_successful_upload_time{component="<name>"}`
- [CHANGE] FIFO cache: The `-<prefix>.fifocache.size` CLI flag has been renamed to `-<prefix>.fifocache.max-size-items`, and its YAML config option `size` has been renamed to `max_size_items`. #2319
- [FEATURE] Ruler: The `-ruler.evaluation-delay` flag was added to allow users to configure a default evaluation delay for all rules in Cortex. The default value is 0, which is the current behavior. #2423
- [FEATURE] Experimental: Added a new object storage client for OpenStack Swift. #2440
- [FEATURE] TLS config options added to the Server. #2535
- [FEATURE] Experimental: Added support for the `/api/v1/metadata` Prometheus-based endpoint. #2549
- [FEATURE] Add ability to limit concurrent queries to Cassandra with the `-cassandra.query-concurrency` flag. #2562
- [ENHANCEMENT] Experimental TSDB: sample ingestion errors are now reported via the existing `cortex_discarded_samples_total` metric. #2370
- [ENHANCEMENT] Failures on samples at distributors and ingesters return the first validation error as opposed to the last. #2383
- [ENHANCEMENT] Experimental TSDB: Added `cortex_querier_blocks_meta_synced`, which reflects the current state of synced blocks over all tenants. #2392
- [ENHANCEMENT] Added the `cortex_distributor_latest_seen_sample_timestamp_seconds` metric to see how far behind Prometheus servers are in sending data. #2371
- [ENHANCEMENT] FIFO cache now supports eviction based on memory usage. Added the `-<prefix>.fifocache.max-size-bytes` CLI flag and YAML config option `max_size_bytes` to specify the memory limit of the cache. #2319, #2527
- [ENHANCEMENT] Added `-querier.worker-match-max-concurrent`, which forces worker concurrency to match the `-querier.max-concurrent` option. Overrides `-querier.worker-parallelism`. #2456
- [ENHANCEMENT] Added the following metrics for monitoring delete requests: #2445
  - `cortex_purger_delete_requests_received_total`: Number of delete requests received per user.
  - `cortex_purger_delete_requests_processed_total`: Number of delete requests processed per user.
  - `cortex_purger_delete_requests_chunks_selected_total`: Number of chunks selected while building delete plans per user.
  - `cortex_purger_delete_requests_processing_failures_total`: Number of delete requests processing failures per user.
- [ENHANCEMENT] Single Binary: Added query-frontend to the single binary. Single binary users will now benefit from various query-frontend features. Primarily: sharding, parallelization, load shedding, additional caching (if configured), and query retries. #2437
- [ENHANCEMENT] Allow 1w (where w denotes week) and 1y (where y denotes year) when setting `-store.cache-lookups-older-than` and `-store.max-look-back-period`. #2454
- [ENHANCEMENT] Optimize index queries for matchers using "a|b|c"-type regex. #2446 #2475
- [ENHANCEMENT] Added per-tenant metrics for queries and for chunks and bytes read from the chunk store: #2463
  - `cortex_chunk_store_fetched_chunks_total` and `cortex_chunk_store_fetched_chunk_bytes_total`
  - `cortex_query_frontend_queries_total` (per-tenant queries counted by the frontend)
- [ENHANCEMENT] WAL: New metrics `cortex_ingester_wal_logged_bytes_total` and `cortex_ingester_checkpoint_logged_bytes_total` added to track total bytes logged to disk for WAL and checkpoints. #2497
- [ENHANCEMENT] Added de-duplicated chunks counter `cortex_chunk_store_deduped_chunks_total`, which counts every chunk not sent to the store because it was already sent by another replica. #2485
- [ENHANCEMENT] Query-frontend now also logs the POST data of long queries. #2481
- [ENHANCEMENT] WAL: Ingester WAL records now have a type header, and the custom WAL records have been replaced by Prometheus TSDB's WAL records. Old records will not be supported from 1.3 onwards. Note: once this is deployed, you cannot downgrade without data loss. #2436
- [ENHANCEMENT] Redis Cache: Added `idle_timeout`, `wait_on_pool_exhaustion` and `max_conn_lifetime` options to the Redis cache configuration. #2550
- [ENHANCEMENT] WAL: the experimental tag has been removed from the WAL in ingesters. #2560
- [ENHANCEMENT] Use newer AWS API for paginated queries - removes 'Deprecated' message from logfiles. #2452
- [BUGFIX] Ruler: Ensure temporary rule files with special characters are properly mapped and cleaned up. #2506
- [BUGFIX] Ensure requests are properly routed to the Prometheus API embedded in the querier if `-server.path-prefix` is set. Fixes #2411. #2372
- [BUGFIX] Experimental TSDB: Fixed chunk data corruption when querying back series using the experimental blocks storage. #2400
- [BUGFIX] Cassandra Storage: Fix endpoint TLS host verification. #2109
- [BUGFIX] Experimental TSDB: Fixed response status code from `422` to `500` when an error occurs while iterating chunks with the experimental blocks storage. #2402
- [BUGFIX] Ring: Fixed a situation where upgrading from pre-1.0 Cortex with a rolling strategy caused new 1.0 ingesters to lose their zone value in the ring until manually forced to re-register. #2404
- [BUGFIX] Distributor: `/all_user_stats` now shows API and Rule Ingest Rate correctly. #2457
- [BUGFIX] Fixed `version`, `revision` and `branch` labels exported by the `cortex_build_info` metric. #2468
- [BUGFIX] QueryFrontend: fixed a situation where span context was missing when `downstream_url` is used. #2539
- [BUGFIX] Querier: Fixed a situation where querier would crash because of an unresponsive frontend instance. #2569
1.1.0-rc.0 / 2020-05-13
This release brings the usual mix of bugfixes and improvements. The biggest change is that WAL support for chunks is now considered to be production-ready!
Please make sure to review renamed metrics, and update your dashboards and alerts accordingly.
- [CHANGE] Added v1 API routes documented in #2327. #2372
  - Added `-http.alertmanager-http-prefix` flag which allows the configuration of the path where the Alertmanager API and UI can be reached. The default is set to `/alertmanager`.
  - Added `-http.prometheus-http-prefix` flag which allows the configuration of the path where the Prometheus API and UI can be reached. The default is set to `/prometheus`.
  - Updated the index hosted at the root prefix to point to the updated routes.
  - Legacy routes hardcoded with the `/api/prom` prefix now respect the `-http.prefix` flag.
- [CHANGE] The metrics `cortex_distributor_ingester_appends_total` and `cortex_distributor_ingester_append_failures_total` now include a `type` label to differentiate between `samples` and `metadata`. #2336
- [CHANGE] The metrics for number of chunks and bytes flushed to the chunk store are renamed. Note that previous metrics were counted pre-deduplication, while new metrics are counted after deduplication. #2463
  - `cortex_ingester_chunks_stored_total` > `cortex_chunk_store_stored_chunks_total`
  - `cortex_ingester_chunk_stored_bytes_total` > `cortex_chunk_store_stored_chunk_bytes_total`
- [CHANGE] Experimental TSDB: renamed blocks meta fetcher metrics: #2375
  - `cortex_querier_bucket_store_blocks_meta_syncs_total` > `cortex_querier_blocks_meta_syncs_total`
  - `cortex_querier_bucket_store_blocks_meta_sync_failures_total` > `cortex_querier_blocks_meta_sync_failures_total`
  - `cortex_querier_bucket_store_blocks_meta_sync_duration_seconds` > `cortex_querier_blocks_meta_sync_duration_seconds`
  - `cortex_querier_bucket_store_blocks_meta_sync_consistency_delay_seconds` > `cortex_querier_blocks_meta_sync_consistency_delay_seconds`
- [CHANGE] Experimental TSDB: Modified default values for the `-compactor.deletion-delay` option from 48h to 12h and for `-experimental.tsdb.bucket-store.ignore-deletion-marks-delay` from 24h to 6h. #2414
- [CHANGE] Experimental WAL: Default value of `-ingester.checkpoint-enabled` changed to `true`. #2416
- [CHANGE] `trace_id` field in log files has been renamed to `traceID`. #2518
- [CHANGE] Slow query log has a different output now. The previously used `url` field has been replaced with `host` and `path`, and query parameters are logged as individual log fields with the `qs_` prefix. #2520
- [CHANGE] Experimental WAL: WAL and checkpoint compression is now disabled. #2436
- [CHANGE] Updated dependency `go-kit/kit` from `v0.9.0` to `v0.10.0`. HTML escaping is disabled in the JSON logger. #2535
- [CHANGE] Experimental TSDB: Removed the `cortex_<service>_` prefix from Thanos objstore metrics and added a `component` label to distinguish which Cortex component is doing API calls to the object storage when running in single-binary mode: #2568
  - `cortex_<service>_thanos_objstore_bucket_operations_total` renamed to `thanos_objstore_bucket_operations_total{component="<name>"}`
  - `cortex_<service>_thanos_objstore_bucket_operation_failures_total` renamed to `thanos_objstore_bucket_operation_failures_total{component="<name>"}`
  - `cortex_<service>_thanos_objstore_bucket_operation_duration_seconds` renamed to `thanos_objstore_bucket_operation_duration_seconds{component="<name>"}`
  - `cortex_<service>_thanos_objstore_bucket_last_successful_upload_time` renamed to `thanos_objstore_bucket_last_successful_upload_time{component="<name>"}`
- [CHANGE] FIFO cache: The `-<prefix>.fifocache.size` CLI flag has been renamed to `-<prefix>.fifocache.max-size-items`, and its YAML config option `size` has been renamed to `max_size_items`. #2319
- [FEATURE] Ruler: The `-ruler.evaluation-delay` flag was added to allow users to configure a default evaluation delay for all rules in Cortex. The default value is 0, which is the current behavior. #2423
- [FEATURE] Experimental: Added a new object storage client for OpenStack Swift. #2440
- [FEATURE] TLS config options added to the Server. #2535
- [FEATURE] Experimental: Added support for the `/api/v1/metadata` Prometheus-based endpoint. #2549
- [FEATURE] Add ability to limit concurrent queries to Cassandra with the `-cassandra.query-concurrency` flag. #2562
- [ENHANCEMENT] Experimental TSDB: sample ingestion errors are now reported via the existing `cortex_discarded_samples_total` metric. #2370
- [ENHANCEMENT] Failures on samples at distributors and ingesters return the first validation error as opposed to the last. #2383
- [ENHANCEMENT] Experimental TSDB: Added `cortex_querier_blocks_meta_synced`, which reflects the current state of synced blocks over all tenants. #2392
- [ENHANCEMENT] Added the `cortex_distributor_latest_seen_sample_timestamp_seconds` metric to see how far behind Prometheus servers are in sending data. #2371
- [ENHANCEMENT] FIFO cache now supports eviction based on memory usage. Added the `-<prefix>.fifocache.max-size-bytes` CLI flag and YAML config option `max_size_bytes` to specify the memory limit of the cache. #2319, #2527
- [ENHANCEMENT] Added `-querier.worker-match-max-concurrent`, which forces worker concurrency to match the `-querier.max-concurrent` option. Overrides `-querier.worker-parallelism`. #2456
- [ENHANCEMENT] Added the following metrics for monitoring delete requests: #2445
  - `cortex_purger_delete_requests_received_total`: Number of delete requests received per user.
  - `cortex_purger_delete_requests_processed_total`: Number of delete requests processed per user.
  - `cortex_purger_delete_requests_chunks_selected_total`: Number of chunks selected while building delete plans per user.
  - `cortex_purger_delete_requests_processing_failures_total`: Number of delete requests processing failures per user.
- [ENHANCEMENT] Single Binary: Added query-frontend to the single binary. Single binary users will now benefit from various query-frontend features. Primarily: sharding, parallelization, load shedding, additional caching (if configured), and query retries. #2437
- [ENHANCEMENT] Allow 1w (where w denotes week) and 1y (where y denotes year) when setting `-store.cache-lookups-older-than` and `-store.max-look-back-period`. #2454
- [ENHANCEMENT] Optimize index queries for matchers using "a|b|c"-type regex. #2446 #2475
- [ENHANCEMENT] Added per-tenant metrics for queries and for chunks and bytes read from the chunk store: #2463
  - `cortex_chunk_store_fetched_chunks_total` and `cortex_chunk_store_fetched_chunk_bytes_total`
  - `cortex_query_frontend_queries_total` (per-tenant queries counted by the frontend)
- [ENHANCEMENT] WAL: New metrics `cortex_ingester_wal_logged_bytes_total` and `cortex_ingester_checkpoint_logged_bytes_total` added to track total bytes logged to disk for WAL and checkpoints. #2497
- [ENHANCEMENT] Added de-duplicated chunks counter `cortex_chunk_store_deduped_chunks_total`, which counts every chunk not sent to the store because it was already sent by another replica. #2485
- [ENHANCEMENT] Query-frontend now also logs the POST data of long queries. #2481
- [ENHANCEMENT] WAL: Ingester WAL records now have a type header, and the custom WAL records have been replaced by Prometheus TSDB's WAL records. Old records will not be supported from 1.3 onwards. Note: once this is deployed, you cannot downgrade without data loss. #2436
- [ENHANCEMENT] Redis Cache: Added `idle_timeout`, `wait_on_pool_exhaustion` and `max_conn_lifetime` options to the Redis cache configuration. #2550
- [ENHANCEMENT] WAL: the experimental tag has been removed from the WAL in ingesters. #2560
- [ENHANCEMENT] Use newer AWS API for paginated queries - removes 'Deprecated' message from logfiles. #2452
- [BUGFIX] Ruler: Ensure temporary rule files with special characters are properly mapped and cleaned up. #2506
- [BUGFIX] Ensure requests are properly routed to the Prometheus API embedded in the querier if `-server.path-prefix` is set. Fixes #2411. #2372
- [BUGFIX] Experimental TSDB: Fixed chunk data corruption when querying back series using the experimental blocks storage. #2400
- [BUGFIX] Cassandra Storage: Fix endpoint TLS host verification. #2109
- [BUGFIX] Experimental TSDB: Fixed response status code from `422` to `500` when an error occurs while iterating chunks with the experimental blocks storage. #2402
- [BUGFIX] Ring: Fixed a situation where upgrading from pre-1.0 Cortex with a rolling strategy caused new 1.0 ingesters to lose their zone value in the ring until manually forced to re-register. #2404
- [BUGFIX] Distributor: `/all_user_stats` now shows API and Rule Ingest Rate correctly. #2457
- [BUGFIX] Fixed `version`, `revision` and `branch` labels exported by the `cortex_build_info` metric. #2468
- [BUGFIX] QueryFrontend: fixed a situation where an HTTP error was ignored and an incorrect status code was set. #2483
- [BUGFIX] QueryFrontend: fixed a situation where span context was missing when `downstream_url` is used. #2539
- [BUGFIX] Querier: Fixed a situation where querier would crash because of an unresponsive frontend instance. #2569
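The "a|b|c"-type regex optimization listed above (#2446, #2475) relies on a simple property. Prometheus label regex matchers are fully anchored. So a pattern made only of literal alternatives matches exactly that set of values, and the index can be looked up per value instead of scanning every label value.

The sketch below shows one way to detect such patterns. `literalAlternatives` is a hypothetical helper, not the code from those PRs.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// literalAlternatives returns the alternatives of an "a|b|c"-style regex when
// every alternative is a plain literal (no regex metacharacters). ok is false
// otherwise, meaning the caller must fall back to full regex matching.
func literalAlternatives(re string) (values []string, ok bool) {
	if re == "" {
		return nil, false
	}
	for _, alt := range strings.Split(re, "|") {
		// QuoteMeta leaves a string unchanged only if it has no metacharacters.
		if regexp.QuoteMeta(alt) != alt {
			return nil, false
		}
		values = append(values, alt)
	}
	return values, true
}

func main() {
	fmt.Println(literalAlternatives("foo|bar|baz")) // [foo bar baz] true
	fmt.Println(literalAlternatives("foo.*|bar"))   // [] false
}
```

Splitting on `|` is deliberately conservative. An escaped pipe such as `a\|b` produces a fragment ending in a backslash, which fails the `QuoteMeta` check and falls back to regex matching.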
1.0.1 / 2020-04-23
In a cluster with 3 ingester replicas, when rollouts happen or when there are only 2 replicas available, you might see gaps in your queries. This release fixes that bug.
- [BUGFIX] Fix gaps when querying ingesters with replication factor = 3 and 2 ingesters in the cluster. #2503
1.0.0 / 2020-04-02
This is the first major release of Cortex. We made a lot of breaking changes in this release which have been detailed below. Please also see the stability guarantees we provide as part of a major release: https://cortexmetrics.io/docs/configuration/v1guarantees/.
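Many CLI flags were renamed in this release. An existing argument list can be migrated with a small helper like the one below.

This is a sketch under stated assumptions:
- `migrateArgs` and `renamedFlags` are hypothetical names.
- The map is only a partial excerpt of the exact renames listed below (#2241, #2273, #2359).
- The `-dynamodb.periodic-table.*` and `-dynamodb.chunk-table.*` prefix changes are not covered.
- Flags removed outright (#2339) still have to be deleted by hand.

```go
package main

import (
	"fmt"
	"strings"
)

// renamedFlags is a partial, illustrative excerpt of the 1.0.0 renames,
// keyed by flag name without leading dashes.
var renamedFlags = map[string]string{
	"memcached.hostname":                  "store.chunks-cache.memcached.hostname",
	"memcached.service":                   "store.chunks-cache.memcached.service",
	"fifocache.size":                      "store.chunks-cache.fifocache.size",
	"store.chunk-cache-stubs":             "store.chunks-cache.cache-stubs",
	"frontend.memcache.write-back-buffer": "frontend.background.write-back-buffer",
	"dynamodb.poll-interval":              "table-manager.poll-interval",
	"dynamodb.chunk.gang.size":            "dynamodb.chunk-gang-size",
}

// migrateArgs rewrites renamed flags in an argument list, handling both the
// "-name=value" and "-name value" forms accepted by Go's flag package, with
// one or two leading dashes. Other arguments pass through unchanged.
func migrateArgs(args []string) []string {
	out := make([]string, 0, len(args))
	for _, arg := range args {
		name := strings.TrimLeft(arg, "-")
		dashes := arg[:len(arg)-len(name)]
		if dashes == "" || len(dashes) > 2 {
			out = append(out, arg) // not a flag, e.g. a separate value
			continue
		}
		name, value, hasValue := strings.Cut(name, "=")
		if renamed, ok := renamedFlags[name]; ok {
			name = renamed
		}
		if hasValue {
			out = append(out, dashes+name+"="+value)
		} else {
			out = append(out, dashes+name)
		}
	}
	return out
}

func main() {
	args := []string{"-target=querier", "-memcached.hostname=memcached", "--fifocache.size", "1024"}
	fmt.Println(strings.Join(migrateArgs(args), " "))
}
```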
- [CHANGE] Remove the following deprecated flags: #2339
  - `-metrics.error-rate-query` (use `-metrics.write-throttle-query` instead).
  - `-store.cardinality-cache-size` (use `-store.index-cache-read.enable-fifocache` and `-store.index-cache-read.fifocache.size` instead).
  - `-store.cardinality-cache-validity` (use `-store.index-cache-read.enable-fifocache` and `-store.index-cache-read.fifocache.duration` instead).
  - `-distributor.limiter-reload-period` (flag unused)
  - `-ingester.claim-on-rollout` (flag unused)
  - `-ingester.normalise-tokens` (flag unused)
- [CHANGE] Renamed YAML file options to be more consistent. See full config file changes below. #2273
- [CHANGE] AWS based autoscaling has been removed. You can only use metrics based autoscaling now. `-applicationautoscaling.url` has been removed. See https://cortexmetrics.io/docs/guides/aws/#dynamodb-capacity-provisioning on how to migrate. #2328
- [CHANGE] Renamed the `memcache.write-back-goroutines` and `memcache.write-back-buffer` flags to `background.write-back-concurrency` and `background.write-back-buffer`. This affects the following flags: #2241
  - `-frontend.memcache.write-back-buffer` --> `-frontend.background.write-back-buffer`
  - `-frontend.memcache.write-back-goroutines` --> `-frontend.background.write-back-concurrency`
  - `-store.index-cache-read.memcache.write-back-buffer` --> `-store.index-cache-read.background.write-back-buffer`
  - `-store.index-cache-read.memcache.write-back-goroutines` --> `-store.index-cache-read.background.write-back-concurrency`
  - `-store.index-cache-write.memcache.write-back-buffer` --> `-store.index-cache-write.background.write-back-buffer`
  - `-store.index-cache-write.memcache.write-back-goroutines` --> `-store.index-cache-write.background.write-back-concurrency`
  - `-memcache.write-back-buffer` --> `-store.chunks-cache.background.write-back-buffer`. Note the next change log for the difference.
  - `-memcache.write-back-goroutines` --> `-store.chunks-cache.background.write-back-concurrency`. Note the next change log for the difference.
- [CHANGE] Renamed the chunk cache flags to have `store.chunks-cache.` as prefix. This means the following flags have been changed: #2241
  - `-cache.enable-fifocache` --> `-store.chunks-cache.cache.enable-fifocache`
  - `-default-validity` --> `-store.chunks-cache.default-validity`
  - `-fifocache.duration` --> `-store.chunks-cache.fifocache.duration`
  - `-fifocache.size` --> `-store.chunks-cache.fifocache.size`
  - `-memcache.write-back-buffer` --> `-store.chunks-cache.background.write-back-buffer`. Note the previous change log for the difference.
  - `-memcache.write-back-goroutines` --> `-store.chunks-cache.background.write-back-concurrency`. Note the previous change log for the difference.
  - `-memcached.batchsize` --> `-store.chunks-cache.memcached.batchsize`
  - `-memcached.consistent-hash` --> `-store.chunks-cache.memcached.consistent-hash`
  - `-memcached.expiration` --> `-store.chunks-cache.memcached.expiration`
  - `-memcached.hostname` --> `-store.chunks-cache.memcached.hostname`
  - `-memcached.max-idle-conns` --> `-store.chunks-cache.memcached.max-idle-conns`
  - `-memcached.parallelism` --> `-store.chunks-cache.memcached.parallelism`
  - `-memcached.service` --> `-store.chunks-cache.memcached.service`
  - `-memcached.timeout` --> `-store.chunks-cache.memcached.timeout`
  - `-memcached.update-interval` --> `-store.chunks-cache.memcached.update-interval`
  - `-redis.enable-tls` --> `-store.chunks-cache.redis.enable-tls`
  - `-redis.endpoint` --> `-store.chunks-cache.redis.endpoint`
  - `-redis.expiration` --> `-store.chunks-cache.redis.expiration`
  - `-redis.max-active-conns` --> `-store.chunks-cache.redis.max-active-conns`
  - `-redis.max-idle-conns` --> `-store.chunks-cache.redis.max-idle-conns`
  - `-redis.password` --> `-store.chunks-cache.redis.password`
  - `-redis.timeout` --> `-store.chunks-cache.redis.timeout`
- [CHANGE] Rename the `-store.chunk-cache-stubs` flag to `-store.chunks-cache.cache-stubs` to be more in line with the above. #2241
- [CHANGE] Change prefix of flags `-dynamodb.periodic-table.*` to `-table-manager.index-table.*`. #2359
- [CHANGE] Change prefix of flags `-dynamodb.chunk-table.*` to `-table-manager.chunk-table.*`. #2359
- [CHANGE] Change the following flags: #2359
  - `-dynamodb.poll-interval` --> `-table-manager.poll-interval`
  - `-dynamodb.periodic-table.grace-period` --> `-table-manager.periodic-table.grace-period`
- [CHANGE] Renamed the following flags: #2273
  - `-dynamodb.chunk.gang.size` --> `-dynamodb.chunk-gang-size`
  - `-dynamodb.chunk.get.max.parallelism` --> `-dynamodb.chunk-get-max-parallelism`
- [CHANGE] Mixed time units are no longer supported for durations. For example, `168h5m0s` no longer works; please use just one unit (s|m|h|d|w|y). #2252
- [CHANGE] Utilize separate protos for rule state and storage. The experimental ruler API will not be functional until the rollout is complete. #2226
- [CHANGE] The frontend worker in the querier now starts after all Querier module dependencies are started. This fixes an issue where the frontend worker started to send queries to the querier before it was ready to serve them (mostly visible when using experimental blocks storage). #2246
- [CHANGE] The Lifecycler component now enters the Failed state on errors, and doesn't exit the process. (Important if you're vendoring Cortex and use Lifecycler.) #2251
- [CHANGE] `/ready` handler now returns 200 instead of 204. #2330
- [CHANGE] Better defaults for the following options: #2344
  - `-<prefix>.consul.consistent-reads`: Old default: `true`, new default: `false`. This reduces the load on Consul.
  - `-<prefix>.consul.watch-rate-limit`: Old default: 0, new default: 1. This rate limits the reads to 1 per second, which is good enough for ring watches.
  - `-distributor.health-check-ingesters`: Old default: `false`, new default: `true`.
  - `-ingester.max-stale-chunk-idle`: Old default: 0, new default: 2m. This lets us expire series that we know are stale early.
  - `-ingester.spread-flushes`: Old default: false, new default: true. This allows better de-duplication of data and uses less space.
  - `-ingester.chunk-age-jitter`: Old default: 20mins, new default: 0. This goes with `-ingester.spread-flushes` now defaulting to true.
  - `-<prefix>.memcached.batchsize`: Old default: 0, new default: 1024. This allows batching of requests and keeps the concurrent requests low.
  - `-<prefix>.memcached.consistent-hash`: Old default: false, new default: true. This allows for better cache hits when the memcaches are scaled up and down.
  - `-querier.batch-iterators`: Old default: false, new default: true.
  - `-querier.ingester-streaming`: Old default: false, new default: true.
- [CHANGE] Experimental TSDB: Added `-experimental.tsdb.bucket-store.postings-cache-compression-enabled` to enable postings compression when storing to cache. #2335
- [CHANGE] Experimental TSDB: Added `-compactor.deletion-delay`, which is the time before a block marked for deletion is deleted from the bucket. If not 0, blocks will be marked for deletion and the compactor component will delete blocks marked for deletion from the bucket. If the deletion delay is 0, blocks will be deleted straight away. Note that deleting blocks immediately can cause query failures if the store gateway / querier still has the block loaded, or if the compactor is ignoring the deletion because it's compacting the block at the same time. Default value is 48h. #2335
- [CHANGE] Experimental TSDB: Added `-experimental.tsdb.bucket-store.ignore-deletion-marks-delay`, to set the duration after which blocks marked for deletion will be filtered out while fetching blocks used for querying. This option allows the querier to ignore blocks that are marked for deletion with some delay. This ensures the store can still serve blocks that are meant to be deleted but do not have a replacement yet. Default is 24h, half of the default value for `-compactor.deletion-delay`. #2335
- [CHANGE] Experimental TSDB: Added `-experimental.tsdb.bucket-store.index-cache.memcached.max-item-size` to control the maximum size of an item that is stored to memcached. Defaults to 1 MiB. #2335
- [FEATURE] Added an experimental storage API to the ruler service that is enabled when `-experimental.ruler.enable-api` is set to true. #2269
  - The `-ruler.storage.type` flag now allows `s3`, `gcs`, and `azure` values.
  - `-ruler.storage.(s3|gcs|azure)` flags exist to allow the configuration of object clients set for rule storage.
- [CHANGE] Renamed table manager metrics. #2307 #2359
  - `cortex_dynamo_sync_tables_seconds` -> `cortex_table_manager_sync_duration_seconds`
  - `cortex_dynamo_table_capacity_units` -> `cortex_table_capacity_units`
- [FEATURE] Flusher target to flush the WAL. #2075
  - `-flusher.wal-dir` for the WAL directory to recover from.
  - `-flusher.concurrent-flushes` for the number of concurrent flushes.
  - `-flusher.flush-op-timeout` is the duration after which a flush should time out.
- [FEATURE] Ingesters can now have an optional availability zone set, to ensure metric replication is distributed across zones. This is set via the `-ingester.availability-zone` flag or the `availability_zone` field in the config file. #2317
- [ENHANCEMENT] Better re-use of connections to DynamoDB and S3. #2268
- [ENHANCEMENT] Experimental TSDB: Add support for local `filesystem` backend. #2245
- [ENHANCEMENT] Experimental TSDB: Added memcached support for the TSDB index cache. #2290
- [ENHANCEMENT] Experimental TSDB: Removed gRPC server to communicate between querier and BucketStore. #2324
- [ENHANCEMENT] Allow 1w (where w denotes week) ...
1.0.0-rc.0 / 2020-03-31
This is the first major release of Cortex. We made a lot of breaking changes in this release which have been detailed below. Please also see the stability guarantees we provide as part of a major release: https://cortexmetrics.io/docs/configuration/v1guarantees/.
- [CHANGE] Remove the following deprecated flags: #2339
  - `-metrics.error-rate-query` (use `-metrics.write-throttle-query` instead).
  - `-store.cardinality-cache-size` (use `-store.index-cache-read.enable-fifocache` and `-store.index-cache-read.fifocache.size` instead).
  - `-store.cardinality-cache-validity` (use `-store.index-cache-read.enable-fifocache` and `-store.index-cache-read.fifocache.duration` instead).
  - `-distributor.limiter-reload-period` (flag unused)
  - `-ingester.claim-on-rollout` (flag unused)
  - `-ingester.normalise-tokens` (flag unused)
- [CHANGE] Renamed YAML file options to be more consistent. See full config file changes below. #2273
- [CHANGE] AWS based autoscaling has been removed. You can only use metrics based autoscaling now. `-applicationautoscaling.url` has been removed. See https://cortexmetrics.io/docs/guides/aws/#dynamodb-capacity-provisioning on how to migrate. #2328
- [CHANGE] Renamed the `memcache.write-back-goroutines` and `memcache.write-back-buffer` flags to `background.write-back-concurrency` and `background.write-back-buffer`. This affects the following flags: #2241
  - `-frontend.memcache.write-back-buffer` --> `-frontend.background.write-back-buffer`
  - `-frontend.memcache.write-back-goroutines` --> `-frontend.background.write-back-concurrency`
  - `-store.index-cache-read.memcache.write-back-buffer` --> `-store.index-cache-read.background.write-back-buffer`
  - `-store.index-cache-read.memcache.write-back-goroutines` --> `-store.index-cache-read.background.write-back-concurrency`
  - `-store.index-cache-write.memcache.write-back-buffer` --> `-store.index-cache-write.background.write-back-buffer`
  - `-store.index-cache-write.memcache.write-back-goroutines` --> `-store.index-cache-write.background.write-back-concurrency`
  - `-memcache.write-back-buffer` --> `-store.chunks-cache.background.write-back-buffer`. Note the next change log for the difference.
  - `-memcache.write-back-goroutines` --> `-store.chunks-cache.background.write-back-concurrency`. Note the next change log for the difference.
- [CHANGE] Renamed the chunk cache flags to have `store.chunks-cache.` as prefix. This means the following flags have been changed: #2241
  - `-cache.enable-fifocache` --> `-store.chunks-cache.cache.enable-fifocache`
  - `-default-validity` --> `-store.chunks-cache.default-validity`
  - `-fifocache.duration` --> `-store.chunks-cache.fifocache.duration`
  - `-fifocache.size` --> `-store.chunks-cache.fifocache.size`
  - `-memcache.write-back-buffer` --> `-store.chunks-cache.background.write-back-buffer`. Note the previous change log for the difference.
  - `-memcache.write-back-goroutines` --> `-store.chunks-cache.background.write-back-concurrency`. Note the previous change log for the difference.
  - `-memcached.batchsize` --> `-store.chunks-cache.memcached.batchsize`
  - `-memcached.consistent-hash` --> `-store.chunks-cache.memcached.consistent-hash`
  - `-memcached.expiration` --> `-store.chunks-cache.memcached.expiration`
  - `-memcached.hostname` --> `-store.chunks-cache.memcached.hostname`
  - `-memcached.max-idle-conns` --> `-store.chunks-cache.memcached.max-idle-conns`
  - `-memcached.parallelism` --> `-store.chunks-cache.memcached.parallelism`
  - `-memcached.service` --> `-store.chunks-cache.memcached.service`
  - `-memcached.timeout` --> `-store.chunks-cache.memcached.timeout`
  - `-memcached.update-interval` --> `-store.chunks-cache.memcached.update-interval`
  - `-redis.enable-tls` --> `-store.chunks-cache.redis.enable-tls`
  - `-redis.endpoint` --> `-store.chunks-cache.redis.endpoint`
  - `-redis.expiration` --> `-store.chunks-cache.redis.expiration`
  - `-redis.max-active-conns` --> `-store.chunks-cache.redis.max-active-conns`
  - `-redis.max-idle-conns` --> `-store.chunks-cache.redis.max-idle-conns`
  - `-redis.password` --> `-store.chunks-cache.redis.password`
  - `-redis.timeout` --> `-store.chunks-cache.redis.timeout`
- [CHANGE] Rename the `-store.chunk-cache-stubs` flag to `-store.chunks-cache.cache-stubs` to be more in line with the above. #2241
- [CHANGE] Change prefix of flags `-dynamodb.periodic-table.*` to `-table-manager.index-table.*`. #2359
- [CHANGE] Change prefix of flags `-dynamodb.chunk-table.*` to `-table-manager.chunk-table.*`. #2359
- [CHANGE] Change the following flags: #2359
  - `-dynamodb.poll-interval` --> `-table-manager.poll-interval`
  - `-dynamodb.periodic-table.grace-period` --> `-table-manager.periodic-table.grace-period`
- [CHANGE] Renamed the following flags: #2273
  - `-dynamodb.chunk.gang.size` --> `-dynamodb.chunk-gang-size`
  - `-dynamodb.chunk.get.max.parallelism` --> `-dynamodb.chunk-get-max-parallelism`
- [CHANGE] Mixed time units are no longer supported for durations. For example, `168h5m0s` no longer works; please use just one unit (s|m|h|d|w|y). #2252
- [CHANGE] Utilize separate protos for rule state and storage. The experimental ruler API will not be functional until the rollout is complete. #2226
- [CHANGE] The frontend worker in the querier now starts after all Querier module dependencies are started. This fixes an issue where the frontend worker started to send queries to the querier before it was ready to serve them (mostly visible when using experimental blocks storage). #2246
- [CHANGE] The Lifecycler component now enters the Failed state on errors, and doesn't exit the process. (Important if you're vendoring Cortex and use Lifecycler.) #2251
- [CHANGE] The `/ready` handler now returns 200 instead of 204. #2330
- [CHANGE] Better defaults for the following options: #2344
  - `-<prefix>.consul.consistent-reads`: Old default: `true`, new default: `false`. This reduces the load on Consul.
  - `-<prefix>.consul.watch-rate-limit`: Old default: 0, new default: 1. This rate-limits reads to 1 per second, which is good enough for ring watches.
  - `-distributor.health-check-ingesters`: Old default: `false`, new default: `true`.
  - `-ingester.max-stale-chunk-idle`: Old default: 0, new default: 2m. This lets us expire series that we know are stale early.
  - `-ingester.spread-flushes`: Old default: false, new default: true. This allows better de-duplication of data and uses less space.
  - `-ingester.chunk-age-jitter`: Old default: 20mins, new default: 0. This goes together with setting `-ingester.spread-flushes` to true.
  - `-<prefix>.memcached.batchsize`: Old default: 0, new default: 1024. This allows batching of requests and keeps the number of concurrent requests low.
  - `-<prefix>.memcached.consistent-hash`: Old default: false, new default: true. This allows for better cache hits when the memcached instances are scaled up and down.
  - `-querier.batch-iterators`: Old default: false, new default: true.
  - `-querier.ingester-streaming`: Old default: false, new default: true.
- [CHANGE] Experimental TSDB: Added `-experimental.tsdb.bucket-store.postings-cache-compression-enabled` to enable postings compression when storing to cache. #2335
- [CHANGE] Experimental TSDB: Added `-compactor.deletion-delay`, which is the time before a block marked for deletion is deleted from the bucket. If not 0, blocks will be marked for deletion and the compactor will delete blocks marked for deletion from the bucket. If the delay is 0, blocks will be deleted straight away. Note that deleting blocks immediately can cause query failures if the store gateway or querier still has the block loaded, or if the compactor is ignoring the deletion because it's compacting the block at the same time. Default value is 48h. #2335
- [CHANGE] Experimental TSDB: Added `-experimental.tsdb.bucket-store.ignore-deletion-marks-delay`, to set the duration after which blocks marked for deletion will be filtered out while fetching blocks used for querying. This option allows the querier to ignore blocks that are marked for deletion with some delay. This ensures the store can still serve blocks that are meant to be deleted but do not have a replacement yet. Default is 24h, half of the default value for `-compactor.deletion-delay`. #2335
- [CHANGE] Experimental TSDB: Added `-experimental.tsdb.bucket-store.index-cache.memcached.max-item-size` to control the maximum size of an item stored in memcached. Defaults to 1 MiB. #2335
- [FEATURE] Added an experimental storage API to the ruler service that is enabled when `-experimental.ruler.enable-api` is set to true. #2269
  - `-ruler.storage.type` flag now allows `s3`, `gcs`, and `azure` values
  - `-ruler.storage.(s3|gcs|azure)` flags exist to allow the configuration of object clients set for rule storage
- [CHANGE] Renamed table manager metrics. #2307 #2359
  - `cortex_dynamo_sync_tables_seconds` -> `cortex_table_manager_sync_duration_seconds`
  - `cortex_dynamo_table_capacity_units` -> `cortex_table_capacity_units`
- [FEATURE] Flusher target to flush the WAL. #2075
  - `-flusher.wal-dir` for the WAL directory to recover from.
  - `-flusher.concurrent-flushes` for the number of concurrent flushes.
  - `-flusher.flush-op-timeout` is the duration after which a flush should time out.
- [FEATURE] Ingesters can now have an optional availability zone set, to ensure metric replication is distributed across zones. This is set via the `-ingester.availability-zone` flag or the `availability_zone` field in the config file. #2317
- [ENHANCEMENT] Better re-use of connections to DynamoDB and S3. #2268
- [ENHANCEMENT] Experimental TSDB: Add support for local `filesystem` backend. #2245
- [ENHANCEMENT] Experimental TSDB: Added memcached support for the TSDB index cache. #2290
- [ENHANCEMENT] Experimental TSDB: Removed gRPC server to communicate between querier and BucketStore. #2324
- [ENHANCEMENT] Allow 1w (where w denotes week) ...
0.7.0 / 2020-03-16
Cortex 0.7.0 is a major step toward the upcoming 1.0 release.
In this release, we received 164 contributions from 26 authors.
Thanks to all contributors! ❤️
Please be aware that Cortex 0.7.0 introduces some breaking changes. You're encouraged to read all the [CHANGE] entries below before upgrading your Cortex cluster. In particular:
- Cleaned up some configuration options in preparation for the Cortex `1.0.0` release (see also the annotated config file breaking changes below):
  - Removed CLI flags support to configure the schema (see how to migrate from flags to schema file)
  - Renamed CLI flag `-config-yaml` to `-schema-config-file`
  - Removed CLI flag `-store.min-chunk-age` in favor of `-querier.query-store-after`. The corresponding YAML config option `ingestermaxquerylookback` has been renamed to `query_ingesters_within`
  - Deprecated CLI flag `-frontend.cache-split-interval` in favor of `-querier.split-queries-by-interval`
  - Renamed the YAML config option `defaul_validity` to `default_validity`
  - Removed the YAML config option `config_store` (in the `alertmanager` YAML config) in favor of `store`
  - Removed the YAML config root block `configdb` in favor of `configs`. This change is also reflected in the following CLI flags renaming:
    - `-database.*` -> `-configs.database.*`
    - `-database.migrations` -> `-configs.database.migrations-dir`
- Removed the fluentd-based billing infrastructure including the CLI flags:
  - `-distributor.enable-billing`
  - `-billing.max-buffered-events`
  - `-billing.retry-delay`
  - `-billing.ingester`
- Removed support for using denormalised tokens in the ring. Before upgrading, make sure your Cortex cluster is already running `v0.6.0` or an earlier version with `-ingester.normalise-tokens=true`
Full changelog
- [CHANGE] Removed support for flags to configure schema. Further, the flag for specifying the config file (`-config-yaml`) has been deprecated. Please use `-schema-config-file`. See the Schema Configuration documentation for more details on how to configure the schema using the YAML file. #2221
- [CHANGE] In the config file, the root level `config_store` config option has been moved to `alertmanager` > `store` > `configdb`. #2125
- [CHANGE] Removed unnecessary `frontend.cache-split-interval` in favor of `querier.split-queries-by-interval`, both to reduce configuration complexity and to guarantee alignment of these two configs. Starting from now, `-querier.cache-results` may only be enabled in conjunction with `-querier.split-queries-by-interval` (previously the cache interval default was `24h`, so if you want to preserve the same behaviour you should set `-querier.split-queries-by-interval=24h`). #2040
- [CHANGE] Renamed Configs configuration options. #2187
  - configuration options
    - `-database.*` -> `-configs.database.*`
    - `-database.migrations` -> `-configs.database.migrations-dir`
  - config file
    - `configdb.uri:` -> `configs.database.uri:`
    - `configdb.migrationsdir:` -> `configs.database.migrations_dir:`
    - `configdb.passwordfile:` -> `configs.database.password_file:`
- [CHANGE] Moved `-store.min-chunk-age` to the Querier config as `-querier.query-store-after`, allowing the store to be skipped during query time if the metrics wouldn't be found. The YAML config option `ingestermaxquerylookback` has been renamed to `query_ingesters_within` to match its CLI flag. #1893
- [CHANGE] Renamed the cache configuration setting `defaul_validity` to `default_validity`. #2140
- [CHANGE] Remove fluentd-based billing infrastructure and flags such as `-distributor.enable-billing`. #1491
- [CHANGE] Removed remaining support for using denormalised tokens in the ring. If you're still running ingesters with denormalised tokens (Cortex 0.4 or earlier, with `-ingester.normalise-tokens=false`), such ingesters will now be completely invisible to distributors and need to be either switched to Cortex 0.6.0 or later, or be configured to use normalised tokens. #2034
- [CHANGE] The frontend HTTP server will now send 502 in case of deadline exceeded and 499 if the user requested cancellation. #2156
- [CHANGE] Queries are now limited to at most `-querier.max-query-into-future` into the future (defaults to 10m). #1929
  - `-store.min-chunk-age` has been removed
  - `-querier.query-store-after` has been added in its place.
- [CHANGE] Removed unused `/validate_expr` endpoint. #2152
- [CHANGE] Updated Prometheus dependency to v2.16.0. This Prometheus version uses Active Query Tracker to limit concurrent queries. In order to keep `-querier.max-concurrent` working, Active Query Tracker is enabled by default, and is configured to store its data in the `active-query-tracker` directory (relative to the current directory when Cortex started). This can be changed by using the `-querier.active-query-tracker-dir` option. The purpose of Active Query Tracker is to log queries that were running when Cortex crashes. This logging happens on the next Cortex start. #2088
- [CHANGE] Default to BigChunk encoding; may result in slightly higher disk usage if many timeseries have a constant value, but should generally result in fewer, bigger chunks. #2207
- [CHANGE] WAL replays are now done while the rest of Cortex is starting, and more specifically, when the HTTP server is running. This makes it possible to scrape metrics during WAL replays. Applies to both chunks and experimental blocks storage. #2222
- [CHANGE] Cortex now has a `/ready` probe for all services, not just ingester and querier as before. In single-binary mode, `/ready` reports 204 only if all components are running properly. #2166
- [CHANGE] If you are vendoring Cortex and use its components in your project, be aware that many Cortex components no longer start automatically when they are created. You may want to review the PR and its attached document. #2166
- [CHANGE] Experimental TSDB: the querier in-memory index cache used by the experimental blocks storage shifted from per-tenant to per-querier. The `-experimental.tsdb.bucket-store.index-cache-size-bytes` flag now configures the per-querier index cache max size instead of a per-tenant cache, and its default has been increased to 1GB. #2189
- [CHANGE] Experimental TSDB: TSDB head compaction interval and concurrency are now configurable (defaults to 1 min interval and 5 concurrent head compactions). New options: `-experimental.tsdb.head-compaction-interval` and `-experimental.tsdb.head-compaction-concurrency`. #2172
- [CHANGE] Experimental TSDB: switched the blocks storage index header to the binary format. This change is expected to have no visible impact, except lower startup times and memory usage in the queriers. It's possible to switch back to the old JSON format via the flag `-experimental.tsdb.bucket-store.binary-index-header-enabled=false`. #2223
- [CHANGE] Experimental Memberlist KV store can now be used in single-binary Cortex. Attempts to use it previously would fail with panic. This change also breaks the existing binary protocol used to exchange gossip messages, so this version will not be able to understand the gossiped Ring when used in combination with the previous version of Cortex. The easiest way to upgrade is to shut down the old Cortex installation and restart it with the new version. Incremental rollout works too, but with reduced functionality until all components run the same version. #2016
- [FEATURE] Added a read-only local alertmanager config store using files named corresponding to their tenant id. #2125
- [FEATURE] Added flag `-experimental.ruler.enable-api` to enable the ruler API, which implements the Prometheus API `/api/v1/rules` and `/api/v1/alerts` endpoints under the configured `-http.prefix`. #1999
- [FEATURE] Added sharding support to compactor when using the experimental TSDB blocks storage. #2113
- [FEATURE] Added ability to override YAML config file settings using environment variables. #2147
  - `-config.expand-env`
- [FEATURE] Added flags to disable Alertmanager notification methods. #2187
  - `-configs.notifications.disable-email`
  - `-configs.notifications.disable-webhook`
- [FEATURE] Add `/config` HTTP endpoint which exposes the current Cortex configuration as YAML. #2165
- [FEATURE] Allow Prometheus remote write directly to ingesters. #1491
- [FEATURE] Introduced new standalone service `query-tee` that can be used for testing purposes to send the same Prometheus query to multiple backends (i.e. two Cortex clusters ingesting the same metrics) and compare their performance. #2203
- [FEATURE] Fan out parallelizable queries to backend queriers concurrently. #1878
  - `querier.parallelise-shardable-queries` (bool)
  - Requires a shard-compatible schema (v10+)
  - This causes the number of traces to increase accordingly.
  - The query-frontend now requires a schema config to determine how/when to shard queries, either from a file or from flags (i.e. by the `config-yaml` CLI flag). This is the same schema config the queriers consume. The schema is only required to use this option.
  - It's also advised to increase downstream concurrency controls as well:
    - `querier.max-outstanding-requests-per-tenant`
    - `querier.max-query-parallelism`
    - `querier.max-concurrent`
    - `server.grpc-max-concurrent-streams` (for both query-frontends and queriers)
- [FEATURE] Added user sub rings to distribute users to a subset of ingesters. #1947
  - `-experimental.distributor.user-subring-size`
- [FEATURE] Add flag `-experimenta...
0.7.0-rc.0 / 2020-03-09
Cortex 0.7.0 introduces some breaking changes. You're encouraged to read all the [CHANGE] entries below before upgrading your Cortex cluster. In particular:
- Cleaned up some configuration options in preparation for the Cortex `1.0.0` release:
  - Removed CLI flags support to configure the schema (see how to migrate from flags to schema file)
  - Renamed CLI flag `-config-yaml` to `-schema-config-file`
  - Removed CLI flag `-store.min-chunk-age` in favor of `-querier.query-store-after`. The corresponding YAML config option `ingestermaxquerylookback` has been renamed to `query_ingesters_within`
  - Deprecated CLI flag `-frontend.cache-split-interval` in favor of `-querier.split-queries-by-interval`
  - Renamed the YAML config option `defaul_validity` to `default_validity`
  - Removed the YAML config option `config_store` (in the `alertmanager` YAML config) in favor of `store`
  - Removed the YAML config root block `configdb` in favor of `configs`. This change is also reflected in the following CLI flags renaming:
    - `-database.*` -> `-configs.database.*`
    - `-database.migrations` -> `-configs.database.migrations-dir`
- Removed the fluentd-based billing infrastructure including the CLI flags:
  - `-distributor.enable-billing`
  - `-billing.max-buffered-events`
  - `-billing.retry-delay`
  - `-billing.ingester`
- Removed support for using denormalised tokens in the ring. Before upgrading, make sure your Cortex cluster is already running `v0.6.0` or an earlier version with `-ingester.normalise-tokens=true`
Full changelog
- [CHANGE] Removed support for flags to configure schema. Further, the flag for specifying the config file (`-config-yaml`) has been deprecated. Please use `-schema-config-file`. See the Schema Configuration documentation for more details on how to configure the schema using the YAML file. #2221
- [CHANGE] Config file changed to remove the top level `config_store` field in favor of a nested `configdb` field. #2125
- [CHANGE] Removed unnecessary `frontend.cache-split-interval` in favor of `querier.split-queries-by-interval`, both to reduce configuration complexity and to guarantee alignment of these two configs. Starting from now, `-querier.cache-results` may only be enabled in conjunction with `-querier.split-queries-by-interval` (previously the cache interval default was `24h`, so if you want to preserve the same behaviour you should set `-querier.split-queries-by-interval=24h`). #2040
- [CHANGE] Renamed Configs configuration options. #2187
  - configuration options
    - `-database.*` -> `-configs.database.*`
    - `-database.migrations` -> `-configs.database.migrations-dir`
  - config file
    - `configdb.uri:` -> `configs.database.uri:`
    - `configdb.migrationsdir:` -> `configs.database.migrations_dir:`
    - `configdb.passwordfile:` -> `configs.database.password_file:`
- [CHANGE] Moved `-store.min-chunk-age` to the Querier config as `-querier.query-store-after`, allowing the store to be skipped during query time if the metrics wouldn't be found. The YAML config option `ingestermaxquerylookback` has been renamed to `query_ingesters_within` to match its CLI flag. #1893
- [CHANGE] Renamed the cache configuration setting `defaul_validity` to `default_validity`. #2140
- [CHANGE] Remove fluentd-based billing infrastructure and flags such as `-distributor.enable-billing`. #1491
- [CHANGE] Removed remaining support for using denormalised tokens in the ring. If you're still running ingesters with denormalised tokens (Cortex 0.4 or earlier, with `-ingester.normalise-tokens=false`), such ingesters will now be completely invisible to distributors and need to be either switched to Cortex 0.6.0 or later, or be configured to use normalised tokens. #2034
- [CHANGE] The frontend HTTP server will now send 502 in case of deadline exceeded and 499 if the user requested cancellation. #2156
- [CHANGE] Queries are now limited to at most `-querier.max-query-into-future` into the future (defaults to 10m). #1929
  - `-store.min-chunk-age` has been removed
  - `-querier.query-store-after` has been added in its place.
- [CHANGE] Removed unused `/validate_expr` endpoint. #2152
- [CHANGE] Updated Prometheus dependency to v2.16.0. This Prometheus version uses Active Query Tracker to limit concurrent queries. In order to keep `-querier.max-concurrent` working, Active Query Tracker is enabled by default, and is configured to store its data in the `active-query-tracker` directory (relative to the current directory when Cortex started). This can be changed by using the `-querier.active-query-tracker-dir` option. The purpose of Active Query Tracker is to log queries that were running when Cortex crashes. This logging happens on the next Cortex start. #2088
- [CHANGE] Default to BigChunk encoding; may result in slightly higher disk usage if many timeseries have a constant value, but should generally result in fewer, bigger chunks. #2207
- [CHANGE] WAL replays are now done while the rest of Cortex is starting, and more specifically, when the HTTP server is running. This makes it possible to scrape metrics during WAL replays. Applies to both chunks and experimental blocks storage. #2222
- [CHANGE] Cortex now has a `/ready` probe for all services, not just ingester and querier as before. In single-binary mode, `/ready` reports 204 only if all components are running properly. #2166
- [CHANGE] If you are vendoring Cortex and use its components in your project, be aware that many Cortex components no longer start automatically when they are created. You may want to review the PR and its attached document. #2166
- [CHANGE] Experimental TSDB: the querier in-memory index cache used by the experimental blocks storage shifted from per-tenant to per-querier. The `-experimental.tsdb.bucket-store.index-cache-size-bytes` flag now configures the per-querier index cache max size instead of a per-tenant cache, and its default has been increased to 1GB. #2189
- [CHANGE] Experimental TSDB: TSDB head compaction interval and concurrency are now configurable (defaults to 1 min interval and 5 concurrent head compactions). New options: `-experimental.tsdb.head-compaction-interval` and `-experimental.tsdb.head-compaction-concurrency`. #2172
- [CHANGE] Experimental TSDB: switched the blocks storage index header to the binary format. This change is expected to have no visible impact, except lower startup times and memory usage in the queriers. It's possible to switch back to the old JSON format via the flag `-experimental.tsdb.bucket-store.binary-index-header-enabled=false`. #2223
- [CHANGE] Experimental Memberlist KV store can now be used in single-binary Cortex. Attempts to use it previously would fail with panic. This change also breaks the existing binary protocol used to exchange gossip messages, so this version will not be able to understand the gossiped Ring when used in combination with the previous version of Cortex. The easiest way to upgrade is to shut down the old Cortex installation and restart it with the new version. Incremental rollout works too, but with reduced functionality until all components run the same version. #2016
- [FEATURE] Added a read-only local alertmanager config store using files named corresponding to their tenant id. #2125
- [FEATURE] Added flag `-experimental.ruler.enable-api` to enable the ruler API, which implements the Prometheus API `/api/v1/rules` and `/api/v1/alerts` endpoints under the configured `-http.prefix`. #1999
- [FEATURE] Added sharding support to compactor when using the experimental TSDB blocks storage. #2113
- [FEATURE] Added ability to override YAML config file settings using environment variables. #2147
  - `-config.expand-env`
- [FEATURE] Added flags to disable Alertmanager notification methods. #2187
  - `-configs.notifications.disable-email`
  - `-configs.notifications.disable-webhook`
- [FEATURE] Add `/config` HTTP endpoint which exposes the current Cortex configuration as YAML. #2165
- [FEATURE] Allow Prometheus remote write directly to ingesters. #1491
- [FEATURE] Introduced new standalone service `query-tee` that can be used for testing purposes to send the same Prometheus query to multiple backends (i.e. two Cortex clusters ingesting the same metrics) and compare their performance. #2203
- [FEATURE] Fan out parallelizable queries to backend queriers concurrently. #1878
  - `querier.parallelise-shardable-queries` (bool)
  - Requires a shard-compatible schema (v10+)
  - This causes the number of traces to increase accordingly.
  - The query-frontend now requires a schema config to determine how/when to shard queries, either from a file or from flags (i.e. by the `config-yaml` CLI flag). This is the same schema config the queriers consume. The schema is only required to use this option.
  - It's also advised to increase downstream concurrency controls as well:
    - `querier.max-outstanding-requests-per-tenant`
    - `querier.max-query-parallelism`
    - `querier.max-concurrent`
    - `server.grpc-max-concurrent-streams` (for both query-frontends and queriers)
- [FEATURE] Added user sub rings to distribute users to a subset of ingesters. #1947
  - `-experimental.distributor.user-subring-size`
- [FEATURE] Add flag `-experimental.tsdb.stripe-size` to expose the TSDB stripe size option. #2185
- [FEATURE] Experimental Delete Series: Added support for deleting series with a Prometheus-style API. Needs to be enabled first by setting `-purger.enable` to `true`. Deletion is only supported when using `boltdb` and `filesystem` as index and object sto...
0.6.1 / 2020-02-05
This release includes a fix to support the WAL configuration via YAML config file (the 0.6.0 release supports WAL configuration only via CLI flags).
Changelog
- [BUGFIX] Fixed parsing of the WAL configuration when specified in the YAML config file. #2071
0.6.0 / 2020-01-28
Thanks to all contributors! ❤️
This release is made up of 109 contributions from 30 authors, and includes many features and improvements. Some highlights:
- Experimental Write-Ahead-Log (WAL) in ingesters for more data reliability against ingester crashes by @codesome
- On-the-fly migration path between two key-value (KV) stores using the `multi` store by @pstibrany
- Global ingestion rate limit by @pracucci
- Improvements to the experimental TSDB blocks storage by @thorfour @pstibrany @pracucci
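The on-the-fly KV migration relies on a "multi" store that reads from a primary and mirrors writes to a secondary, with roles switchable at runtime (as described in the changelog below). The following is a minimal sketch of that pattern. The `kvStore` interface, `memStore`, `multiStore`, and `swap` are all hypothetical simplifications, since Cortex's real KV client also supports CAS and watch operations. The two stores are in-memory stand-ins named after common backends.

```go
package main

import (
	"fmt"
	"sync"
)

// kvStore is a deliberately tiny, hypothetical interface; Cortex's real KV
// client also supports CAS and watch operations.
type kvStore interface {
	Get(key string) (string, bool)
	Put(key, value string)
}

// memStore is an in-memory stand-in for a backend such as Consul or etcd.
type memStore struct{ data map[string]string }

func newMemStore() *memStore { return &memStore{data: map[string]string{}} }

func (m *memStore) Get(key string) (string, bool) {
	v, ok := m.data[key]
	return v, ok
}

func (m *memStore) Put(key, value string) { m.data[key] = value }

// multiStore serves all reads from the primary and mirrors every write to the
// secondary, so the secondary is populated before it is promoted.
type multiStore struct {
	mu        sync.RWMutex
	primary   kvStore
	secondary kvStore
}

func (s *multiStore) Get(key string) (string, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.primary.Get(key)
}

func (s *multiStore) Put(key, value string) {
	// An exclusive lock serializes writes, since memStore is not thread-safe.
	s.mu.Lock()
	defer s.mu.Unlock()
	s.primary.Put(key, value)
	s.secondary.Put(key, value)
}

// swap exchanges the roles of the two stores, as a runtime-config reload might.
func (s *multiStore) swap() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.primary, s.secondary = s.secondary, s.primary
}

func main() {
	consul, etcd := newMemStore(), newMemStore()
	s := &multiStore{primary: consul, secondary: etcd}
	// Written to both stores.
	s.Put("ring", "v1")
	// etcd becomes the primary; the mirrored value is already there.
	s.swap()
	v, ok := s.Get("ring")
	fmt.Println(v, ok) // v1 true
}
```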
Changelog
Note that the ruler flags need to be changed in this upgrade. You're moving from a single node ruler to something that might need to be sharded.
Further, if you're using the configs service, we've upgraded the migration library and this requires some manual intervention. See full instructions below to upgrade your PostgreSQL.
- [CHANGE] The frontend component now does not cache results if it finds a `Cache-Control` header and if one of its values is `no-store`. #1974
- [CHANGE] Flags changed with transition to upstream Prometheus rules manager:
  - `-ruler.client-timeout` is now `ruler.configs.client-timeout` in order to match `ruler.configs.url`.
  - `-ruler.group-timeout` has been removed.
  - `-ruler.num-workers` has been removed.
  - `-ruler.rule-path` has been added to specify where the prometheus rule manager will sync rule files.
  - `-ruler.storage.type` has been added to specify the rule store backend type; currently only `configdb` is supported.
  - `-ruler.poll-interval` has been added to specify the interval in which to poll new rule groups.
  - `-ruler.evaluation-interval` default value has changed from `15s` to `1m` to match the default evaluation interval in Prometheus.
  - Ruler sharding requires a ring which can be configured via the ring flags prefixed by `ruler.ring.`. #1987
- [CHANGE] Use relative links from /ring page to make it work when used behind a reverse proxy. #1896
- [CHANGE] Deprecated `-distributor.limiter-reload-period` flag. #1766
- [CHANGE] Ingesters now write only normalised tokens to the ring, although they can still read denormalised tokens used by other ingesters. `-ingester.normalise-tokens` is now deprecated, and ignored. If you want to switch back to using denormalised tokens, you need to downgrade to Cortex 0.4.0. Previous versions don't handle claiming tokens from normalised ingesters correctly. #1809
- [CHANGE] The overrides mechanism has been renamed to "runtime config", and is now separate from limits. Runtime config is simply a file that is reloaded by Cortex every couple of seconds. Limits, and now also the multi KV store, use this mechanism.
  New arguments were introduced: `-runtime-config.file` (defaults to empty) and `-runtime-config.reload-period` (defaults to 10 seconds), which replace the previously used `-limits.per-user-override-config` and `-limits.per-user-override-period` options. Old options are still used if `-runtime-config.file` is not specified. This change is also reflected in YAML configuration, where the old `limits.per_tenant_override_config` and `limits.per_tenant_override_period` fields are replaced with `runtime_config.file` and `runtime_config.period` respectively. #1749
- [CHANGE] Cortex now rejects data with duplicate labels. Previously, such data was accepted, with duplicate labels removed and only one value left. #1964
- [CHANGE] Changed the default value for `-distributor.ha-tracker.prefix` from `collectors/` to `ha-tracker/` in order to not clash with other keys (i.e. ring) stored in the same key-value store. #1940
- [FEATURE] Experimental: Write-Ahead-Log added in ingesters for more data reliability against ingester crashes. #1103
  - `--ingester.wal-enabled`: Setting this to `true` enables writing to WAL during ingestion.
  - `--ingester.wal-dir`: Directory where the WAL data should be stored and/or recovered from.
  - `--ingester.checkpoint-enabled`: Set this to `true` to enable checkpointing of in-memory chunks to disk.
  - `--ingester.checkpoint-duration`: This is the interval at which checkpoints should be created.
  - `--ingester.recover-from-wal`: Set this to `true` to recover data from an existing WAL.
  - For more information, please check out the "Ingesters with WAL" guide.
- [FEATURE] The distributor can now drop labels from samples (similar to the removal of the replica label for HA ingestion) per user via the `distributor.drop-label` flag. #1726
- [FEATURE] Added flag `debug.mutex-profile-fraction` to enable mutex profiling. #1969
- [FEATURE] Added `global` ingestion rate limiter strategy. Deprecated `-distributor.limiter-reload-period` flag. #1766
- [FEATURE] Added support for Microsoft Azure blob storage to be used for storing chunk data. #1913
- [FEATURE] Added readiness probe endpoint `/ready` to queriers. #1934
- [FEATURE] Added "multi" KV store that can interact with two other KV stores: a primary one for all reads and writes, and a secondary one, which only receives writes. The primary/secondary store can be modified at runtime via the runtime-config mechanism (previously "overrides"). #1749
- [FEATURE] Added support to store ring tokens to a file and read them back on startup, instead of generating/fetching the tokens to/from the ring. This feature can be enabled with the flag `-ingester.tokens-file-path`. #1750
- [FEATURE] Experimental TSDB: Added `/series` API endpoint support with TSDB blocks storage. #1830
- [FEATURE] Experimental TSDB: Added TSDB blocks `compactor` component, which iterates over users' blocks stored in the bucket and compacts them according to the configured block ranges. #1942
- [ENHANCEMENT] The metric `cortex_ingester_flush_reasons` gets a new `reason` value: `Spread`, when the `-ingester.spread-flushes` option is enabled. #1978
- [ENHANCEMENT] Added `password` and `enable_tls` options to redis cache configuration. Enables usage of Microsoft Azure Cache for Redis service. #1923
- [ENHANCEMENT] Upgraded Kubernetes API version for deployments from `extensions/v1beta1` to `apps/v1`. #1941
- [ENHANCEMENT] Experimental TSDB: Open existing TSDB on startup to prevent ingester from becoming ready before it can accept writes. The max concurrency is set via `--experimental.tsdb.max-tsdb-opening-concurrency-on-startup`. #1917
- [ENHANCEMENT] Experimental TSDB: Querier now exports aggregate metrics from Thanos bucket store and in-memory index cache (many metrics to list, but all have the `cortex_querier_bucket_store_` or `cortex_querier_blocks_index_cache_` prefix). #1996
- [ENHANCEMENT] Experimental TSDB: Improved multi-tenant bucket store. #1991
  - Allowed to configure the blocks sync interval via `-experimental.tsdb.bucket-store.sync-interval` (0 disables the sync)
  - Limited the number of tenants concurrently synced by `-experimental.tsdb.bucket-store.block-sync-concurrency`
  - Renamed `cortex_querier_sync_seconds` metric to `cortex_querier_blocks_sync_seconds`
  - Track `cortex_querier_blocks_sync_seconds` metric for the initial sync too
- [BUGFIX] Fixed unnecessary CAS operations done by the HA tracker when the jitter is enabled. #1861
- [BUGFIX] Fixed ingesters getting stuck in a LEAVING state after coming up from an ungraceful exit. #1921
- [BUGFIX] Reduce memory usage when ingester Push() errors. #1922
- [BUGFIX] Table Manager: Fixed calculation of expected tables and creation of tables from next active schema considering grace period. #1976
- [BUGFIX] Experimental TSDB: Fixed ingesters consistency during hand-over when using experimental TSDB blocks storage. #1854 #1818
- [BUGFIX] Experimental TSDB: Fixed metrics when using experimental TSDB blocks storage. #1981 #1982 #1990 #1983
- [BUGFIX] Experimental memberlist: Use the advertised address when sending packets to other peers of the Gossip memberlist. #1857
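Since 0.6.0, series with duplicate label names are rejected rather than silently reduced to one value. The following is a minimal sketch of such a check. The `label` type, `checkDuplicateLabels`, and the error text are hypothetical, and Cortex's own validation code may differ.

```go
package main

import (
	"fmt"
	"sort"
)

// label is a hypothetical name/value pair used only for this sketch.
type label struct{ Name, Value string }

// checkDuplicateLabels returns an error if any label name appears more than
// once. It sorts a copy by name so that duplicates become adjacent.
func checkDuplicateLabels(ls []label) error {
	sorted := append([]label(nil), ls...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Name < sorted[j].Name })
	for i := 1; i < len(sorted); i++ {
		if sorted[i].Name == sorted[i-1].Name {
			return fmt.Errorf("duplicate label name %q", sorted[i].Name)
		}
	}
	return nil
}

func main() {
	series := []label{{"__name__", "up"}, {"job", "api"}, {"job", "web"}}
	if err := checkDuplicateLabels(series); err != nil {
		fmt.Println("rejected:", err)
	}
}
```

Sorting a copy keeps the caller's slice untouched and makes the check O(n log n) without an auxiliary set.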
Upgrading PostgreSQL (if you're using configs service)
Reference: https://github.com/golang-migrate/migrate/tree/master/database/postgres#upgrading-from-v1
- Install the migrate package CLI tool: https://github.com/golang-migrate/migrate/tree/master/cmd/migrate#installation
- Drop the `schema_migrations` table: `DROP TABLE schema_migrations;`.
- Run the migrate command: `migrate -path <absolute_path_to_cortex>/cmd/cortex/migrations -database postgres://localhost:5432/database force 2`

Known issues
- The `cortex_prometheus_rule_group_last_evaluation_timestamp_seconds` metric, tracked by the ruler, is not unregistered for rule groups that are no longer used. This issue will be fixed in the next Cortex release (see 2033).
- Write-Ahead-Log (WAL) does not have automatic repair of corrupt checkpoints or WAL segments. Corruption is possible if an ingester crashes abruptly or the underlying disk is corrupted. Currently, the only way to resolve this is to manually delete the affected checkpoint and/or WAL segments. Automatic repair will be added in future releases.
0.4.0 / 2019-12-02
- [CHANGE] The frontend component has been refactored to be easier to re-use. When upgrading the frontend, cache entries will be discarded and re-created with the new protobuf schema. #1734
- [CHANGE] Removed direct DB/API access from the ruler. `-ruler.configs.url` has now been deprecated. #1579
- [CHANGE] Removed `Delta` encoding. Any old chunks with `Delta` encoding cannot be read anymore. If `ingester.chunk-encoding` is set to `Delta`, the ingester will fail to start. #1706
- [CHANGE] Setting `-ingester.max-transfer-retries` to 0 now disables hand-over when the ingester is shutting down. Previously, zero meant an infinite number of attempts. #1771
- [CHANGE] `dynamo` has been removed as a valid storage name to make it consistent for all components. `aws` and `aws-dynamo` remain as valid storage names.
- [CHANGE/FEATURE] The frontend split and cache intervals can now be configured using the respective flags `--querier.split-queries-by-interval` and `--frontend.cache-split-interval`.
  - If `--querier.split-queries-by-interval` is not provided, request splitting is disabled by default.
  - `--querier.split-queries-by-day` is still accepted for backward compatibility but has been deprecated. You should now use `--querier.split-queries-by-interval`. We recommend using a multiple of 24 hours.
- [FEATURE] Global limit on the max series per user and metric #1760
  - `-ingester.max-global-series-per-user`
  - `-ingester.max-global-series-per-metric`
  - Requires `-distributor.replication-factor` and `-distributor.shard-by-all-labels` to also be set on the ingesters
- [FEATURE] Flush chunks with stale markers early with `ingester.max-stale-chunk-idle`. #1759
- [FEATURE] EXPERIMENTAL: Added a new KV store backend based on the memberlist library. Components can gossip about tokens and ingester states instead of using Consul or etcd. #1721
- [FEATURE] EXPERIMENTAL: Use TSDB in the ingesters and flush blocks to S3/GCS, as Thanos does. This lets us use object storage more efficiently and reduce costs. #1695
- [FEATURE] Allow the query frontend to log slow queries with `frontend.log-queries-longer-than`. #1744
- [FEATURE] Add an HTTP handler to trigger ingester flush and shutdown, used when running as a StatefulSet with the WAL enabled. #1746
- [ENHANCEMENT] Reduce memory allocations in the write path. #1706
- [ENHANCEMENT] The Consul client now follows recommended practices for blocking queries with respect to the returned index value. #1708
- [ENHANCEMENT] The Consul client can optionally rate-limit itself during Watch (used e.g. by ring watchers) and WatchPrefix (used by the HA feature) operations. Rate limiting is disabled by default. New flags added: `--consul.watch-rate-limit` and `--consul.watch-burst-size`. #1708
- [ENHANCEMENT] Added jitter to HA deduping heartbeats, configured using `distributor.ha-tracker.update-timeout-jitter-max`. #1534
- [ENHANCEMENT] Add ability to flush chunks with stale markers early. #1759
- [BUGFIX] Stop reporting successful actions as 500 errors in KV store metrics. #1798
- [BUGFIX] Fix bug where duplicate labels can be returned through metadata APIs. #1790
- [BUGFIX] Fix reading of old, v3 chunk data. #1779
- [BUGFIX] Added support for IAM roles for service accounts in AWS EKS. #1803
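The `-ingester.max-transfer-retries` change inverts the meaning of 0. As an illustration only, the Go sketch below contrasts the old and new semantics; `shouldAttemptTransfer` and `legacyShouldAttemptTransfer` are hypothetical helpers, not Cortex functions, and treating negative values as "disabled" is an assumption of the sketch.

```go
package main

import "fmt"

// shouldAttemptTransfer is a HYPOTHETICAL helper modelling the 0.4.0
// semantics: a limit of 0 disables hand-over entirely. Treating negative
// values as "disabled" too is an assumption of this sketch.
func shouldAttemptTransfer(maxRetries, attemptsSoFar int) bool {
	if maxRetries <= 0 {
		return false
	}
	return attemptsSoFar < maxRetries
}

// legacyShouldAttemptTransfer models the previous semantics, where a limit
// of 0 meant "keep trying forever".
func legacyShouldAttemptTransfer(maxRetries, attemptsSoFar int) bool {
	if maxRetries == 0 {
		return true
	}
	return attemptsSoFar < maxRetries
}

func main() {
	for _, limit := range []int{0, 3} {
		fmt.Printf("max-transfer-retries=%d: new=%v legacy=%v\n",
			limit, shouldAttemptTransfer(limit, 0), legacyShouldAttemptTransfer(limit, 0))
	}
}
```

If you previously relied on 0 for unlimited hand-over attempts, you now need an explicit positive value.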
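To make `--querier.split-queries-by-interval` concrete, here is a minimal Go sketch of interval-aligned splitting, assuming sub-queries are cut at multiples of the configured interval. `splitByInterval` and `span` are hypothetical names; the real frontend also deals with query step alignment and result caching, which this sketch ignores.

```go
package main

import "fmt"

// dayMs is 24 hours in milliseconds, the kind of interval the changelog recommends.
const dayMs int64 = 24 * 60 * 60 * 1000

// span is a half-open time range [Start, End) in Unix milliseconds.
type span struct {
	Start, End int64
}

// splitByInterval is a HYPOTHETICAL helper that cuts [start, end) into
// sub-ranges that never cross a multiple of step. A step <= 0 disables
// splitting, mirroring the default when the flag is not set. Timestamps are
// assumed non-negative so that integer division rounds down.
func splitByInterval(start, end, step int64) []span {
	if step <= 0 || end <= start {
		return []span{{start, end}}
	}
	var out []span
	for s := start; s < end; {
		next := (s/step + 1) * step // first boundary strictly after s
		if next > end {
			next = end
		}
		out = append(out, span{s, next})
		s = next
	}
	return out
}

func main() {
	// A 36-hour query starting at 06:00 UTC on day 0 is cut at the day
	// boundary into two 18-hour sub-queries.
	start := int64(6 * 60 * 60 * 1000)
	end := start + 36*60*60*1000
	for _, s := range splitByInterval(start, end, dayMs) {
		fmt.Printf("[%d, %d) %.0fh\n", s.Start, s.End, float64(s.End-s.Start)/3.6e6)
	}
}
```

Aligning cuts to interval multiples (rather than to the query start) means repeated queries over overlapping ranges produce identical sub-queries, which is what makes per-interval caching effective.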
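The global series limits rely on series being spread evenly across ingesters, which is why `-distributor.shard-by-all-labels` is required. The Go sketch below assumes each ingester derives its local share as (global limit / number of ingesters) × replication factor, since every series is stored on replication-factor ingesters, and then applies the stricter of the local and converted global limits. The helper names are hypothetical, and the formula is an assumption about how the conversion works, not a copy of the Cortex limiter.

```go
package main

import (
	"fmt"
	"math"
)

// unlimited stands in for "no limit"; both inputs use 0 to mean disabled.
const unlimited = math.MaxInt32

// localFromGlobal is a HYPOTHETICAL conversion of a cluster-wide limit into
// a per-ingester limit, assuming an even spread of series. Returns 0 (no
// limit) if the global limit is unset or the ingester count is unknown.
func localFromGlobal(globalLimit, numIngesters, replicationFactor int) int {
	if globalLimit <= 0 || numIngesters <= 0 {
		return 0
	}
	return int(float64(globalLimit) / float64(numIngesters) * float64(replicationFactor))
}

// minNonZero returns the smaller of a and b, treating 0 as "unset".
func minNonZero(a, b int) int {
	if a == 0 || (b != 0 && b < a) {
		return b
	}
	return a
}

// effectiveLimit combines the local limit with the converted global one.
// Without shard-by-all-labels the global limit is ignored in this sketch.
func effectiveLimit(localLimit, globalLimit, numIngesters, replicationFactor int, shardByAllLabels bool) int {
	limit := localLimit
	if shardByAllLabels {
		limit = minNonZero(limit, localFromGlobal(globalLimit, numIngesters, replicationFactor))
	}
	if limit == 0 {
		return unlimited
	}
	return limit
}

func main() {
	// 100k series per tenant across 10 ingesters with replication factor 3.
	fmt.Println(effectiveLimit(0, 100000, 10, 3, true))     // 30000
	fmt.Println(effectiveLimit(20000, 100000, 10, 3, true)) // 20000
	fmt.Println(effectiveLimit(0, 100000, 10, 3, false))    // 2147483647
}
```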
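Prometheus marks a series as stale with a special NaN value (bit pattern `0x7ff0000000000002`), and because NaN never compares equal to itself, detecting it requires comparing bits. The Go sketch below shows one way an early-flush decision like `ingester.max-stale-chunk-idle` could be expressed; `shouldFlushEarly` is hypothetical, and treating 0 as "disabled" is an assumption of the sketch.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// staleNaNBits is the NaN bit pattern Prometheus uses as a staleness marker.
const staleNaNBits uint64 = 0x7ff0000000000002

// staleMarker returns the float64 value of a staleness marker.
func staleMarker() float64 { return math.Float64frombits(staleNaNBits) }

// isStaleMarker compares bit patterns, because NaN != NaN for any NaN.
func isStaleMarker(v float64) bool { return math.Float64bits(v) == staleNaNBits }

// shouldFlushEarly is a HYPOTHETICAL decision: a series whose last sample is
// a staleness marker is flushed once it has been idle for maxStaleIdle.
// A maxStaleIdle of 0 disables early flushing (assumption).
func shouldFlushEarly(lastValue float64, idle, maxStaleIdle time.Duration) bool {
	if maxStaleIdle <= 0 {
		return false
	}
	return isStaleMarker(lastValue) && idle >= maxStaleIdle
}

func main() {
	fmt.Println(shouldFlushEarly(staleMarker(), 6*time.Minute, 5*time.Minute)) // true
	fmt.Println(shouldFlushEarly(42, 6*time.Minute, 5*time.Minute))            // false: not stale
	fmt.Println(shouldFlushEarly(math.NaN(), 6*time.Minute, 5*time.Minute))    // false: ordinary NaN
}
```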
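One way to picture `frontend.log-queries-longer-than` is as timing middleware around the query handler. This Go sketch uses only `net/http`; `logSlowQueries`, `runDemo`, the log format, and the example request path are all hypothetical, and a threshold of 0 disabling logging is an assumption.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// logSlowQueries is a HYPOTHETICAL middleware: it times each request to next
// and calls logf when the request took longer than threshold. A threshold
// <= 0 disables logging by returning next unchanged.
func logSlowQueries(next http.Handler, threshold time.Duration, logf func(format string, args ...any)) http.Handler {
	if threshold <= 0 {
		return next
	}
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		next.ServeHTTP(w, r)
		if elapsed := time.Since(start); elapsed > threshold {
			logf("slow query: method=%s path=%s query=%q duration=%s",
				r.Method, r.URL.Path, r.URL.RawQuery, elapsed)
		}
	})
}

// runDemo sends one request through the middleware to a backend that sleeps
// for handlerDelay, and returns the log lines that were produced.
func runDemo(handlerDelay, threshold time.Duration) []string {
	var logged []string
	backend := http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		time.Sleep(handlerDelay)
		w.WriteHeader(http.StatusOK)
	})
	h := logSlowQueries(backend, threshold, func(format string, args ...any) {
		logged = append(logged, fmt.Sprintf(format, args...))
	})
	req := httptest.NewRequest(http.MethodGet, "/api/v1/query_range?query=up", nil)
	h.ServeHTTP(httptest.NewRecorder(), req)
	return logged
}

func main() {
	for _, line := range runDemo(50*time.Millisecond, 10*time.Millisecond) {
		fmt.Println(line)
	}
}
```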
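Consul's documentation on blocking queries advises clients to reset their wait index if it ever goes backwards and to make sure it is at least 1 after each response. A hypothetical Go helper applying those two rules is sketched below; it illustrates the documented guidance and is not the Cortex implementation.

```go
package main

import "fmt"

// nextWaitIndex is a HYPOTHETICAL helper that picks the index to send on the
// next blocking query, given the index used for the previous request and the
// X-Consul-Index value the server returned.
func nextWaitIndex(prev, returned uint64) uint64 {
	// Consul documents that indexes can occasionally go backwards. Waiting on
	// the old, higher index could block far longer than intended, so reset.
	if returned < prev {
		return 0
	}
	// A wait index of 0 returns immediately, so looping on it would busy-poll
	// Consul; keep the index at least 1 after any response.
	if returned == 0 {
		return 1
	}
	return returned
}

func main() {
	fmt.Println(nextWaitIndex(0, 42)) // 42
	fmt.Println(nextWaitIndex(42, 7)) // 0: index went backwards, reset
	fmt.Println(nextWaitIndex(0, 0))  // 1: sanity floor
}
```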
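The Consul watch rate limit is configured by a rate and a burst size, which matches the classic token-bucket model. The following stdlib-only Go token bucket is a hypothetical stand-in that illustrates what `--consul.watch-rate-limit` (tokens per second) and `--consul.watch-burst-size` (bucket capacity) control. It is not the limiter Cortex uses, and treating a rate of 0 as "disabled" is an assumption based on the entry saying rate limiting is off by default.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// epoch is an arbitrary fixed start time so the demo is deterministic.
var epoch = time.Unix(0, 0)

// tokenBucket is a HYPOTHETICAL, stdlib-only rate limiter: it refills at
// rate tokens per second up to burst tokens, and each watch call spends one.
type tokenBucket struct {
	rate   float64   // tokens added per second (cf. --consul.watch-rate-limit)
	burst  float64   // bucket capacity (cf. --consul.watch-burst-size)
	tokens float64   // tokens currently available
	last   time.Time // time of the last refill
}

func newTokenBucket(rate float64, burst int, now time.Time) *tokenBucket {
	return &tokenBucket{rate: rate, burst: float64(burst), tokens: float64(burst), last: now}
}

// allow reports whether a watch call may proceed at now, spending a token
// if so. A rate <= 0 disables limiting in this sketch.
func (b *tokenBucket) allow(now time.Time) bool {
	if b.rate <= 0 {
		return true
	}
	if elapsed := now.Sub(b.last).Seconds(); elapsed > 0 {
		b.tokens = math.Min(b.burst, b.tokens+elapsed*b.rate)
		b.last = now
	}
	if b.tokens >= 1 {
		b.tokens--
		return true
	}
	return false
}

func main() {
	// 1 watch per second with a burst of 2: the third immediate call is throttled.
	b := newTokenBucket(1, 2, epoch)
	fmt.Println(b.allow(epoch), b.allow(epoch), b.allow(epoch)) // true true false
	fmt.Println(b.allow(epoch.Add(time.Second)))                // true: one token refilled
}
```

The burst size lets a watcher react quickly to a short flurry of changes, while the rate caps its sustained load on Consul.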
In this release we updated the following dependencies:
- gRPC v1.25.0 (resulted in a drop of 30% CPU usage when compression is on)
- jaeger-client v2.20.0
- aws-sdk-go v1.25.22