
Commit 2c18450

Merge pull request #6329 from CharlieTLe/proofread-gossip-ring-getting-started
2 parents: 1d09628 + 5d3f5a9

18 files changed: +193, -180 lines

docs/guides/alert-manager-configuration.md (8 additions, 7 deletions)

````diff
@@ -7,30 +7,30 @@ slug: alertmanager-configuration
 
 ## Context
 
-Cortex Alertmanager notification setup follow mostly the syntax of Prometheus Alertmanager since it is based on the same codebase. The following is a description on how to load the configuration setup so that Alertmanager can use for notification when an alert event happened.
+Cortex Alertmanager notification setup follows mostly the syntax of Prometheus Alertmanager since it is based on the same codebase. The following is a description on how to load the configuration setup so that Alertmanager can use it for notification when an alert event happens.
 
 ### Configuring the Cortex Alertmanager storage backend
 
-With the introduction of Cortex 1.8 the storage backend config option shifted to the new pattern [#3888](https://github.com/cortexproject/cortex/pull/3888). You can find the new configuration [here](../configuration/config-file-reference.md#alertmanager_storage_config)
+With the introduction of Cortex 1.8, the storage backend config option shifted to the new pattern [#3888](https://github.com/cortexproject/cortex/pull/3888). You can find the new configuration [here](../configuration/config-file-reference.md#alertmanager_storage_config)
 
 Note that when using `-alertmanager.sharding-enabled=true`, the following storage backends are not supported: `local`, `configdb`.
 
-When using the new configuration pattern it is important that any of the old configuration pattern flags are unset (`-alertmanager.storage`), as well as `-<prefix>.configs.url`. This is because the old pattern still takes precedence over the new one. The old configuration pattern (`-alertmanager.storage`) is marked as deprecated and will be removed by Cortex version 1.11. However this change doesn't apply to `-alertmanager.storage.path` and `-alertmanager.storage.retention`.
+When using the new configuration pattern, it is important that any of the old configuration pattern flags are unset (`-alertmanager.storage`), as well as `-<prefix>.configs.url`. This is because the old pattern still takes precedence over the new one. The old configuration pattern (`-alertmanager.storage`) is marked as deprecated and will be removed by Cortex version 1.11. However, this change doesn't apply to `-alertmanager.storage.path` and `-alertmanager.storage.retention`.
 
 ### Cortex Alertmanager configuration
 
 Cortex Alertmanager can be uploaded via Cortex [Set Alertmanager configuration API](../api/_index.md#set-alertmanager-configuration) or using [Cortex Tools](https://github.com/cortexproject/cortex-tools).
 
-Follow the instruction at the `cortextool` link above to download or update to the latest version of the tool.
+Follow the instructions at the `cortextool` link above to download or update to the latest version of the tool.
 
 To obtain the full help of how to use `cortextool` for all commands and flags, use
 `cortextool --help-long`.
 
 The following example shows the steps to upload the configuration to Cortex `Alertmanager` using `cortextool`.
 
-#### 1. Create the Alertmanager configuration `yml` file.
+#### 1. Create the Alertmanager configuration YAML file.
 
-The following is `amconfig.yml`, an example of a configuration for Cortex `Alertmanager` to send notification via email:
+The following is `amconfig.yml`, an example of a configuration for Cortex `Alertmanager` to send notifications via email:
 
 ```
 global:
@@ -50,7 +50,7 @@ receivers:
 - to: 'someone@localhost'
 ```
 
-[Example on how to setup Slack](https://grafana.com/blog/2020/02/25/step-by-step-guide-to-setting-up-prometheus-alertmanager-with-slack-pagerduty-and-gmail/#:~:text=To%20set%20up%20alerting%20in,to%20receive%20notifications%20from%20Alertmanager.) to support receiving Alertmanager notification.
+[Example on how to set up Slack](https://grafana.com/blog/2020/02/25/step-by-step-guide-to-setting-up-prometheus-alertmanager-with-slack-pagerduty-and-gmail/#:~:text=To%20set%20up%20alerting%20in,to%20receive%20notifications%20from%20Alertmanager.) to support receiving Alertmanager notifications.
 
 #### 2. Upload the Alertmanager configuration
 
@@ -76,3 +76,4 @@ cortextool alertmanager get \
 --id=100 \
 --key=<yourKey>
 ```
+
````
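For context on the `amconfig.yml` hunks above, a minimal email setup along the lines the guide describes could look like the sketch below; the SMTP smarthost, sender address, and receiver name are illustrative placeholders, not values taken from this diff.

```yaml
global:
  # Placeholder SMTP relay; point this at a reachable smarthost.
  smtp_smarthost: 'localhost:25'
  smtp_from: 'alertmanager@example.org'
  smtp_require_tls: false

route:
  # Route every alert to the single email receiver defined below.
  receiver: email-notifications

receivers:
  - name: email-notifications
    email_configs:
      - to: 'someone@localhost'
```

Once saved, the guide's upload step (`cortextool alertmanager load`, using the same tenant flags `--id` and `--key` shown for `get` in the last hunk) pushes the file to Cortex for that tenant.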

docs/guides/authentication-and-authorisation.md (7 additions, 6 deletions)

```diff
@@ -9,14 +9,14 @@ All Cortex components take the tenant ID from a header `X-Scope-OrgID`
 on each request. A tenant (also called "user" or "org") is the owner of
 a set of series written to and queried from Cortex. All Cortex components
 trust this value completely: if you need to protect your Cortex installation
-from accidental or malicious calls then you must add an additional layer
+from accidental or malicious calls, then you must add an additional layer
 of protection.
 
-Typically this means you run Cortex behind a reverse proxy, and you must
+Typically, this means you run Cortex behind a reverse proxy, and you must
 ensure that all callers, both machines sending data over the `remote_write`
 interface and humans sending queries from GUIs, supply credentials
 which identify them and confirm they are authorised. When configuring the
-`remote_write` API in Prometheus, the user and password fields of http Basic
+`remote_write` API in Prometheus, the user and password fields of HTTP Basic
 auth, or Bearer token, can be used to convey the tenant ID and/or credentials.
 See the [Cortex-Tenant](#cortex-tenant) section below for one way to solve this.
 
@@ -34,7 +34,7 @@ To disable the multi-tenant functionality, you can pass the argument
 to the string `fake` for every request.
 
 Note that the tenant ID that is used to write the series to the datastore
-should be the same as the one you use to query the data. If they don't match
+should be the same as the one you use to query the data. If they don't match,
 you won't see any data. As of now, you can't see series from other tenants.
 
 For more information regarding the tenant ID limits, refer to: [Tenant ID limitations](./limitations.md#tenant-id-naming)
@@ -48,6 +48,7 @@ It can be placed between Prometheus and Cortex and will search for a predefined
 label and use its value as `X-Scope-OrgID` header when proxying the timeseries to Cortex.
 
 This can help to run Cortex in a trusted environment where you want to separate your metrics
-into distinct namespaces by some criteria (e.g. teams, applications, etc).
+into distinct namespaces by some criteria (e.g. teams, applications, etc.).
+
+Be advised that **cortex-tenant** is a third-party community project and it's not maintained by the Cortex team.
 
-Be advised that **cortex-tenant** is a third-party community project and it's not maintained by Cortex team.
```
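To make the tenant-header flow in this file concrete, here is a hedged sketch of a Prometheus `remote_write` block that supplies `X-Scope-OrgID` directly; the endpoint URL and tenant name are illustrative, and in an untrusted setup the header should instead be injected by the reverse proxy or by cortex-tenant, as the guide advises.

```yaml
remote_write:
  - url: http://cortex-gateway.example.internal/api/v1/push
    headers:
      # Cortex reads the tenant ID from this header on every request.
      X-Scope-OrgID: team-engineering
    # Alternative: credentials validated by a proxy in front of Cortex,
    # which then sets X-Scope-OrgID before forwarding.
    # basic_auth:
    #   username: team-engineering
    #   password_file: /etc/prometheus/cortex-password
```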

docs/guides/capacity-planning.md (40 additions, 39 deletions)

```diff
@@ -15,52 +15,53 @@ sent to Cortex.
 
 Some key parameters are:
 
-1. The number of active series. If you have Prometheus already you
-   can query `prometheus_tsdb_head_series` to see this number.
-2. Sampling rate, e.g. a new sample for each series every minute
-   (the default Prometheus [scrape_interval](https://prometheus.io/docs/prometheus/latest/configuration/configuration/)).
-   Multiply this by the number of active series to get the
-   total rate at which samples will arrive at Cortex.
-3. The rate at which series are added and removed. This can be very
-   high if you monitor objects that come and go - for example if you run
-   thousands of batch jobs lasting a minute or so and capture metrics
-   with a unique ID for each one. [Read how to analyse this on
-   Prometheus](https://www.robustperception.io/using-tsdb-analyze-to-investigate-churn-and-cardinality).
-4. How compressible the time-series data are. If a metric stays at
-   the same value constantly, then Cortex can compress it very well, so
-   12 hours of data sampled every 15 seconds would be around 2KB. On
-   the other hand if the value jumps around a lot it might take 10KB.
-   There are not currently any tools available to analyse this.
-5. How long you want to retain data for, e.g. 1 month or 2 years.
+1. The number of active series. If you have Prometheus already, you
+   can query `prometheus_tsdb_head_series` to see this number.
+2. Sampling rate, e.g. a new sample for each series every minute
+   (the default Prometheus [scrape_interval](https://prometheus.io/docs/prometheus/latest/configuration/configuration/)).
+   Multiply this by the number of active series to get the
+   total rate at which samples will arrive at Cortex.
+3. The rate at which series are added and removed. This can be very
+   high if you monitor objects that come and go - for example, if you run
+   thousands of batch jobs lasting a minute or so and capture metrics
+   with a unique ID for each one. [Read how to analyse this on
+   Prometheus](https://www.robustperception.io/using-tsdb-analyze-to-investigate-churn-and-cardinality).
+4. How compressible the time-series data are. If a metric stays at
+   the same value constantly, then Cortex can compress it very well, so
+   12 hours of data sampled every 15 seconds would be around 2KB. On
+   the other hand, if the value jumps around a lot, it might take 10KB.
+   There are not currently any tools available to analyse this.
+5. How long you want to retain data for, e.g. 1 month or 2 years.
 
 Other parameters which can become important if you have particularly
 high values:
 
-6. Number of different series under one metric name.
-7. Number of labels per series.
-8. Rate and complexity of queries.
+6. Number of different series under one metric name.
+7. Number of labels per series.
+8. Rate and complexity of queries.
 
 Now, some rules of thumb:
 
-1. Each million series in an ingester takes 15GB of RAM. Total number
-   of series in ingesters is number of active series times the
-   replication factor. This is with the default of 12-hour chunks - RAM
-   required will reduce if you set `-ingester.max-chunk-age` lower
-   (trading off more back-end database IO).
-   There are some additional considerations for planning for ingester memory usage.
-   1. Memory increases during write ahead log (WAL) replay, [See Prometheus issue #6934](https://github.com/prometheus/prometheus/issues/6934#issuecomment-726039115). If you do not have enough memory for WAL replay, the ingester will not be able to restart successfully without intervention.
-   2. Memory temporarily increases during resharding since timeseries are temporarily on both the new and old ingesters. This means you should scale up the number of ingesters before memory utilization is too high, otherwise you will not have the headroom to account for the temporary increase.
-2. Each million series (including churn) consumes 15GB of chunk
-   storage and 4GB of index, per day (so multiply by the retention
-   period).
-3. The distributors CPU utilization depends on the specific Cortex cluster
-   setup, while they don't need much RAM. Typically, distributors are capable
-   to process between 20,000 and 100,000 samples/sec with 1 CPU core. It's also
-   highly recommended to configure Prometheus `max_samples_per_send` to 1,000
-   samples, in order to reduce the distributors CPU utilization given the same
-   total samples/sec throughput.
+1. Each million series in an ingester takes 15GB of RAM. The total number
+   of series in ingesters is the number of active series times the
+   replication factor. This is with the default of 12-hour chunks - RAM
+   required will reduce if you set `-ingester.max-chunk-age` lower
+   (trading off more back-end database I/O).
+   There are some additional considerations for planning for ingester memory usage.
+   1. Memory increases during write-ahead log (WAL) replay, [See Prometheus issue #6934](https://github.com/prometheus/prometheus/issues/6934#issuecomment-726039115). If you do not have enough memory for WAL replay, the ingester will not be able to restart successfully without intervention.
+   2. Memory temporarily increases during resharding since timeseries are temporarily on both the new and old ingesters. This means you should scale up the number of ingesters before memory utilization is too high, otherwise you will not have the headroom to account for the temporary increase.
+2. Each million series (including churn) consumes 15GB of chunk
+   storage and 4GB of index, per day (so multiply by the retention
+   period).
+3. The distributors CPU utilization depends on the specific Cortex cluster
+   setup, while they don't need much RAM. Typically, distributors are capable
+   of processing between 20,000 and 100,000 samples/sec with 1 CPU core. It's also
+   highly recommended to configure Prometheus `max_samples_per_send` to 1,000
+   samples, in order to reduce the distributors CPU utilization given the same
+   total samples/sec throughput.
 
 If you turn on compression between distributors and ingesters (for
-example to save on inter-zone bandwidth charges at AWS/GCP) they will use
-significantly more CPU (approx 100% more for distributor and 50% more
+example, to save on inter-zone bandwidth charges at AWS/GCP), they will use
+significantly more CPU (approx. 100% more for distributor and 50% more
 for ingester).
+
```
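As a rough worked example of the rules of thumb in this hunk (the figures are illustrative, not part of the diff): 10 million active series with a replication factor of 3 means about 30 million series held in ingesters, i.e. roughly 450GB of ingester RAM cluster-wide before allowing headroom for WAL replay and resharding; the same 10 million series, ignoring churn, would consume on the order of 150GB of chunk storage and 40GB of index per day of retention.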

docs/guides/encryption-at-rest.md (4 additions, 3 deletions)

```diff
@@ -48,8 +48,8 @@ The alertmanager S3 server-side encryption can be configured similarly to the bl
 
 ### Per-tenant config overrides
 
-The S3 client used by the blocks storage, ruler and alertmanager supports S3 SSE config overrides on a per-tenant basis, using the [runtime configuration file](../configuration/arguments.md#runtime-configuration-file).
-The following settings can ben overridden for each tenant:
+The S3 client used by the blocks storage, ruler, and alertmanager supports S3 SSE config overrides on a per-tenant basis, using the [runtime configuration file](../configuration/arguments.md#runtime-configuration-file).
+The following settings can be overridden for each tenant:
 
 - **`s3_sse_type`**<br />
   S3 server-side encryption type. It must be set to enable the SSE config override for a given tenant.
@@ -60,4 +60,5 @@ The following settings can ben overridden for each tenant:
 
 ## Other storages
 
-Other storage backends may support encryption at rest configuring it directly at the storage level.
+Other storage backends may support encryption at rest, configuring it directly at the storage level.
+
```
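For the per-tenant override hunk above, a minimal sketch of the runtime configuration file, assuming the per-tenant SSE settings sit under the file's standard `overrides` block and that the tenant ID is a placeholder:

```yaml
overrides:
  tenant-a:
    # Enable SSE with S3-managed keys for this tenant only; tenants without
    # an entry keep the cluster-wide SSE configuration.
    s3_sse_type: SSE-S3
```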

docs/guides/encryption-at-rest.template (4 additions, 3 deletions)

```diff
@@ -30,8 +30,8 @@ The alertmanager S3 server-side encryption can be configured similarly to the bl
 
 ### Per-tenant config overrides
 
-The S3 client used by the blocks storage, ruler and alertmanager supports S3 SSE config overrides on a per-tenant basis, using the [runtime configuration file](../configuration/arguments.md#runtime-configuration-file).
-The following settings can ben overridden for each tenant:
+The S3 client used by the blocks storage, ruler, and alertmanager supports S3 SSE config overrides on a per-tenant basis, using the [runtime configuration file](../configuration/arguments.md#runtime-configuration-file).
+The following settings can be overridden for each tenant:
 
 - **`s3_sse_type`**<br />
   S3 server-side encryption type. It must be set to enable the SSE config override for a given tenant.
@@ -42,4 +42,5 @@ The following settings can ben overridden for each tenant:
 
 ## Other storages
 
-Other storage backends may support encryption at rest configuring it directly at the storage level.
+Other storage backends may support encryption at rest, configuring it directly at the storage level.
+
```

docs/guides/glossary.md (3 additions, 3 deletions)

```diff
@@ -21,7 +21,7 @@ A single chunk contains timestamp-value pairs for several series.
 
 Churn is the frequency at which series become idle.
 
-A series become idle once it's not exported anymore by the monitored targets. Typically, series become idle when the monitored target itself disappear (eg. the process or node gets terminated).
+A series becomes idle once it's not exported anymore by the monitored targets. Typically, series become idle when the monitored target itself disappears (eg. the process or node gets terminated).
 
 ### Flushing
 
@@ -35,7 +35,7 @@ For more information, please refer to the guide "[Config for sending HA Pairs da
 
 ### Hash ring
 
-The hash ring is a distributed data structure used by Cortex for sharding, replication and service discovery. The hash ring data structure gets shared across Cortex replicas via gossip or a key-value store.
+The hash ring is a distributed data structure used by Cortex for sharding, replication, and service discovery. The hash ring data structure gets shared across Cortex replicas via gossip or a key-value store.
 
 For more information, please refer to the [Architecture](../architecture.md#the-hash-ring) documentation.
 
@@ -94,6 +94,6 @@ _See [Tenant](#tenant)._
 
 ### WAL
 
-The Write-Ahead Log (WAL) is an append only log stored on disk used by ingesters to recover their in-memory state after the process gets restarted, either after a clear shutdown or an abruptly termination.
+The Write-Ahead Log (WAL) is an append-only log stored on disk used by ingesters to recover their in-memory state after the process gets restarted, either after a clear shutdown or an abrupt termination.
 
 For more information, please refer to [Ingesters with WAL](../blocks-storage/_index.md#the-write-path).
```
