* Restructure TOC, merge content
* Initial rewrite of syntax content
* Second pass at cleaning up syntax explanation
* Large refactor for consistent info style and flow
* Clean up H1 and title metadata
* Simplify the get started heading and title
* Better title for syntax
* Fix typo in metadata
* Restore old aliases
* Tweaks to the aliases
* Add clarification from PR #4692
* Update text to force commit
* fix a broken link
* Fix some of the passive voice issues
* More tweaks for passive vs active voice
* One more passive voice fix
* Update docs/sources/get-started/components/community-components.md
Co-authored-by: Copilot <[email protected]>
* Update docs/sources/get-started/modules.md
Co-authored-by: Copilot <[email protected]>
* Update docs/sources/get-started/clustering.md
Co-authored-by: Copilot <[email protected]>
* Update docs/sources/get-started/components/community-components.md
Co-authored-by: Copilot <[email protected]>
* Update docs/sources/get-started/components/custom-components.md
Co-authored-by: Copilot <[email protected]>
* Update docs/sources/get-started/expressions/operators.md
Co-authored-by: Copilot <[email protected]>
* Update docs/sources/get-started/_index.md
Co-authored-by: Copilot <[email protected]>
* Update docs/sources/get-started/components/community-components.md
Co-authored-by: Copilot <[email protected]>
* Remove redundant information, and fix duplicate heading
* Apply suggestions from code review
Co-authored-by: Copilot <[email protected]>
* Update docs/sources/get-started/components/configure-components.md
Co-authored-by: Copilot <[email protected]>
* Refactor to make more engaging and reduce duplication
* More restructuring to improve info flow
* Fix the topic weight
* Update topic order and improve info flow
* Clean up information flow
* More style and content updates
* More updates for info flow and consistency
* Reshuffling info to improve info flow
* Fix some Vale linting errors
* More concept flow cleanup
* Clean up a few broken links, fix some examples, and add missing text
* Add more context and info to the landing page
* Add loki.source.file syntax tests
* Apply suggestion from @ptodev
Co-authored-by: Paulin Todev <[email protected]>
---------
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Paulin Todev <[email protected]>
# Get started with {{% param "FULL_PRODUCT_NAME" %}}

This section helps you get started with {{< param "FULL_PRODUCT_NAME" >}}.
{{< param "FULL_PRODUCT_NAME" >}} uses a configuration language to define how components collect, transform, and send data.
Components are building blocks that perform specific tasks, such as reading files, collecting metrics, or sending data to external systems.

To write effective configurations, you need to understand three fundamental elements: blocks, attributes, and expressions.
Mastering these building blocks lets you create powerful data collection and processing pipelines.

## Basic configuration elements

All {{< param "PRODUCT_NAME" >}} configurations use three main elements: blocks, attributes, and expressions.

## Blocks

Blocks group related settings and configure different parts of {{< param "PRODUCT_NAME" >}}.
Each block has a name and contains attributes or nested blocks.

```alloy
prometheus.remote_write "production" {
  endpoint {
    url = "http://localhost:9009/api/prom/push"
  }
}
```

This example contains two blocks:

- `prometheus.remote_write "production"`: Creates a component with the label `"production"`
- `endpoint`: A nested block that configures connection settings

## Attributes

Attributes set individual values within blocks.
They follow the format `ATTRIBUTE_NAME = ATTRIBUTE_VALUE`.

```alloy
log_level = "debug"
timeout = 30
enabled = true
```

## Expressions

Expressions compute values for attributes.
You can use simple constants or more complex calculations.

**Constants:**

```alloy
name = "my-service"
port = 9090
tags = ["web", "api"]
```

**Simple calculations:**

You can use arithmetic operations to compute values from other variables.
This lets you build dynamic configurations where values depend on other settings.

```alloy
total_timeout = base_timeout + retry_timeout
```

**Function calls:**

Function calls let you access system information and transform data.
[Built-in][] functions like `sys.env()` retrieve environment variables, while others can manipulate strings, decode JSON, and perform other operations.

```alloy
home_dir = sys.env("HOME")
config_path = home_dir + "/config.yaml"
```

**Component references:**

Component references let you use data from other parts of your configuration.
To reference a component's data, combine three parts with periods:

- Component name: `local.file`
- Label: `secret`
- Export name: `content`
- Result: `local.file.secret.content`

```alloy
password = local.file.secret.content
```

You'll learn about more powerful expressions in the dedicated [Expressions][] section, including how to reference data from other components and use more built-in functions.
You can find the available exports for each component in the [Components][components] documentation.
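
To see how the pieces fit together, here's a minimal sketch that reads a credential from disk and references it in another component. The file path, username, and endpoint URL are illustrative placeholders, not values from this guide.

```alloy
// local.file reads a file and exposes its contents as the `content` export.
local.file "secret" {
  filename  = "/var/lib/alloy/remote-write-password.txt"
  is_secret = true
}

// Reference the export by joining the component name, label, and export
// name with periods: local.file.secret.content.
prometheus.remote_write "production" {
  endpoint {
    url = "http://localhost:9009/api/prom/push"

    basic_auth {
      username = "alloy"
      password = local.file.secret.content
    }
  }
}
```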

## Configuration syntax

{{< param "PRODUCT_NAME" >}} uses a declarative configuration language, which means you describe what you want your system to do rather than how to do it.
This design makes configurations flexible and easy to understand.

You can organize blocks and attributes in any order that makes sense for your use case.
{{< param "PRODUCT_NAME" >}} automatically determines the dependencies between components and evaluates them in the correct order.

## Configuration files

{{< param "PRODUCT_NAME" >}} configuration files conventionally use the `.alloy` file extension, though you can name a single file anything you want.
If you specify a directory path, {{< param "PRODUCT_NAME" >}} processes only files with the `.alloy` extension.
You must save your configuration files as UTF-8 encoded text. {{< param "PRODUCT_NAME" >}} can't parse files with invalid UTF-8 encoding.

## Tooling

You can use these tools to write {{< param "PRODUCT_NAME" >}} configuration files:

description: Learn about Grafana Alloy clustering concepts
menuTitle: Clustering
title: Clustering
weight: 70
---

# Clustering

You learned about components, expressions, syntax, and modules in the previous sections.
Now you'll learn about clustering, which allows multiple {{< param "PRODUCT_NAME" >}} deployments to work together for distributed data collection.

Clustering provides workload distribution and high availability.
It enables horizontally scalable deployments with minimal resource and operational overhead.

{{< param "PRODUCT_NAME" >}} uses an eventually consistent model with a gossip protocol to achieve clustering.
This model assumes all participating {{< param "PRODUCT_NAME" >}} deployments are interchangeable and use identical configurations.
The cluster uses a consistent hashing algorithm to distribute work among nodes.

A standalone, non-clustered {{< param "PRODUCT_NAME" >}} behaves the same as a single-node cluster.

You configure clustering by passing `--cluster.*` command-line flags to the [`alloy run`][run] command.
Cluster-enabled components must explicitly enable clustering through a `clustering` block in their configuration.

## Use cases

Clustering serves several purposes in {{< param "PRODUCT_NAME" >}} deployments, with the primary focus being workload distribution and scalability.

### Target auto-distribution

Target auto-distribution is the most common use case of clustering.
It lets scraping components running on all peers distribute the scrape load among themselves.

For target auto-distribution to work:

1. All {{< param "PRODUCT_NAME" >}} deployments in the same cluster must access the same service discovery APIs.
1. All deployments must scrape the same targets.

You must explicitly enable target auto-distribution on components by defining a `clustering` block.
This integrates with the component system you learned about in previous sections:
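
Here's a minimal sketch of what that looks like, assuming a hypothetical Kubernetes Pod discovery and a default remote write endpoint. Check each component's reference documentation for the exact arguments it supports.

```alloy
// Every peer runs the same discovery and sees the same targets.
discovery.kubernetes "pods" {
  role = "pod"
}

// With clustering enabled, each peer scrapes only the share of targets
// that the cluster assigns to it.
prometheus.scrape "pods" {
  targets    = discovery.kubernetes.pods.targets
  forward_to = [prometheus.remote_write.default.receiver]

  clustering {
    enabled = true
  }
}

prometheus.remote_write "default" {
  endpoint {
    url = "http://localhost:9009/api/prom/push"
  }
}
```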

When a cluster detects a state change, such as a node joining or leaving, all participating components locally recalculate target ownership using a consistent hashing algorithm.
Components re-balance the targets they're scraping without explicitly communicating ownership over the network.
Each node uses 512 tokens in the hash ring for optimal load distribution.

Target auto-distribution lets you dynamically scale the number of {{< param "PRODUCT_NAME" >}} deployments to handle workload peaks.
It also provides resiliency because remaining node peers automatically pick up targets if a node leaves the cluster.

{{< param "PRODUCT_NAME" >}} uses a local consistent hashing algorithm to distribute targets.
When the cluster size changes, this algorithm redistributes only approximately 1/N of the targets, minimizing disruption.

Refer to the component reference documentation to check if a component supports clustering, such as `prometheus.scrape` and `pyroscope.scrape`.

## Best practices

Follow these guidelines to ensure effective clustering in your {{< param "PRODUCT_NAME" >}} deployments.

### Avoid issues with disproportionately large targets

When your environment has a mix of very large and average-sized targets, avoid running too many cluster instances.
While clustering generally does a good job of sharding targets to achieve balanced workload distribution, significant target size disparity can lead to uneven load distribution.
When a few disproportionately large targets are spread across many instances, the nodes assigned those targets experience much higher load than the others (for example, in samples per second for Prometheus metrics), potentially causing uneven load balancing or hitting resource limits.
In these scenarios, it's often better to scale vertically rather than horizontally to reduce the impact of outlier large targets.
This approach ensures more consistent resource utilization across your deployment and prevents overloading specific instances.

### Use `--cluster.wait-for-size`, but with caution

When you use clustering in a deployment where a single instance can't handle the entire load, use the `--cluster.wait-for-size` flag to ensure a minimum cluster size before accepting traffic.
However, leave a significant safety margin by setting this value well below the number of instances you typically expect to run.
When this condition isn't met, the instances stop processing traffic in cluster-enabled components, so it's important to leave room for any unexpected events.

For example, if you're using Horizontal Pod Autoscalers (HPA) or PodDisruptionBudgets (PDB) in Kubernetes, set the `--cluster.wait-for-size` flag to a value well below what your HPA and PDB minimums allow.
This prevents traffic from stopping when Kubernetes instance counts temporarily drop below these thresholds during normal operations like Pod termination or rolling updates.

Use the `--cluster.wait-timeout` flag to set a reasonable timeout for the waiting period and limit the impact of potential misconfiguration.
You can base the timeout duration on how quickly you expect your orchestration or incident response team to provision the required number of instances.
Be aware that when the timeout passes, the cluster may be too small to handle traffic and can run into further issues.

### Don't enable clustering if you don't need it

While clustering scales to very large numbers of instances, it introduces additional overhead in the form of logs, metrics, potential alerts, and processing requirements.
If you're not using components that specifically support and benefit from clustering, it's best not to enable clustering at all.
A particularly common mistake is enabling clustering on log-collecting DaemonSets.
Collecting logs from Pods on the mounted node doesn't benefit from clustering because each instance typically collects logs only from Pods on its own node.
In such cases, enabling clustering only adds unnecessary complexity and resource usage without providing functional benefits.

## Cluster monitoring and troubleshooting

You can monitor your cluster status using the {{< param "PRODUCT_NAME" >}} UI [clustering page][].
Refer to [Debug clustering issues][debugging] for additional troubleshooting information.

## Next steps

Now that you understand how clustering works with {{< param "PRODUCT_NAME" >}} components, explore these topics:

- [Deploy {{< param "PRODUCT_NAME" >}}][deploy]: Set up clustered deployments in production environments.
- [Monitor {{< param "PRODUCT_NAME" >}}][monitor]: Learn about monitoring cluster health and performance.
- [Troubleshooting][debugging]: Debug clustering issues and interpret cluster metrics.

For detailed configuration:

- [`alloy run` command reference][run]: Configure clustering using command-line flags.
- [Component reference][components]: Explore clustering-enabled components like `prometheus.scrape` and `pyroscope.scrape`.