
Commit 5c599e6

clayton-cornell, ptodev, and dehaansa authored
Update the Debug topic to add detailed information about logs (#4245)
* Add detailed information about logs
* Vale cleanup
* Small tweaks
* Updates from review suggestions
* Apply suggestion from @ptodev

Co-authored-by: Paulin Todev <[email protected]>

* Apply suggestion from @clayton-cornell
* Apply suggestion from @clayton-cornell
* Apply suggestions from code review

Co-authored-by: Clayton Cornell <[email protected]>

---------

Co-authored-by: Paulin Todev <[email protected]>
Co-authored-by: Sam DeHaan <[email protected]>
1 parent 27cd3fa commit 5c599e6

2 files changed: 132 additions & 16 deletions

docs/sources/reference/config-blocks/logging.md

Lines changed: 25 additions & 0 deletions
@@ -65,6 +65,31 @@ You can view the logs through Event Viewer.

 In other cases, redirect `stderr` of the {{< param "PRODUCT_NAME" >}} process to a file for logs to persist on disk.

+## Retrieve logs
+
+You can retrieve the logs in different ways depending on your platform and installation method:
+
+**Linux:**
+
+* If you're running {{< param "PRODUCT_NAME" >}} with systemd, use `journalctl -u alloy`.
+* If you're running {{< param "PRODUCT_NAME" >}} in a Docker container, use `docker logs CONTAINER_ID`.
+
+**macOS:**
+
+* If you're running {{< param "PRODUCT_NAME" >}} with Homebrew as a service, use `brew services info grafana/grafana/alloy` to check status and `tail -f $(brew --prefix)/var/log/alloy.log` for logs.
+* If you're running {{< param "PRODUCT_NAME" >}} with launchd, use `log show --predicate 'process == "alloy"' --info` or check `/usr/local/var/log/alloy.log`.
+* If you're running {{< param "PRODUCT_NAME" >}} in a Docker container, use `docker logs CONTAINER_ID`.
+
+**Windows:**
+
+* If you're running {{< param "PRODUCT_NAME" >}} as a Windows service, check the Windows Event Viewer under **Windows Logs** > **Application** for Alloy-related events.
+* If you're running {{< param "PRODUCT_NAME" >}} that is manually installed, check the log files in `%PROGRAMDATA%\Grafana\Alloy\logs\` or the directory specified in your configuration.
+* If you're running {{< param "PRODUCT_NAME" >}} in a Docker container, use `docker logs CONTAINER_ID`.
+
+**All platforms:**
+
+* {{< param "PRODUCT_NAME" >}} writes logs to `stderr` if started directly without a service manager.
+
 ## Example

 ```alloy
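The new **Retrieve logs** section lists commands (`journalctl -u alloy`, `docker logs CONTAINER_ID`, `tail -f $(brew --prefix)/var/log/alloy.log`) that each produce a plain stream of log lines. The following Python filter is a minimal sketch that keeps only `warn` and `error` lines from such a stream. It assumes the default `logfmt` format, where every line carries a `level=` field. `filter_attention` and `main` are hypothetical names and are not part of Alloy.

```python
import re
import sys
from typing import Iterable, Iterator, TextIO

# Hypothetical helper, not shipped with Alloy. Assumes logfmt output in which
# the level appears as a `level=<value>` field, optionally quoted, and possibly
# preceded by a journald prefix such as a timestamp and host name.
LEVEL_RE = re.compile(r'(?:^|\s)level=("?)(\w+)\1(?=\s|$)')

# Levels this sketch treats as worth a closer look.
ATTENTION_LEVELS = frozenset({"warn", "error"})


def filter_attention(lines: Iterable[str]) -> Iterator[str]:
    """Yield warn/error lines, plus lines with no recognizable level field.

    Lines without a level (for example, journalctl banners) are passed
    through rather than dropped, so nothing unexpected disappears silently.
    """
    for line in lines:
        match = LEVEL_RE.search(line)
        if match is None or match.group(2) in ATTENTION_LEVELS:
            yield line


def main(stream_in: TextIO = sys.stdin, stream_out: TextIO = sys.stdout) -> None:
    """Copy warn/error lines (and unparsed lines) from stream_in to stream_out."""
    for line in filter_attention(stream_in):
        stream_out.write(line)
```

To try it, save it as a hypothetical `alloy_log_filter.py`, add `if __name__ == "__main__": main()` at the bottom, and run `journalctl -u alloy | python3 alloy_log_filter.py`. The filter matches on level only, so it can't tell an expected startup warning from a real problem. The message lists added to `debug.md` make that distinction.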

docs/sources/troubleshoot/debug.md

Lines changed: 107 additions & 16 deletions
@@ -2,7 +2,7 @@
 canonical: https://grafana.com/docs/alloy/latest/troubleshoot/debug/
 aliases:
   - ../tasks/debug/ # /docs/alloy/latest/tasks/debug/
-description: Learn about debugging issues with Grafana alloy
+description: Learn about debugging issues with Grafana Alloy
 title: Debug Grafana Alloy
 menuTitle: Debug
 weight: 1000
@@ -25,10 +25,10 @@ This default prevents other machines on the network from being able to view the

 To expose the UI to other machines on the network on non-containerized platforms, refer to the documentation for how you [installed][install] {{< param "PRODUCT_NAME" >}}.

-If you are running a custom installation of {{< param "PRODUCT_NAME" >}}, refer to the documentation for the [`alloy run` command][alloy run] to learn how to change the HTTP listen address, > and pass the appropriate flag when running {{< param "PRODUCT_NAME" >}}.
+If you are running a custom installation of {{< param "PRODUCT_NAME" >}}, refer to the documentation for the [`run`][run] command to learn how to change the HTTP listen address and pass the appropriate flag when running {{< param "PRODUCT_NAME" >}}.

 [install]: ../../set-up/install/
-[alloy run]: ../../reference/cli/run/
+[run]: ../../reference/cli/run/
 {{< /admonition >}}

 ### Home page
@@ -56,6 +56,7 @@ To access the graph page of a module, click on the **Graph** button on the modul

 The amount of data that exits a component that supports [live debugging](#live-debugging-page) is shown on the outgoing edges of the component.
 The data is refreshed according to the `window` parameter.
+
 ### Component detail page

 {{< figure src="/media/docs/alloy/ui_component_detail_page_2.png" alt="Alloy UI component detail page" >}}
@@ -70,7 +71,7 @@ The component detail page shows the following information for each component:
 From there you can also go to the component documentation or to its corresponding [Live Debugging page](#live-debugging-page).

 {{< admonition type="note" >}}
-Values marked as a [secret][] are obfuscated and display as the text `(secret)`.
+Values marked as a [secret][] display only as the text `(secret)`.

 [secret]: ../../get-started/configuration-syntax/expressions/types_and_values/#secrets
 {{< /admonition >}}
@@ -83,7 +84,7 @@ The clustering page shows the following information for each cluster node:

 * The node's name.
 * The node's advertised address.
-* The node's current state (Viewer/Participant/Terminating).
+* The node's current state: Viewer, Participant, or Terminating.
 * The local node that serves the UI.

 ### Live Debugging page
@@ -93,7 +94,7 @@ The clustering page shows the following information for each cluster node:
 Live debugging provides a real-time stream of debugging data from a component. You can access this page from the corresponding [Component detail page](#component-detail-page).

 {{< admonition type="caution" >}}
-Live debugging is disabled by default to avoid accidentally displaying sensitive telemetry data. To enable live debugging, configure the [livedebugging block][livedebugging].
+Live debugging defaults to disabled to avoid accidentally displaying sensitive telemetry data. To enable live debugging, configure the [`livedebugging`][livedebugging] block.

 [livedebugging]: ../../reference/config-blocks/livedebugging/
 {{< /admonition >}}
@@ -128,19 +129,110 @@ Supported components:

 To debug using the UI:

-* Ensure that no component is reported as unhealthy.
+* Ensure that no component reports as unhealthy.
 * Ensure that the arguments and exports for misbehaving components appear correct.
 * Ensure that the live debugging data meets your expectations.

 ## Examine logs

-Logs may also help debug issues with {{< param "PRODUCT_NAME" >}}.
+{{< param "PRODUCT_NAME" >}} provides different log levels that help you determine the root cause of issues.
+You can configure the log level using the [`logging`][logging] configuration block in your {{< param "PRODUCT_NAME" >}} configuration file.
+
+Logs from {{< param "PRODUCT_NAME" >}} are written in `logfmt` format by default.
+You can configure the [log format][] to be either `logfmt` or `json`.
+You can [retrieve][] the logs in different ways depending on your platform and installation method.
+
+[logging]: ../../reference/config-blocks/logging/
+[log format]: ../../reference/config-blocks/logging/#log-format
+[retrieve]: ../../reference/config-blocks/logging/#retrieve-logs
+
+### Common log messages
+
+The following log messages are normal during {{< param "PRODUCT_NAME" >}} operation:
+
+#### Component lifecycle messages
+
+During normal startup and operation, you'll see messages about component lifecycle:
+
+**Component startup and initialization:**
+
+```text
+level=info msg="starting controller"
+level=info msg="starting server"
+level=info msg="starting server" addr=localhost:8080
+level=info msg="started scheduled components"
+```
+
+{{< admonition type="note" >}}
+The `starting server` messages refer to the built-in [HTTP server][http] that hosts the debugging UI, `/metrics` endpoint, and other debugging endpoints.
+
+[http]: ../../reference/config-blocks/http/
+{{< /admonition >}}
+
+**Component updates and configuration changes:**
+
+```text
+level=info msg="configuration loaded"
+level=info msg="module content loaded"
+level=info msg="started scheduled components"
+level=info msg="terminating server"
+```

-To reduce logging noise, many components hide debugging info behind debug-level log lines.
-It's recommended that you configure the [`logging` block][logging] to show debug-level log lines when debugging issues with {{< param "PRODUCT_NAME" >}}.
+**Component health reporting:**

-The location of {{< param "PRODUCT_NAME" >}} logs is different based on how it's deployed.
-Refer to the [`logging` block][logging] page to see how to find logs for your system.
+```text
+level=info msg="started scheduled components"
+level=warn msg="failed to start scheduled component" err="connection refused"
+level=warn msg="the discovery.process component only works on linux; enabling it otherwise will do nothing"
+```
+
+#### Cluster operation messages
+
+If you enable clustering, you'll see messages about cluster operations:
+
+**Normal startup and peer discovery:**
+
+```text
+level=info msg="starting cluster node" peers_count=2 peers=192.168.1.10:12345,192.168.1.11:12345 advertise_addr=192.168.1.12:12345
+level=info msg="using provided peers for discovery" join_peers="192.168.1.10:12345, 192.168.1.11:12345"
+level=info msg="discovered peers" peers_count=3 peers=192.168.1.10:12345,192.168.1.11:12345,192.168.1.12:12345
+level=info msg="rejoining peers" peers_count=2 peers=192.168.1.10:12345,192.168.1.11:12345
+```
+
+**Cluster size management:**
+
+```text
+level=info msg="minimum cluster size reached, marking cluster as ready to admit traffic" minimum_cluster_size=3 peers_count=3
+level=warn msg="minimum cluster size requirements are not met - marking cluster as not ready for traffic" minimum_cluster_size=3 minimum_size_wait_timeout=5m0s peers_count=2
+level=warn msg="deadline passed, marking cluster as ready to admit traffic" minimum_cluster_size=3 minimum_size_wait_timeout=5m0s peers_count=2
+```
+
+**Normal cluster operations:**
+
+```text
+level=debug msg="found an IP cluster join address" addr=192.168.1.10:12345
+level=debug msg="received DNS query response" addr=cluster.example.com record_type=A records_count=3
+```
+
+#### Expected warnings
+
+Some warnings are normal during startup or cluster changes:
+
+```text
+level=warn msg="failed to get peers to join at startup; will create a new cluster" err="no peers available"
+level=warn msg="failed to connect to peers; bootstrapping a new cluster" err="connection refused"
+level=warn msg="failed to resolve provided join address" addr=unavailable-node:12345
+```
+
+#### Problematic messages
+
+These messages indicate issues that require attention:
+
+```text
+level=error msg="failed to bootstrap a fresh cluster with no peers" err="bind: address already in use"
+level=error msg="failed to rejoin list of peers" err="connection timeout"
+level=warn msg="failed to refresh list of peers" err="dns resolution failed"
+```

 ## Debug clustering issues

@@ -153,16 +245,15 @@ To debug issues when using [clustering][], check for the following symptoms.
 Again, check for network connectivity issues.
 Check that the addresses or DNS names given to the node to join are correctly formatted and reachable.
 * **Configuration drift**: Clustering assumes that all nodes are running with the same configuration file at roughly the same time.
-Check the logs for issues with the reloaded configuration file as well as the graph page to verify changes have been applied.
+Check the logs for issues with the reloaded configuration file as well as the graph page to verify that the changes are applied.
 * **Node name conflicts**: Clustering assumes all nodes have unique names.
-Nodes with conflicting names are rejected and won't join the cluster.
+The cluster rejects nodes with conflicting names and they won't join the cluster.
 Look at the clustering UI page for the list of current peers with their names, and check the logs for any reported name conflict events.
 * **Node stuck in terminating state**: The node attempted to gracefully shut down and set its state to Terminating, but it hasn't completely gone away.
 Check the clustering page to view the state of the peers and verify that the terminating {{< param "PRODUCT_NAME" >}} has been shut down.

 {{< admonition type="note" >}}
-Some issues that appear to be clustering issues may be symptoms of other issues, for example, problems with scraping or service discovery can result in missing metrics for an Alloy instance that can be interpreted as a node not joining the cluster.
+Some issues that appear to be clustering issues may be symptoms of other issues, for example, problems with scraping or service discovery can result in missing metrics for an {{< param "PRODUCT_NAME" >}} instance that you can interpret as a node not joining the cluster.
 {{< /admonition >}}

-[logging]: ../../reference/config-blocks/logging/
 [clustering]: ../../get-started/clustering/
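The **Expected warnings** and **Problematic messages** lists added to `debug.md` can also serve as a lookup table when you read logs. The following Python sketch is a hypothetical helper and is not part of Alloy. It assumes that each line is either `logfmt` or `json` with the same `level` and `msg` keys. It parses `logfmt` with `shlex`, which handles the double-quoted `msg` and `err` values but is not a complete logfmt parser. It only recognizes the exact message strings quoted in this commit.

```python
import json
import shlex
from typing import Dict

# Message text copied verbatim from the "Expected warnings" and
# "Problematic messages" examples in this commit.
EXPECTED_WARNINGS = frozenset({
    "failed to get peers to join at startup; will create a new cluster",
    "failed to connect to peers; bootstrapping a new cluster",
    "failed to resolve provided join address",
})
PROBLEMATIC = frozenset({
    "failed to bootstrap a fresh cluster with no peers",
    "failed to rejoin list of peers",
    "failed to refresh list of peers",
})


def parse_line(line: str) -> Dict[str, str]:
    """Parse one log line in either json or logfmt format into a dict of strings."""
    stripped = line.strip()
    if stripped.startswith("{"):
        # Assumed json layout: a flat object with keys such as "level" and "msg".
        return {key: str(value) for key, value in json.loads(stripped).items()}
    fields: Dict[str, str] = {}
    # shlex.split removes the double quotes around values like msg="...".
    # It raises ValueError on unbalanced quotes, which real logfmt tolerates.
    for token in shlex.split(stripped):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def classify(line: str) -> str:
    """Label a line using the message lists above, falling back to its level."""
    fields = parse_line(line)
    msg = fields.get("msg", "")
    if msg in PROBLEMATIC:
        return "problematic"
    if msg in EXPECTED_WARNINGS:
        return "expected warning"
    # Not covered by the lists; report the level so warn and error lines can
    # still be reviewed by hand.
    return "unlisted (level=" + fields.get("level", "unknown") + ")"
```

The sketch matches on `msg` rather than `level` because level alone can't make this distinction. `failed to refresh list of peers` is a `warn` line listed as problematic, and all three expected warnings are also `warn`.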
