
Commit 83be56b

Merge pull request #1150 from buildkite/docs/add-buildkite-agent-metrics-cli-section
Add buildkite-agent-metrics CLI section to monitoring and observability docs
2 parents 9bf7887 + 457170d commit 83be56b


6 files changed: 114 additions, 10 deletions

app/models/page/renderers/external_link.rb

Lines changed: 3 additions & 5 deletions
@@ -24,7 +24,7 @@ def initialize(node)
 def external_link?

 def has_internal_link_prefix?
-INTERNAL_LINK_PREFIXES.any? { |prefix| @href.include?(prefix) }
+INTERNAL_LINK_PREFIXES.any? { |prefix| @href.start_with?(prefix) }
 end

 def buildkite_domain?
@@ -41,10 +41,8 @@ def buildkite_domain?
 end

 def decorate_external_link_node
-unless node['class']
-node.set_attribute('class', 'external-link')
-node.set_attribute('target', '_blank')
-end
+node.set_attribute('class', 'external-link') unless node['class']
+node.set_attribute('target', '_blank')
 end

 def process
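The change to `has_internal_link_prefix?` tightens matching: `include?` accepts a prefix anywhere in the href, while `start_with?` only accepts it at the beginning. The minimal sketch below shows the difference. The diff does not show what `INTERNAL_LINK_PREFIXES` contains, so the `/docs` and `#` values here are hypothetical, chosen only to illustrate the false positive.

```ruby
# Hypothetical stand-in for INTERNAL_LINK_PREFIXES; the real values are not part of this diff.
HYPOTHETICAL_PREFIXES = ["/docs", "#"].freeze

# Old check: a prefix anywhere in the href counts as a match.
def internal_by_include?(href)
  HYPOTHETICAL_PREFIXES.any? { |prefix| href.include?(prefix) }
end

# New check: only an href that begins with a prefix counts as a match.
def internal_by_start_with?(href)
  HYPOTHETICAL_PREFIXES.any? { |prefix| href.start_with?(prefix) }
end

[
  "/docs/agent/queues",
  "#tracing",
  "https://example.com/docs/getting-started",
  "https://example.com/page#section"
].each do |href|
  puts format("%-45s include?=%-5s start_with?=%s",
              href, internal_by_include?(href), internal_by_start_with?(href))
end
```

Under these assumed prefixes, `include?` classifies both external URLs as internal because `/docs` or `#` appears somewhere inside them, whereas `start_with?` only accepts the two relative links. `String#start_with?` also takes several prefixes at once, so `@href.start_with?(*INTERNAL_LINK_PREFIXES)` would be an equivalent single call.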

pages/agent/self_hosted/monitoring_and_observability.md

Lines changed: 106 additions & 0 deletions
@@ -124,6 +124,112 @@ Once enabled, the agent will generate the following metrics (duration measured i
 - `buildkite.jobs.duration.success.median`
 - `buildkite.jobs.duration.success.95percentile`

+## Buildkite agent metrics CLI
+
+The [buildkite-agent-metrics](https://github.com/buildkite/buildkite-agent-metrics) tool is a standalone command-line binary that collects agent and job metrics from the [`metrics` endpoint of the Buildkite agent API](/docs/apis/agent-api/metrics) and publishes these metrics to a monitoring and observability backend of your choice. This tool is particularly useful for enabling autoscaling based on queue depth and agent availability.
+
+The tool supports the following backends:
+
+- [AWS CloudWatch](https://aws.amazon.com/cloudwatch/) (default)
+- [StatsD](https://github.com/etsy/statsd) (including Datadog-compatible tagging)
+- [Prometheus](https://prometheus.io)
+- [Google Cloud Monitoring](https://cloud.google.com/monitoring)
+- [New Relic](https://newrelic.com/products/insights)
+- [OpenTelemetry](https://opentelemetry.io)
+
+### Installing
+
+Download the latest binary from [GitHub Releases](https://github.com/buildkite/buildkite-agent-metrics/releases), or run it as a Docker container:
+
+```shell
+docker run --rm public.ecr.aws/buildkite/agent-metrics:latest \
+-token "$BUILDKITE_AGENT_TOKEN" \
+-interval 30s \
+-queue my-queue
+```
+
+You can also install from source using Go:
+
+```shell
+go install github.com/buildkite/buildkite-agent-metrics/v5@latest
+```
+
+### Running
+
+The tool requires an [agent token](/docs/agent/self-hosted/tokens), which could be the same one used when [assigning the self-hosted agent to a queue](/docs/agent/queues#assigning-a-self-hosted-agent-to-a-queue), or another agent token configured within the same [cluster](/docs/pipelines/security/clusters). The simplest deployment runs it as a long-running daemon that collects metrics across all queues in an organization:
+
+```shell
+buildkite-agent-metrics -token "$BUILDKITE_AGENT_TOKEN" -interval 30s
+```
+
+To restrict collection to specific queues, use the `-queue` flag (repeatable):
+
+```shell
+buildkite-agent-metrics -token "$BUILDKITE_AGENT_TOKEN" -interval 30s -queue my-queue
+```
+
+To select a backend, use the `-backend` flag:
+
+```shell
+buildkite-agent-metrics -token "$BUILDKITE_AGENT_TOKEN" -interval 30s -backend statsd
+```
+
+### Collected metrics
+
+The tool collects the following metrics per organization and per queue:
+
+<table class="responsive-table">
+<thead>
+<tr>
+<th style="width:35%">Metric</th>
+<th>Description</th>
+</tr>
+</thead>
+<tbody>
+<% [
+{
+metric: "`ScheduledJobsCount`",
+description: "Jobs waiting in the queue for an available agent. This should be close to zero if you have sufficient agent capacity."
+},
+{
+metric: "`RunningJobsCount`",
+description: "Jobs currently being executed by agents."
+},
+{
+metric: "`WaitingJobsCount`",
+description: "Jobs that can't be scheduled yet due to dependencies or `wait` steps. Useful for autoscaling, as these represent work that starts soon."
+},
+{
+metric: "`UnfinishedJobsCount`",
+description: "All jobs that have been scheduled but haven't finished. Includes both running and scheduled jobs."
+},
+{
+metric: "`IdleAgentsCount`",
+description: "Agents connected but not running a job."
+},
+{
+metric: "`BusyAgentsCount`",
+description: "Agents currently running a job."
+},
+{
+metric: "`TotalAgentsCount`",
+description: "Total number of connected agents."
+},
+{
+metric: "`BusyAgentPercentage`",
+description: "Percentage of agents currently busy."
+}
+].each do |row| %>
+<tr>
+<td><%= render_markdown(text: row[:metric]) %></td>
+<td><%= render_markdown(text: row[:description]) %></td>
+</tr>
+<% end %>
+</tbody>
+</table>
+
+For more details on configuration options, AWS Lambda deployment, and backend-specific settings, see the [buildkite-agent-metrics README](https://github.com/buildkite/buildkite-agent-metrics?tab=readme-ov-file#buildkite-agent-metrics).
+
 ## Tracing

 For Datadog APM or OpenTelemetry tracing, see [Tracing in the Buildkite agent](/docs/agent/self-hosted/monitoring-and-observability/tracing).
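The new section lists `BusyAgentPercentage` as a metric collected per organization and per queue. If you aggregate per-queue counts yourself (for example, for a subset of queues on a dashboard), sum `BusyAgentsCount` and `TotalAgentsCount` before dividing instead of averaging the per-queue percentages. The minimal sketch below illustrates this. The `QueueCounts` struct, queue names, and counts are hypothetical, and returning 0.0 when no agents are connected is a choice made by this sketch, not documented tool behavior.

```ruby
# Hypothetical per-queue snapshot mirroring BusyAgentsCount and TotalAgentsCount.
QueueCounts = Struct.new(:name, :busy, :total)

# Busy percentage for one set of counts. Returning 0.0 for an empty pool is an
# assumption of this sketch.
def busy_percentage(busy, total)
  return 0.0 if total.zero?

  busy * 100.0 / total
end

# Aggregate figure: sum the raw counts first, then divide once.
def org_busy_percentage(queues)
  busy_percentage(queues.sum(&:busy), queues.sum(&:total))
end

# Averaging per-queue percentages ignores queue size, so it can mislead.
def naive_average_percentage(queues)
  return 0.0 if queues.empty?

  queues.sum { |q| busy_percentage(q.busy, q.total) } / queues.size
end

queues = [
  QueueCounts.new("default", 18, 20),
  QueueCounts.new("gpu", 0, 5)
]

queues.each { |q| puts format("%-8s %.1f%%", q.name, busy_percentage(q.busy, q.total)) }
puts format("summed counts:           %.1f%%", org_busy_percentage(queues))
puts format("naive average of queues: %.1f%%", naive_average_percentage(queues))
```

With these made-up counts, summing first gives 72.0% busy (18 of 25 agents), while averaging the two queue percentages (90.0% and 0.0%) gives 45.0%, understating how loaded the agents are.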

pages/apis/agent_api.md

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ The agent REST API is used to retrieve agent metrics, register agents, de-regist

 The agent REST API's _publicly_ available endpoints include:

-- [`/metrics`](/docs/apis/agent-api/metrics): Used to retrieve information about current self-hosted agents associated with a Buildkite cluster. The [Buildkite Agent Metrics](https://github.com/buildkite/buildkite-agent-metrics) CLI tool uses the data returned by the metrics endpoint for agent autoscaling.
+- [`/metrics`](/docs/apis/agent-api/metrics): Used to retrieve information about current self-hosted agents associated with a Buildkite cluster. The [buildkite-agent-metrics](/docs/agent/self-hosted/monitoring-and-observability#buildkite-agent-metrics-cli) CLI tool uses the data returned by the metrics endpoint for agent autoscaling.
 - [`/stacks`](/docs/apis/agent-api/stacks): Used to implement a _stack_ on a self-hosted queue. A stack is a long-running controller process that watches the queue for jobs, and runs Buildkite agents on demand to run these jobs.

 All other endpoints in the agent API are intended only for use by the Buildkite agent, therefore stability and backwards compatibility are not guaranteed, and changes won't be announced.
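The `/metrics` endpoint can also be queried directly, for example to check what the CLI tool will see. The minimal Ruby sketch below uses only the standard library. The URL `https://agent.buildkite.com/v3/metrics`, the `Authorization: Token <agent token>` header, and the `jobs`/`agents` key layout of the response are assumptions; verify them against the `/metrics` endpoint reference before relying on them.

```ruby
require "json"
require "net/http"
require "uri"

# Assumed endpoint URL; confirm against the agent API /metrics reference.
METRICS_URL = URI("https://agent.buildkite.com/v3/metrics")

# Build (but do not send) an authenticated GET request for the metrics endpoint.
def build_metrics_request(agent_token)
  request = Net::HTTP::Get.new(METRICS_URL)
  request["Authorization"] = "Token #{agent_token}" # assumed auth scheme
  request["Accept"] = "application/json"
  request
end

# Send the request over HTTPS and parse the JSON body.
def fetch_metrics(agent_token)
  request = build_metrics_request(agent_token)
  response = Net::HTTP.start(METRICS_URL.host, METRICS_URL.port, use_ssl: true) do |http|
    http.request(request)
  end
  response.value # raises unless the response is a 2xx success
  JSON.parse(response.body)
end

# Pull a few counts out of the parsed body. The "jobs"/"agents" key layout is
# an assumption about the response shape; missing keys come back as nil.
def summarize(metrics)
  {
    scheduled_jobs: metrics.dig("jobs", "scheduled"),
    running_jobs: metrics.dig("jobs", "running"),
    idle_agents: metrics.dig("agents", "idle"),
    busy_agents: metrics.dig("agents", "busy")
  }
end

# Only hit the network when a token is provided.
if ENV["BUILDKITE_AGENT_TOKEN"]
  p summarize(fetch_metrics(ENV.fetch("BUILDKITE_AGENT_TOKEN")))
end
```

Running the file with `BUILDKITE_AGENT_TOKEN` set prints a small summary hash; without the variable it makes no request, so the helpers can be loaded and exercised offline.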

pages/pipelines/best_practices/agent_management.md

Lines changed: 1 addition & 1 deletion
@@ -96,7 +96,7 @@ Learn more about using clusters and queues in [Managing clusters](/docs/pipeline

 ## Right-sizing of your agent fleet

-- Monitor queue times with [cluster insights](/docs/pipelines/security/clusters#cluster-insights) and [Buildkite agent Metrics](https://github.com/buildkite/buildkite-agent-metrics).
+- Monitor queue times with [cluster insights](/docs/pipelines/security/clusters#cluster-insights) and the [buildkite-agent-metrics](/docs/agent/self-hosted/monitoring-and-observability#buildkite-agent-metrics-cli) tool.
 - Use cloud-based autoscaling ([Elastic CI Stack for AWS](https://github.com/buildkite/elastic-ci-stack-for-aws), [Buildkite agent Scaler](https://github.com/buildkite/buildkite-agent-scaler), [Agent Stack for Kubernetes](/docs/agent/self-hosted/agent-stack-k8s)).
 - Maintain dedicated pools for CPU-intensive, GPU-enabled, or OS-specific workloads.
 - Configure [graceful termination](/docs/agent/lifecycle#signal-handling) to allow jobs to complete.
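Right-sizing comes down to comparing demand with capacity. As a hypothetical illustration only, the policy below sizes a queue to cover its busy agents plus its backlog of scheduled jobs (the `BusyAgentsCount` and `ScheduledJobsCount` metrics that buildkite-agent-metrics reports), clamped to minimum and maximum bounds. It assumes each agent runs one job at a time and is not the algorithm used by the Elastic CI Stack for AWS, Buildkite agent Scaler, or Agent Stack for Kubernetes.

```ruby
# Hypothetical right-sizing policy (not any Buildkite autoscaler's algorithm):
# cover every busy agent plus the backlog of scheduled jobs, assuming each
# agent runs one job at a time, then clamp to the pool's bounds.
def desired_agent_count(busy_agents:, scheduled_jobs:, min_agents:, max_agents:)
  raise ArgumentError, "min_agents must not exceed max_agents" if min_agents > max_agents

  (busy_agents + scheduled_jobs).clamp(min_agents, max_agents)
end

# Difference between the desired and current pool size:
# positive means add agents, negative means agents can be retired.
def scaling_delta(total_agents:, **policy)
  desired_agent_count(**policy) - total_agents
end

# Busy queue: 8 busy agents, 6 scheduled jobs, pool capped at 12.
puts scaling_delta(total_agents: 10, busy_agents: 8, scheduled_jobs: 6, min_agents: 2, max_agents: 12) # prints 2

# Quiet queue: 1 busy agent, nothing scheduled, floor of 2 agents.
puts scaling_delta(total_agents: 10, busy_agents: 1, scheduled_jobs: 0, min_agents: 2, max_agents: 12) # prints -8
```

Clamping keeps a burst of scheduled jobs from growing the pool past its maximum, and keeps a quiet queue from shrinking below its floor.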

pages/pipelines/best_practices/parallel_builds.md

Lines changed: 1 addition & 1 deletion
@@ -147,7 +147,7 @@ In addition to the [Elastic CI Stack for AWS](/docs/agent/self-hosted/aws/elasti
 - [Pipelines REST API](/docs/apis/rest-api/pipelines) and [Agents API](/docs/apis/rest-api/agents) you're able to fetch each pipeline's job count, and information about each agent.
 - [Agent priorities](/docs/agent/self-hosted/prioritization) allow you to define which agents are assigned work first, such as high performance ephemeral agents.
 - [Agent queues](/docs/agent/queues) allow you to divide your agent pools into separate groups for scaling and performance purposes.
-- [buildkite-agent-metrics](https://github.com/buildkite/buildkite-agent-metrics) tool allow you to collect your organization's Buildkite metrics and report them to AWS CloudWatch and StatsD.
+- [buildkite-agent-metrics](/docs/agent/self-hosted/monitoring-and-observability#buildkite-agent-metrics-cli) tool allows you to collect your organization's Buildkite metrics and report them to a range of backends including AWS CloudWatch, StatsD, Prometheus, and OpenTelemetry.

 Using these tools you can automate your build infrastructure, scale your agents based on demand, and massively reduce build times using job parallelism.

spec/models/page/renderer_spec.rb

Lines changed: 2 additions & 2 deletions
@@ -169,11 +169,11 @@

 it "does not affect links with existing css classes" do
 md = <<~MD
-<p><a href="https://www.github.com/buildkite/docs" class="Docs__example-repo">Docs repo</a></p>
+<p><a href="https://www.github.com/buildkite/docs" class="Docs__example-repo" target="_blank">Docs repo</a></p>
 MD

 html = <<~HTML
-<p><a href="https://www.github.com/buildkite/docs" class="Docs__example-repo">Docs repo</a></p>
+<p><a href="https://www.github.com/buildkite/docs" class="Docs__example-repo" target="_blank">Docs repo</a></p>
 HTML

 expect(Page::Renderer.render(md).strip).to eql(html.strip)
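The rewrite of `decorate_external_link_node` changes behavior, not just style: previously a link that already had a `class` received neither attribute, whereas now `target="_blank"` is set on every decorated link. This matches the updated spec, in which a link with an existing class carries `target="_blank"`. The sketch below replays both versions against a hypothetical `FakeNode` class that mimics only the `[]` and `set_attribute` calls the renderer makes on a Nokogiri node. In the real renderer `node` is read from the instance; here it is passed in as an argument. This is an illustration, not the renderer itself.

```ruby
# Hypothetical stand-in for a Nokogiri element, implementing only the two
# methods that decorate_external_link_node calls on `node`.
class FakeNode
  attr_reader :attributes

  def initialize(attributes = {})
    @attributes = attributes.dup
  end

  def [](name)
    @attributes[name]
  end

  def set_attribute(name, value)
    @attributes[name] = value
  end
end

# Logic before this commit: a link that already had a class got neither attribute.
def decorate_before(node)
  unless node['class']
    node.set_attribute('class', 'external-link')
    node.set_attribute('target', '_blank')
  end
  node
end

# Logic after this commit: the class is still only added when missing,
# but target="_blank" is now set unconditionally.
def decorate_after(node)
  node.set_attribute('class', 'external-link') unless node['class']
  node.set_attribute('target', '_blank')
  node
end

styled = { 'href' => 'https://www.github.com/buildkite/docs', 'class' => 'Docs__example-repo' }
p decorate_before(FakeNode.new(styled)).attributes
p decorate_after(FakeNode.new(styled)).attributes
```

The existing `Docs__example-repo` class is left untouched in both versions, so the spec's name still holds for the `class` attribute; only `target` differs.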
