website/content/en/docs/reference/metrics.md
Lines changed: 351 additions & 0 deletions
@@ -8,6 +8,329 @@ description: >
---
<!-- this document is generated from hack/docs/metrics_gen/main.go -->
Karpenter makes several metrics available in Prometheus format to allow monitoring cluster provisioning status. These metrics are available by default at `karpenter.kube-system.svc.cluster.local:8080/metrics`; the port is configurable via the `METRICS_PORT` environment variable documented [here](../settings).
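
As a rough illustration of consuming this endpoint, the sketch below parses a small payload in the Prometheus text format. The `SAMPLE` text stands in for the response body of the metrics endpoint; its label value and numbers are made up, and `parse_metrics` is a hypothetical helper that handles only simple sample lines, not the full exposition format.

```python
import re

# Hypothetical scrape output; the label value and numbers are invented
# for illustration and do not come from a real cluster.
SAMPLE = """\
# HELP karpenter_build_info A metric with a constant '1' value labeled by version from which karpenter was built.
karpenter_build_info{version="example"} 1
# HELP karpenter_ignored_pod_count Number of pods ignored during scheduling by Karpenter
karpenter_ignored_pod_count 3
"""

# metric_name{label="value",...} value
LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse simple text-format samples into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines such as "# HELP" and "# TYPE".
        if not line or line.startswith("#"):
            continue
        match = LINE.match(line)
        if match is None:
            continue
        labels = dict(LABEL.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


samples = parse_metrics(SAMPLE)
for name, labels, value in samples:
    print(name, labels, value)
```

In practice a Prometheus server would scrape the endpoint directly; a hand-rolled parser like this is only useful for quick inspection.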

### `karpenter_ignored_pod_count`
Number of pods ignored during scheduling by Karpenter.
- Stability Level: ALPHA

### `karpenter_build_info`
A metric with a constant '1' value labeled by version from which Karpenter was built.
The current amount of time in seconds that a status condition has been in a specific state. Labeled by the name of the nodeclaim, namespace, type, status, and reason.
- Stability Level: BETA

### `operator_nodeclaim_status_condition_count`
The count of a condition for a given nodeclaim, type, and status. Labeled by the name, namespace, type, status, and reason.
The time taken between a node's deletion request and the removal of its finalizer
- Stability Level: BETA

### `karpenter_nodes_terminated_total`
Number of nodes terminated in total by Karpenter. Labeled by owning nodepool.
- Stability Level: STABLE

### `karpenter_nodes_system_overhead`
Node system daemon overhead is the resources reserved for system overhead: the difference between the node's capacity and allocatable values reported by the status.
- Stability Level: BETA
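
The overhead described for `karpenter_nodes_system_overhead` is a per-resource subtraction, which can be sketched as follows. The resource names, units, and numbers here are hypothetical, and `system_overhead` is an illustrative helper, not part of Karpenter.

```python
def system_overhead(capacity, allocatable):
    """Return capacity minus allocatable for each resource in capacity."""
    return {
        resource: capacity[resource] - allocatable.get(resource, 0)
        for resource in capacity
    }


# Hypothetical node values: CPU in millicores, memory in MiB.
capacity = {"cpu": 4000, "memory": 16384}
allocatable = {"cpu": 3920, "memory": 15360}

overhead = system_overhead(capacity, allocatable)
print(overhead)
```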

### `karpenter_nodes_lifetime_duration_seconds`
The lifetime duration of the nodes since creation.
- Stability Level: ALPHA

### `karpenter_nodes_eviction_requests_total`
The total number of eviction requests made by Karpenter.
- Stability Level: ALPHA

### `karpenter_nodes_drained_total`
The total number of nodes drained by Karpenter.
- Stability Level: ALPHA

### `karpenter_nodes_current_lifetime_seconds`
Node age in seconds.
- Stability Level: ALPHA

### `karpenter_nodes_created_total`
Number of nodes created in total by Karpenter. Labeled by owning nodepool.
- Stability Level: STABLE

### `karpenter_nodes_allocatable`
Node allocatable is the set of resources allocatable by nodes.
The current amount of time in seconds that a status condition has been in a specific state. Labeled by the name of the nodeclaim, namespace, type, status, and reason.
- Stability Level: BETA

### `operator_node_status_condition_count`
The count of a condition for a given node, type, and status. Labeled by the name, namespace, type, status, and reason.
The current amount of time in seconds that a node has been in terminating state. Labeled by name and namespace.
- Stability Level: BETA

### `operator_node_termination_duration_seconds`
The amount of time taken by a node to terminate completely.
- Stability Level: BETA

### `operator_node_event_count`
The number of events for a node.
- Stability Level: BETA

## Pods Metrics

### `karpenter_pods_state`
Pod state is the current state of pods. This metric can be used in several ways, as it is labeled by the pod name, namespace, owner, node, nodepool name, zone, architecture, capacity type, instance type, and pod phase.
- Stability Level: BETA

### `karpenter_pods_startup_duration_seconds`
The time from pod creation until the pod is running.
- Stability Level: STABLE

## Termination Metrics

### `operator_termination_duration_seconds`
The amount of time taken by an object to terminate completely.
- Stability Level: DEPRECATED

### `operator_termination_current_time_seconds`
The current amount of time in seconds that an object has been in terminating state.
Duration of scheduling simulations used for deprovisioning and provisioning in seconds.
- Stability Level: STABLE

### `karpenter_scheduler_queue_depth`
The number of pods currently waiting to be scheduled.
- Stability Level: BETA

## Nodepools Metrics

### `karpenter_nodepools_usage`
The amount of resources that have been provisioned for a nodepool. Labeled by nodepool name and resource type.
- Stability Level: ALPHA

### `karpenter_nodepools_limit`
Limits specified on the nodepool that restrict the quantity of resources provisioned. Labeled by nodepool name and resource type.
- Stability Level: ALPHA
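
Because `karpenter_nodepools_usage` and `karpenter_nodepools_limit` share the same labels, one plausible use is to divide one by the other to see how close a nodepool is to its limits. The sketch below assumes the values for a single nodepool have already been collected into dictionaries keyed by resource type; the numbers and the `nodepool_utilization` helper are hypothetical.

```python
def nodepool_utilization(usage, limit):
    """Return the fraction of each limited resource already provisioned."""
    return {
        resource: usage.get(resource, 0) / cap
        for resource, cap in limit.items()
        if cap > 0  # skip zero limits to avoid dividing by zero
    }


# Hypothetical values for one nodepool, keyed by resource type.
usage = {"cpu": 60, "memory": 256}
limit = {"cpu": 100, "memory": 1024}

utilization = nodepool_utilization(usage, limit)
print(utilization)
```

Resources with no limit do not appear in the result, since only keys present in `limit` are considered.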

### `karpenter_nodepools_allowed_disruptions`
The number of nodes for a given NodePool that can be concurrently disrupting at a point in time. Labeled by NodePool. Note that allowed disruptions can change very rapidly, as new nodes may be created and others may be deleted at any point.
The current amount of time in seconds that a status condition has been in a specific state. Labeled by the name of the nodeclaim, namespace, type, status, and reason.
- Stability Level: BETA

### `operator_nodepool_status_condition_count`
The count of a condition for a given nodepool, type, and status. Labeled by the name, namespace, type, status, and reason.
The current amount of time in seconds that a status condition has been in a specific state. Labeled by the name of the nodeclaim, namespace, type, status, and reason.
Returns 1 if cluster state is synced and 0 otherwise. Synced checks that nodeclaims and nodes that are stored in the APIServer have the same representation as Karpenter's cluster state.
Instance type offering estimated hourly price used when making informed decisions on node cost calculation, based on instance type, capacity type, and zone.
website/content/en/docs/upgrading/upgrade-guide.md
Lines changed: 17 additions & 0 deletions
@@ -86,6 +86,23 @@ If you get the error `invalid ownership metadata; label validation error:` while
WHEN CREATING A NEW SECTION OF THE UPGRADE GUIDANCE FOR NEWER VERSIONS, ENSURE THAT YOU COPY THE BETA API ALERT SECTION FROM THE LAST RELEASE TO PROPERLY WARN USERS OF THE RISK OF UPGRADING WITHOUT GOING TO 0.32.x FIRST
-->

### Upgrading to `1.11.0`+

{{% alert title="Warning" color="warning" %}}
Karpenter `1.1.0` drops the support for `v1beta1` APIs.
**Do not** upgrade to `1.1.0`+ without following the [Migration Guide]({{<ref "../../v1.0/upgrading/v1-migration.md#before-upgrading-to-v110">}}).
{{% /alert %}}

* In the [getting started guide's cloudformation template]({{<ref "../../docs/reference/cloudformation/">}}),
  there are new changes to IAM permissions in the Karpenter controller role for supporting placement groups:
  - `ec2:DescribePlacementGroups` action in [AllowRegionalReadActions]({{<ref "../../docs/reference/cloudformation/#allowregionalreadactions">}})
  - `arn:${AWS::Partition}:ec2:${AWS::Region}:*:placement-group/*` resource in [AllowScopedEC2InstanceAccessActions]({{<ref "../../docs/reference/cloudformation/#allowscopedec2instanceaccessactions">}})

  If you are using placement groups, you will need to update your Karpenter controller role.
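
One way to check whether an existing controller role already carries the two additions listed above is to inspect its policy document. The sketch below is a minimal illustration under stated assumptions: the policy is available as a parsed IAM JSON document, the example `policy` is hypothetical, and the check is a literal string comparison that does not evaluate wildcards or conditions the way IAM does.

```python
def as_list(value):
    """IAM allows a single string or a list for Action and Resource."""
    return value if isinstance(value, list) else [value]


def missing_placement_group_permissions(policy):
    """Return which of the two placement group additions are absent."""
    has_action = False
    has_resource = False
    for statement in as_list(policy.get("Statement", [])):
        if statement.get("Effect") != "Allow":
            continue
        if "ec2:DescribePlacementGroups" in as_list(statement.get("Action", [])):
            has_action = True
        resources = as_list(statement.get("Resource", []))
        if any(str(r).endswith(":placement-group/*") for r in resources):
            has_resource = True
    missing = []
    if not has_action:
        missing.append("ec2:DescribePlacementGroups action")
    if not has_resource:
        missing.append("placement-group/* resource")
    return missing


# Hypothetical, heavily trimmed policy: it has the new action but not
# the new resource. Real controller policies contain many more entries.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowRegionalReadActions",
            "Effect": "Allow",
            "Action": ["ec2:DescribeInstances", "ec2:DescribePlacementGroups"],
            "Resource": "*",
        },
        {
            "Sid": "AllowScopedEC2InstanceAccessActions",
            "Effect": "Allow",
            "Action": ["ec2:RunInstances"],
            "Resource": ["arn:aws:ec2:us-west-2:*:subnet/*"],
        },
    ],
}

missing = missing_placement_group_permissions(policy)
print(missing)
```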