docs/integrations/microsoft-azure/kubernetes.md
25 additions & 25 deletions
@@ -34,12 +34,12 @@ The following are the minimum supported requirements for this application:
 The AKS - Control Plane app collects logs for the following [Azure Kubernetes Services](https://azure.microsoft.com/en-us/services/kubernetes-service/):

-* **kube-audit** - Contains all Kubernetes API Server audit logs including events with the get and list verbs. These events are useful for monitoring all of the interactions with the Kubernetes API.
-* **kube-audit-admin** - Contains Kubernetes API Server audit logs excluding events with the get and list verbs. These events are useful for monitoring resource modification requests made to the Kubernetes API.
-* **kube-apiserver** - The API server exposes the underlying Kubernetes APIs. This component provides the interaction for management tools, such as kubectl or the Kubernetes dashboard.
-* **kube-scheduler** - The Scheduler determines what nodes can run the workload when you create or scale applications and then starts them.
-* **kube-controller-manager** - The Controller Manager oversees a number of smaller controllers that perform actions, such as replicating pods and handling node operations.
-* **cluster-autoscaler** - To keep up with application demands in AKS, you might need to adjust the number of nodes that run your workloads. The cluster autoscaler component watches for pods in your cluster that can't be scheduled because of resource constraints. When the cluster autoscaler detects issues, it scales up the number of nodes in the node pool to meet the application demands. It also regularly checks nodes for a lack of running pods and scales down the number of nodes as needed.
+* **kube-audit**. Contains all Kubernetes API Server audit logs including the events with get and list verbs. These events are useful for monitoring all of the interactions with the Kubernetes API.
+* **kube-audit-admin**. Contains Kubernetes API Server audit logs excluding events with the get and list verbs. These events are useful for monitoring the resource modification requests made to the Kubernetes API.
+* **kube-apiserver**. The API server exposes the underlying Kubernetes APIs. This component provides the interaction for management tools, such as kubectl or the Kubernetes dashboard.
+* **kube-scheduler**. The Scheduler determines what nodes can run the workload when you create or scale applications and then starts them.
+* **kube-controller-manager**. The Controller Manager oversees a number of smaller controllers that perform actions, such as replicating pods and handling node operations.
+* **cluster-autoscaler**. The cluster autoscaler component watches for pods in your cluster that can't be scheduled because of resource constraints. When the cluster autoscaler detects issues, it scales up the number of nodes in the node pool to meet the application demands. It also regularly checks nodes for a lack of running pods and scales down the number of nodes as needed.

 ### Sample log messages
@@ -209,9 +209,9 @@ Sumo Logic Metrics source is currently in Beta, to participate, contact your Sum
 In the Sumo Logic Azure Metrics source configuration,

 - Tag the location field in the source with correct Azure resource location value. <br/><img src={useBaseUrl('img/integrations/microsoft-azure/Azure-Storage-Tag-Location.png')} alt="Azure Container Instance Tag Location" style={{border: '1px solid gray'}} width="400" />

-### Collecting logs collection for the Azure Kubernetes Cluster
+### Collecting logs for the Azure Kubernetes Cluster

 This section walks you through the process of configuring a pipeline to send logs from Azure Monitor to Sumo Logic.
@@ -248,67 +248,67 @@ import AppInstall from '../../reuse/apps/app-install.md';
 ### Overview

-The **Azure Kubernetes Service - Overview** dashboard provides insights like Audit Requests by Location, Active/Total Clusters, Clusters with API Server Errors, Clusters with Autoscaler Errors, Clusters with Kube Controller Manager Errors, Clusters with Scheduler Errors, Clusters with Cloud Control Manager Errors, Nodes Across Cluster and Critical Nodes Across Cluster.
+The **Azure Kubernetes Service - Overview** dashboard provides insights like Audit Requests by Location, Active/Total Clusters, Clusters with API Server Errors, Clusters with Autoscaler Errors, Clusters with Kube Controller Manager Errors, Clusters with Scheduler Errors, Clusters with Cloud Control Manager Errors, Nodes Across Cluster, and Critical Nodes Across Cluster.

 <img src={useBaseUrl('https://sumologic-app-data-v2.s3.us-east-1.amazonaws.com/dashboards/AzureKubernetesService/Azure-Kubernetes-Service-Overview.png')} alt="Azure Kubernetes Service - Overview" />

 ### Administrative Operations

-The **Azure Kubernetes Service - Administrative Operations** dashboard provides details like Top 10 Operations That Caused The Most Errors, Distribution by Operation Type (Read, Write and Delete), Distribution by Operations, Recent Write Operations, Recent Delete Operations, Users / Applications by Operation type, Distribution by Status.
+The **Azure Kubernetes Service - Administrative Operations** dashboard provides details like Top 10 Operations That Caused The Most Errors, Distribution by Operation Type (Read, Write, and Delete), Distribution by Operations, Recent Write Operations, Recent Delete Operations, Users / Applications by Operation type, and Distribution by Status.

 <img src={useBaseUrl('https://sumologic-app-data-v2.s3.us-east-1.amazonaws.com/dashboards/AzureKubernetesService/Azure-Kubernetes-Service-Administrative-Operations.png')} alt="Azure Kubernetes Service - Administrative Operations" />

 ### Audit

-The **Azure Kubernetes Service - Audit** dashboard provides details about Requests by Location, Failure by Operations, Failure by Stages, Failure by Reason, Distribution by Status Code, Top 10 Failed Resources, Successful Resource Details, Top 10 Users, Failure Trend by User, Failure Details.
+The **Azure Kubernetes Service - Audit** dashboard provides details about the Requests by Location, Failure by Operations, Failure by Stages, Failure by Reason, Distribution by Status Code, Top 10 Failed Resources, Successful Resource Details, Top 10 Users, Failure Trend by User, and Failure Details.

 <img src={useBaseUrl('https://sumologic-app-data-v2.s3.us-east-1.amazonaws.com/dashboards/AzureKubernetesService/Azure-Kubernetes-Service-Audit.png')} alt="Azure Kubernetes Service - Audit" />

 ### Audit Admin

-The **Azure Kubernetes Service - Audit Admin** dashboard details about Requests by Location, Failure by Operations, Failure by Stages, Failure by Reason, Distribution by Status Code, Top 10 Failed Resources, Successful Resource Details, Top 10 Users, Failure Trend by User, Failure Details.
+The **Azure Kubernetes Service - Audit Admin** dashboard details about the Requests by Location, Failure by Operations, Failure by Stages, Failure by Reason, Distribution by Status Code, Top 10 Failed Resources, Successful Resource Details, Top 10 Users, Failure Trend by User, and Failure Details.

 <img src={useBaseUrl('https://sumologic-app-data-v2.s3.us-east-1.amazonaws.com/dashboards/AzureKubernetesService/Azure-Kubernetes-Service-Audit-Admin.png')} alt="Azure Kubernetes Service - Audit Admin" />

 ### API Server

-The **Azure Kubernetes Service - API Server** dashboard provides insights about Failed Urls, Total Requests by Url, Failed Methods, Total Requests by Method, Requests by Severity, Errors by Severity and Error Log Events.
+The **Azure Kubernetes Service - API Server** dashboard provides insights about the Failed Urls, Total Requests by Url, Failed Methods, Total Requests by Method, Requests by Severity, Errors by Severity, and Error Log Events.

 <img src={useBaseUrl('https://sumologic-app-data-v2.s3.us-east-1.amazonaws.com/dashboards/AzureKubernetesService/Azure-Kubernetes-Service-API-Server.png')} alt="Azure Kubernetes Service - API Server" />

 ### Cloud Control Manager

-The **Azure Kubernetes Service - Cloud Control Manager** dashboard provides insights about Severity Breakdown, Severity Over Time, Error Message Count, Error Log Stream.
+The **Azure Kubernetes Service - Cloud Control Manager** dashboard provides insights about the Severity Breakdown, Severity Over Time, Error Message Count, and Error Log Stream.

 <img src={useBaseUrl('https://sumologic-app-data-v2.s3.us-east-1.amazonaws.com/dashboards/AzureKubernetesService/Azure-Kubernetes-Service-Cloud-Control-Manager.png')} alt="Azure Kubernetes Service - Cloud Control Manager" />

 ### Cluster Autoscaler

-The **Azure Kubernetes Service - Cluster Autoscaler** dashboard provides insights about Severity Breakdown, Severity Over Time, Error Message Count, Error Log Stream.
+The **Azure Kubernetes Service - Cluster Autoscaler** dashboard provides insights about the Severity Breakdown, Severity Over Time, Error Message Count, and Error Log Stream.

 <img src={useBaseUrl('https://sumologic-app-data-v2.s3.us-east-1.amazonaws.com/dashboards/AzureKubernetesService/Azure-Kubernetes-Service-Cluster-Autoscaler.png')} alt="Azure Kubernetes Service - Cluster Autoscaler" />

 ### Controller Manager

-The **Azure Kubernetes Service - Controller Manager** dashboard provides insights about Severity Breakdown, Severity Over Time, Error Message Count, Error Log Stream.
+The **Azure Kubernetes Service - Controller Manager** dashboard provides insights about the Severity Breakdown, Severity Over Time, Error Message Count, and Error Log Stream.

 <img src={useBaseUrl('https://sumologic-app-data-v2.s3.us-east-1.amazonaws.com/dashboards/AzureKubernetesService/Azure-Kubernetes-Service-Controller-Manager.png')} alt="Azure Kubernetes Service - Controller Manager" />

 ### Policy and Recommendations

-The **Azure Kubernetes Service - Policy and Recommendations** dashboard provides details like Total Recommendation Events, Total Success Policy Events, Total Failed Policy Events, Failed Policy Events, Recent Recommendation Events, Recommendation, Policy etc.
+The **Azure Kubernetes Service - Policy and Recommendations** dashboard provides details like Total Recommendation Events, Total Success Policy Events, Total Failed Policy Events, Failed Policy Events, Recent Recommendation Events, Recommendation, and Policy.

 <img src={useBaseUrl('https://sumologic-app-data-v2.s3.us-east-1.amazonaws.com/dashboards/AzureKubernetesService/Azure-Kubernetes-Service-Policy-and-Recommendations.png')} alt="Azure Kubernetes Service - Policy and Recommendations" />

 ### Scheduler

-The **Azure Kubernetes Service - Scheduler** dashboard provides details about Severity Over Time, Severity Breakdown and Error Messages.
+The **Azure Kubernetes Service - Scheduler** dashboard provides details about the Severity Over Time, Severity Breakdown, and Error Messages.

 <img src={useBaseUrl('https://sumologic-app-data-v2.s3.us-east-1.amazonaws.com/dashboards/AzureKubernetesService/Azure-Kubernetes-Service-Scheduler.png')} alt="Azure Kubernetes Service - Scheduler" />

 ### Apiserver

-The **Azure Kubernetes Service - Apiserver** dashboard provides insights about Average API Server CPU Usage (%), Average API Server Memory Usage (%), Average Inflight Requests Count, API Server CPU Usage (%), API Server Memory Usage (%) and Average Inflight Requests.
+The **Azure Kubernetes Service - Apiserver** dashboard provides insights about the Average API Server CPU Usage (%), Average API Server Memory Usage (%), Average Inflight Requests Count, API Server CPU Usage (%), API Server Memory Usage (%), and Average Inflight Requests.

 <img src={useBaseUrl('https://sumologic-app-data-v2.s3.us-east-1.amazonaws.com/dashboards/AzureKubernetesService/Azure-Kubernetes-Service-Apiserver.png')} alt="Azure Kubernetes Service - Apiserver" />
@@ -339,12 +339,12 @@ The **Azure Kubernetes Service - Node Memory** dashboard provides insights about
 ### Azure Key Vaults alerts
 These alerts are metric based and will work for all Key Vaults.

-| Alert Name | Alert Description and Conditions | Alert Condition| Recover Condition |
-|`Azure Kubernetes Service - High CPU Usage`| This alert is triggered with critical type when cpu usage percentage when cpu usage percentage greater than 95% and triggered with warning type when greater than 85%. | percentage < 95 | percentage >= 95 |
-|`Azure Kubernetes Service - Unreachable Kube Node(s)`| This alert is triggered when kube node(s) unreachable count greater than 1. | Count >= 1 | Count < 1|
-|`Azure Kubernetes Service - High Memory Working Set`| This alert triggers when memory working set percentage greater than 100. | percentage >= 100 | percentage < 100 |
-|`Azure Kubernetes Service - High Node Disk Usage`| This alert is triggered with critical type when node disk usage % is greater than 80% and trigger with type warning when greater than 70%. | percentage >= 80 | percentage < 80 |
+| Alert Name | Alert Description and Conditions | Alert Condition | Recover Condition |
+|:--|:--|:--|:--|
+|`Azure Kubernetes Service - High CPU Usage`| This alert is triggered when CPU usage percentage is greater than 95%. Also, a warning type alert will be triggered when CPU usage percentage is greater than 85%. | percentage >= 95 | percentage < 95 |
+|`Azure Kubernetes Service - Unreachable Kube Node(s)`| This alert is triggered when kube node(s) unreachable count greater than 1. | Count >= 1 | Count < 1 |
+|`Azure Kubernetes Service - High Memory Working Set`| This alert is triggered when memory working set is greater than 100%.| percentage >= 100 | percentage < 100 |
+|`Azure Kubernetes Service - High Node Disk Usage`| This alert is triggered when node disk usage is greater than 80% . Also, a warning alert will be triggered when node disk usage is greater than 70%.| percentage >= 80 | percentage < 80 |