Commit d700c4c

monitor section updated
2 parents a2564f1 + 44ee583 commit d700c4c

1 file changed

+25
-25
lines changed

docs/integrations/microsoft-azure/kubernetes.md

Lines changed: 25 additions & 25 deletions
@@ -34,12 +34,12 @@ The following are the minimum supported requirements for this application:
3434

3535
The AKS - Control Plane app collects logs for the following [Azure Kubernetes Services](https://azure.microsoft.com/en-us/services/kubernetes-service/):
3636

37-
* **kube-audit** - Contains all Kubernetes API Server audit logs including events with the get and list verbs. These events are useful for monitoring all of the interactions with the Kubernetes API.* **kube-audit** - Contains all Kubernetes API Server audit logs including events with the get and list verbs. These events are useful for monitoring all of the interactions with the Kubernetes API.
38-
* **kube-audit-admin** - Contains Kubernetes API Server audit logs excluding events with the get and list verbs. These events are useful for monitoring resource modification requests made to the Kubernetes API.
39-
* **kube-apiserver** - The API server exposes the underlying Kubernetes APIs. This component provides the interaction for management tools, such as kubectl or the Kubernetes dashboard.
40-
* **kube-scheduler** - The Scheduler determines what nodes can run the workload when you create or scale applications and then starts them.
41-
* **kube-controller-manager** - The Controller Manager oversees a number of smaller controllers that perform actions, such as replicating pods and handling node operations.
42-
* **cluster-autoscaler** - To keep up with application demands in AKS, you might need to adjust the number of nodes that run your workloads. The cluster autoscaler component watches for pods in your cluster that can't be scheduled because of resource constraints. When the cluster autoscaler detects issues, it scales up the number of nodes in the node pool to meet the application demands. It also regularly checks nodes for a lack of running pods and scales down the number of nodes as needed.
37+
* **kube-audit**. Contains all Kubernetes API Server audit logs including the events with get and list verbs. These events are useful for monitoring all of the interactions with the Kubernetes API.
38+
* **kube-audit-admin**. Contains Kubernetes API Server audit logs excluding events with the get and list verbs. These events are useful for monitoring the resource modification requests made to the Kubernetes API.
39+
* **kube-apiserver**. The API server exposes the underlying Kubernetes APIs. This component provides the interaction for management tools, such as kubectl or the Kubernetes dashboard.
40+
* **kube-scheduler**. The Scheduler determines what nodes can run the workload when you create or scale applications and then starts them.
41+
* **kube-controller-manager**. The Controller Manager oversees a number of smaller controllers that perform actions, such as replicating pods and handling node operations.
42+
* **cluster-autoscaler**. The cluster autoscaler component watches for pods in your cluster that can't be scheduled because of resource constraints. When the cluster autoscaler detects issues, it scales up the number of nodes in the node pool to meet the application demands. It also regularly checks nodes for a lack of running pods and scales down the number of nodes as needed.
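The difference between the two audit categories described above comes down to the request verb. A minimal sketch of that rule, assuming each audit event is a dictionary with a `verb` field (the function name and the sample events are hypothetical, for illustration only):

```python
# Verbs that kube-audit-admin leaves out, per the category descriptions above.
READ_ONLY_VERBS = {"get", "list"}


def categories_for_event(event: dict) -> list[str]:
    """Return the audit log categories that would contain this event.

    Assumption: the event carries its request verb under the "verb" key.
    """
    verb = str(event.get("verb", "")).lower()
    # kube-audit contains every API Server audit event.
    categories = ["kube-audit"]
    # kube-audit-admin contains everything except get and list events.
    if verb not in READ_ONLY_VERBS:
        categories.append("kube-audit-admin")
    return categories


# Hypothetical sample events.
read_event = {"verb": "list", "requestURI": "/api/v1/pods"}
write_event = {"verb": "create", "requestURI": "/api/v1/namespaces"}
```

In this sketch, `read_event` would appear only under kube-audit, while `write_event` would appear under both categories, which is why kube-audit-admin is the lighter option when only resource modifications matter.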
4343

4444

4545
### Sample log messages
@@ -209,9 +209,9 @@ Sumo Logic Metrics source is currently in Beta, to participate, contact your Sum
209209
In the Sumo Logic Azure Metrics source configuration,
210210

211211
- Tag the location field in the source with correct Azure resource location value. <br/><img src={useBaseUrl('img/integrations/microsoft-azure/Azure-Storage-Tag-Location.png')} alt="Azure Container Instance Tag Location" style={{border: '1px solid gray'}} width="400" />
212-
- Configure namespaces as `Microsoft.ContainerService/managedClusters`, `microsoft.kubernetes/connectedClusters`, `microsoft.kubernetesconfiguration/extensions`, `microsoft.hybridcontainerservice/provisionedClusters`. <br/><img src={useBaseUrl('img/integrations/microsoft-azure/azure-kubernetes-service-namespaces.png')} alt="Azure Container Instance Namespaces" style={{border: '1px solid gray'}} width="500" />
212+
- Configure the namespaces as `Microsoft.ContainerService/managedClusters`, `microsoft.kubernetes/connectedClusters`, `microsoft.kubernetesconfiguration/extensions`, and `microsoft.hybridcontainerservice/provisionedClusters`. <br/><img src={useBaseUrl('img/integrations/microsoft-azure/azure-kubernetes-service-namespaces.png')} alt="Azure Container Instance Namespaces" style={{border: '1px solid gray'}} width="500" />
213213

214-
### Collecting logs collection for the Azure Kubernetes Cluster
214+
### Collecting logs for the Azure Kubernetes Cluster
215215

216216
This section walks you through the process of configuring a pipeline to send logs from Azure Monitor to Sumo Logic.
217217

@@ -248,67 +248,67 @@ import AppInstall from '../../reuse/apps/app-install.md';
248248

249249
### Overview
250250

251-
The **Azure Kubernetes Service - Overview** dashboard provides insights like Audit Requests by Location, Active/Total Clusters, Clusters with API Server Errors, Clusters with Autoscaler Errors, Clusters with Kube Controller Manager Errors, Clusters with Scheduler Errors, Clusters with Cloud Control Manager Errors, Nodes Across Cluster and Critical Nodes Across Cluster.
251+
The **Azure Kubernetes Service - Overview** dashboard provides insights like Audit Requests by Location, Active/Total Clusters, Clusters with API Server Errors, Clusters with Autoscaler Errors, Clusters with Kube Controller Manager Errors, Clusters with Scheduler Errors, Clusters with Cloud Control Manager Errors, Nodes Across Cluster, and Critical Nodes Across Cluster.
252252

253253
<img src={useBaseUrl('https://sumologic-app-data-v2.s3.us-east-1.amazonaws.com/dashboards/AzureKubernetesService/Azure-Kubernetes-Service-Overview.png')} alt="Azure Kubernetes Service - Overview" />
254254

255255
### Administrative Operations
256256

257-
The **Azure Kubernetes Service - Administrative Operations** dashboard provides details like Top 10 Operations That Caused The Most Errors, Distribution by Operation Type (Read, Write and Delete), Distribution by Operations, Recent Write Operations, Recent Delete Operations, Users / Applications by Operation type, Distribution by Status.
257+
The **Azure Kubernetes Service - Administrative Operations** dashboard provides details like Top 10 Operations That Caused The Most Errors, Distribution by Operation Type (Read, Write, and Delete), Distribution by Operations, Recent Write Operations, Recent Delete Operations, Users / Applications by Operation type, and Distribution by Status.
258258

259259
<img src={useBaseUrl('https://sumologic-app-data-v2.s3.us-east-1.amazonaws.com/dashboards/AzureKubernetesService/Azure-Kubernetes-Service-Administrative-Operations.png')} alt="Azure Kubernetes Service - Administrative Operations" />
260260

261261
### Audit
262262

263-
The **Azure Kubernetes Service - Audit** dashboard provides details about Requests by Location, Failure by Operations, Failure by Stages, Failure by Reason, Distribution by Status Code, Top 10 Failed Resources, Successful Resource Details, Top 10 Users, Failure Trend by User, Failure Details.
263+
The **Azure Kubernetes Service - Audit** dashboard provides details about the Requests by Location, Failure by Operations, Failure by Stages, Failure by Reason, Distribution by Status Code, Top 10 Failed Resources, Successful Resource Details, Top 10 Users, Failure Trend by User, and Failure Details.
264264

265265
<img src={useBaseUrl('https://sumologic-app-data-v2.s3.us-east-1.amazonaws.com/dashboards/AzureKubernetesService/Azure-Kubernetes-Service-Audit.png')} alt="Azure Kubernetes Service - Audit" />
266266

267267
### Audit Admin
268268

269-
The **Azure Kubernetes Service - Audit Admin** dashboard details about Requests by Location, Failure by Operations, Failure by Stages, Failure by Reason, Distribution by Status Code, Top 10 Failed Resources, Successful Resource Details, Top 10 Users, Failure Trend by User, Failure Details.
269+
The **Azure Kubernetes Service - Audit Admin** dashboard provides details about the Requests by Location, Failure by Operations, Failure by Stages, Failure by Reason, Distribution by Status Code, Top 10 Failed Resources, Successful Resource Details, Top 10 Users, Failure Trend by User, and Failure Details.
270270

271271
<img src={useBaseUrl('https://sumologic-app-data-v2.s3.us-east-1.amazonaws.com/dashboards/AzureKubernetesService/Azure-Kubernetes-Service-Audit-Admin.png')} alt="Azure Kubernetes Service - Audit Admin" />
272272

273273
### API Server
274274

275-
The **Azure Kubernetes Service - API Server** dashboard provides insights about Failed Urls, Total Requests by Url, Failed Methods, Total Requests by Method, Requests by Severity, Errors by Severity and Error Log Events.
275+
The **Azure Kubernetes Service - API Server** dashboard provides insights about the Failed Urls, Total Requests by Url, Failed Methods, Total Requests by Method, Requests by Severity, Errors by Severity, and Error Log Events.
276276

277277
<img src={useBaseUrl('https://sumologic-app-data-v2.s3.us-east-1.amazonaws.com/dashboards/AzureKubernetesService/Azure-Kubernetes-Service-API-Server.png')} alt="Azure Kubernetes Service - API Server" />
278278

279279
### Cloud Control Manager
280280

281-
The **Azure Kubernetes Service - Cloud Control Manager** dashboard provides insights about Severity Breakdown, Severity Over Time, Error Message Count, Error Log Stream.
281+
The **Azure Kubernetes Service - Cloud Control Manager** dashboard provides insights about the Severity Breakdown, Severity Over Time, Error Message Count, and Error Log Stream.
282282

283283
<img src={useBaseUrl('https://sumologic-app-data-v2.s3.us-east-1.amazonaws.com/dashboards/AzureKubernetesService/Azure-Kubernetes-Service-Cloud-Control-Manager.png')} alt="Azure Kubernetes Service - Cloud Control Manager" />
284284

285285
### Cluster Autoscaler
286286

287-
The **Azure Kubernetes Service - Cluster Autoscaler** dashboard provides insights about Severity Breakdown, Severity Over Time, Error Message Count, Error Log Stream.
287+
The **Azure Kubernetes Service - Cluster Autoscaler** dashboard provides insights about the Severity Breakdown, Severity Over Time, Error Message Count, and Error Log Stream.
288288

289289
<img src={useBaseUrl('https://sumologic-app-data-v2.s3.us-east-1.amazonaws.com/dashboards/AzureKubernetesService/Azure-Kubernetes-Service-Cluster-Autoscaler.png')} alt="Azure Kubernetes Service - Cluster Autoscaler" />
290290

291291
### Controller Manager
292292

293-
The **Azure Kubernetes Service - Controller Manager** dashboard provides insights about Severity Breakdown, Severity Over Time, Error Message Count, Error Log Stream.
293+
The **Azure Kubernetes Service - Controller Manager** dashboard provides insights about the Severity Breakdown, Severity Over Time, Error Message Count, and Error Log Stream.
294294

295295
<img src={useBaseUrl('https://sumologic-app-data-v2.s3.us-east-1.amazonaws.com/dashboards/AzureKubernetesService/Azure-Kubernetes-Service-Controller-Manager.png')} alt="Azure Kubernetes Service - Controller Manager" />
296296

297297
### Policy and Recommendations
298298

299-
The **Azure Kubernetes Service - Policy and Recommendations** dashboard provides details like Total Recommendation Events, Total Success Policy Events, Total Failed Policy Events, Failed Policy Events, Recent Recommendation Events, Recommendation, Policy etc.
299+
The **Azure Kubernetes Service - Policy and Recommendations** dashboard provides details like Total Recommendation Events, Total Success Policy Events, Total Failed Policy Events, Failed Policy Events, Recent Recommendation Events, Recommendation, and Policy.
300300

301301
<img src={useBaseUrl('https://sumologic-app-data-v2.s3.us-east-1.amazonaws.com/dashboards/AzureKubernetesService/Azure-Kubernetes-Service-Policy-and-Recommendations.png')} alt="Azure Kubernetes Service - Policy and Recommendations" />
302302

303303
### Scheduler
304304

305-
The **Azure Kubernetes Service - Scheduler** dashboard provides details about Severity Over Time, Severity Breakdown and Error Messages.
305+
The **Azure Kubernetes Service - Scheduler** dashboard provides details about the Severity Over Time, Severity Breakdown, and Error Messages.
306306

307307
<img src={useBaseUrl('https://sumologic-app-data-v2.s3.us-east-1.amazonaws.com/dashboards/AzureKubernetesService/Azure-Kubernetes-Service-Scheduler.png')} alt="Azure Kubernetes Service - Scheduler" />
308308

309309
### Apiserver
310310

311-
The **Azure Kubernetes Service - Apiserver** dashboard provides insights about Average API Server CPU Usage (%), Average API Server Memory Usage (%), Average Inflight Requests Count, API Server CPU Usage (%), API Server Memory Usage (%) and Average Inflight Requests.
311+
The **Azure Kubernetes Service - Apiserver** dashboard provides insights about the Average API Server CPU Usage (%), Average API Server Memory Usage (%), Average Inflight Requests Count, API Server CPU Usage (%), API Server Memory Usage (%), and Average Inflight Requests.
312312

313313
<img src={useBaseUrl('https://sumologic-app-data-v2.s3.us-east-1.amazonaws.com/dashboards/AzureKubernetesService/Azure-Kubernetes-Service-Apiserver.png')} alt="Azure Kubernetes Service - Apiserver" />
314314

@@ -339,12 +339,12 @@ The **Azure Kubernetes Service - Node Memory** dashboard provides insights about
339339
### Azure Kubernetes Service alerts
340340
These alerts are metric-based and will work for all Azure Kubernetes Service clusters.
341341

342-
| Alert Name | Alert Description and Conditions | Alert Condition | Recover Condition |
343-
|:------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------|:------------------|
344-
| `Azure Kubernetes Service - High CPU Usage` | This alert is triggered with critical type when cpu usage percentage when cpu usage percentage greater than 95% and triggered with warning type when greater than 85%. | percentage < 95 | percentage >= 95 |
345-
| `Azure Kubernetes Service - Unreachable Kube Node(s)` | This alert is triggered when kube node(s) unreachable count greater than 1. | Count >= 1 | Count < 1 |
346-
| `Azure Kubernetes Service - High Memory Working Set` | This alert triggers when memory working set percentage greater than 100. | percentage >= 100 | percentage < 100 |
347-
| `Azure Kubernetes Service - High Node Disk Usage` | This alert is triggered with critical type when node disk usage % is greater than 80% and trigger with type warning when greater than 70%. | percentage >= 80 | percentage < 80 |
342+
| Alert Name | Alert Description and Conditions | Alert Condition | Recover Condition |
343+
|:--|:--|:--|:--|
344+
| `Azure Kubernetes Service - High CPU Usage` | A critical alert is triggered when CPU usage percentage is greater than or equal to 95%. A warning alert is triggered when CPU usage percentage is greater than 85%. | percentage >= 95 | percentage < 95 |
345+
| `Azure Kubernetes Service - Unreachable Kube Node(s)` | This alert is triggered when the count of unreachable kube node(s) is greater than or equal to 1. | Count >= 1 | Count < 1 |
346+
| `Azure Kubernetes Service - High Memory Working Set` | This alert is triggered when memory working set percentage is greater than or equal to 100%. | percentage >= 100 | percentage < 100 |
347+
| `Azure Kubernetes Service - High Node Disk Usage` | A critical alert is triggered when node disk usage is greater than or equal to 80%. A warning alert is triggered when node disk usage is greater than 70%. | percentage >= 80 | percentage < 80 |
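The thresholds in the table can be read as a simple two-level check. The following is a rough sketch of that logic, not the monitor implementation: the metric keys are hypothetical names chosen for illustration, the critical values come from the Alert Condition column, and the warning values come from the descriptions.

```python
from typing import Optional

# metric key (hypothetical name) -> (critical threshold, warning threshold or None)
THRESHOLDS = {
    "cpu_usage_percentage": (95, 85),
    "unreachable_node_count": (1, None),
    "memory_working_set_percentage": (100, None),
    "node_disk_usage_percentage": (80, 70),
}


def severity(metric: str, value: float) -> Optional[str]:
    """Return "critical", "warning", or None for a single metric reading."""
    critical, warning = THRESHOLDS[metric]
    # Critical uses >=, matching the Alert Condition column.
    if value >= critical:
        return "critical"
    # Warning uses >, matching the "greater than" wording in the descriptions.
    if warning is not None and value > warning:
        return "warning"
    return None
```

For example, a CPU reading of 90 falls between the two thresholds and would map to a warning, while a single unreachable node is already critical because that alert has no warning level.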
348348

349349

350350
## Troubleshooting
