Monitoring, Logging, and Observability in Azure Kubernetes Service

Author: Randy Bordeaux
Date: January 2026
Version: 1.0
Azure Services: Azure Kubernetes Service (AKS), Azure Monitor, Container Insights, Log Analytics, Application Insights, Azure Managed Prometheus, Azure Managed Grafana

Executive Summary

This whitepaper is an implementation-focused guide to monitoring, logging, and observability for Azure Kubernetes Service (AKS) in Azure Commercial environments. It defines a layered telemetry architecture covering control plane visibility, node and workload metrics, application logs, security signals, and audit evidence. The guidance is written for experienced Azure and Kubernetes engineers and treats Terraform as the authoritative infrastructure-as-code mechanism.

1. Scope and Assumptions

Azure Commercial only
Azure Kubernetes Service (AKS)
Terraform (AzureRM provider) required
Private AKS clusters only
Centralized Log Analytics workspace
CI/CD-managed infrastructure and configuration
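
The Terraform assumption above can be made concrete with a pinned provider configuration. The following is a minimal sketch, not a prescribed baseline: the version constraints and the `subscription_id` variable are illustrative assumptions and should be aligned with whatever the CI/CD pipeline already pins.

```hcl
terraform {
  # Version constraints are illustrative assumptions, not tested minimums.
  required_version = ">= 1.5.0"

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 4.0"
    }
  }
}

# Hypothetical input; in CI/CD this is often supplied through the
# ARM_SUBSCRIPTION_ID environment variable instead.
variable "subscription_id" {
  type        = string
  description = "Target Azure Commercial subscription."
}

provider "azurerm" {
  features {}
  subscription_id = var.subscription_id
}
```

Pinning the provider keeps diagnostic and monitoring resources from changing shape between pipeline runs when a new provider release renames or deprecates a block.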

2. Observability Architecture Principles

Telemetry by default, not on-demand
Separation of metrics, logs, and traces
Central aggregation with environment isolation
Actionable alerts over raw signal volume
Evidence-ready audit trails

3. Telemetry Architecture Overview

graph TD
    AKS[AKS Cluster] --> DIAG[Diagnostic Settings]
    AKS --> INS[Container Insights]
    INS --> LA[Log Analytics]
    DIAG --> LA
    LA --> MON[Azure Monitor]
    MON --> ALERTS[Alerts / Action Groups]
    LA --> SIEM[Optional Sentinel]

Single Log Analytics workspace per environment
Diagnostic settings enabled on all supported resources
Optional SIEM integration without coupling
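
One way to express the per-environment workspace and the optional SIEM hookup is sketched below. The resource group, naming pattern, variables, and 30-day retention are hypothetical choices for illustration; only the `law` resource name is taken from the Terraform section of this paper.

```hcl
# Hypothetical inputs used for naming and placement.
variable "environment" {
  type        = string
  description = "Environment name, for example dev, test, or prod."
}

variable "location" {
  type        = string
  description = "Azure region for the observability resources."
}

variable "enable_sentinel" {
  type        = bool
  default     = false
  description = "Onboard the workspace to Microsoft Sentinel when true."
}

resource "azurerm_resource_group" "obs" {
  name     = "rg-observability-${var.environment}"
  location = var.location
}

# One workspace per environment keeps environments isolated while
# still centralizing analysis within each of them.
resource "azurerm_log_analytics_workspace" "law" {
  name                = "law-aks-${var.environment}"
  location            = azurerm_resource_group.obs.location
  resource_group_name = azurerm_resource_group.obs.name
  sku                 = "PerGB2018"
  retention_in_days   = 30 # illustrative default, not a recommendation
}

# Sentinel is toggled independently, so the telemetry pipeline does
# not depend on the SIEM being present.
resource "azurerm_sentinel_log_analytics_workspace_onboarding" "siem" {
  count        = var.enable_sentinel ? 1 : 0
  workspace_id = azurerm_log_analytics_workspace.law.id
}
```

Because the onboarding resource is gated by `count`, disabling Sentinel removes only that resource and leaves the workspace and its data flows untouched.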

4. Control Plane Visibility

AKS control plane logs enabled via diagnostic settings
API server audit logs retained centrally
Correlate API calls to Entra ID identities
No reliance on ephemeral portal views
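
A possible way to collect several control plane categories at once is sketched below. It is written as an expanded variant of the `aks-diag` setting in the Terraform section rather than an addition to it, since Azure does not accept the same category being sent to the same workspace by two settings. The category list and the choice of resource-specific (`Dedicated`) tables are assumptions to adapt; `guard` is included because it carries the Entra ID and Azure RBAC audit events used for identity correlation.

```hcl
locals {
  # Control plane log categories to collect. kube-audit is high volume;
  # kube-audit-admin is a smaller alternative that omits get and list events.
  aks_control_plane_categories = toset([
    "kube-apiserver",
    "kube-audit",
    "kube-controller-manager",
    "kube-scheduler",
    "guard",
  ])
}

resource "azurerm_monitor_diagnostic_setting" "aks_control_plane" {
  name                       = "aks-control-plane"
  target_resource_id         = azurerm_kubernetes_cluster.aks.id
  log_analytics_workspace_id = azurerm_log_analytics_workspace.law.id

  # "Dedicated" writes to resource-specific tables instead of the
  # shared AzureDiagnostics table.
  log_analytics_destination_type = "Dedicated"

  # One enabled_log block is generated per category in the set.
  dynamic "enabled_log" {
    for_each = local.aks_control_plane_categories
    content {
      category = enabled_log.value
    }
  }
}
```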

5. Node and Cluster Metrics

Container Insights enabled for baseline metrics
Node CPU, memory, disk, and network pressure
Kubelet metrics captured; scheduler signals taken from the managed control plane (for example the kube-scheduler diagnostic log category)
Zonal health monitored independently
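
A minimal sketch of a private, zone-spread cluster with Container Insights enabled follows. The cluster name, VM size, and node count are hypothetical, and the cluster is placed alongside the `law` workspace purely for brevity. With managed identity authentication, Container Insights also expects a data collection rule and association, and managed Prometheus needs an Azure Monitor workspace; both are omitted here.

```hcl
resource "azurerm_kubernetes_cluster" "aks" {
  name                    = "aks-example" # hypothetical name
  location                = azurerm_log_analytics_workspace.law.location
  resource_group_name     = azurerm_log_analytics_workspace.law.resource_group_name
  dns_prefix              = "aks-example"
  private_cluster_enabled = true

  default_node_pool {
    name       = "system"
    vm_size    = "Standard_D4s_v5" # illustrative size
    node_count = 3
    # Spreading nodes across zones lets zonal health be observed
    # and alerted on per zone.
    zones = ["1", "2", "3"]
  }

  identity {
    type = "SystemAssigned"
  }

  # Container Insights agent reporting to the central workspace.
  oms_agent {
    log_analytics_workspace_id      = azurerm_log_analytics_workspace.law.id
    msi_auth_for_monitoring_enabled = true
  }

  # Enables the managed Prometheus metrics add-on profile on the cluster.
  monitor_metrics {}
}
```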

6. Workload and Application Telemetry

graph TD
    App[Application Pods] --> LOGS[Application Logs]
    App --> METRICS[Custom Metrics]
    LOGS --> LA[Log Analytics]
    METRICS --> AM[Azure Monitor Metrics]

Application logs forwarded to Log Analytics
Custom metrics exposed for autoscaling and alerting
Tracing integrated at the application layer where required
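
Where tracing is required, a workspace-based Application Insights resource keeps traces in the same Log Analytics workspace as the rest of the telemetry. The sketch below assumes the `law` workspace from the Terraform section; the resource name and the output are hypothetical.

```hcl
# Workspace-based Application Insights: trace and request data are stored
# in the central Log Analytics workspace rather than a separate store.
resource "azurerm_application_insights" "app" {
  name                = "appi-aks-workloads" # hypothetical name
  location            = azurerm_log_analytics_workspace.law.location
  resource_group_name = azurerm_log_analytics_workspace.law.resource_group_name
  workspace_id        = azurerm_log_analytics_workspace.law.id
  application_type    = "web"
}

# Exposed so the delivery pipeline can inject it into workloads as a secret.
output "application_insights_connection_string" {
  value     = azurerm_application_insights.app.connection_string
  sensitive = true
}
```

Applications instrumented with the Azure Monitor OpenTelemetry distribution typically read this value from the `APPLICATIONINSIGHTS_CONNECTION_STRING` environment variable, which is where the developer buy-in mentioned later in this paper comes in.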

7. Logging Strategy and Retention

Centralized ingestion into Log Analytics
Separate tables for control plane, workload, and security logs
Retention aligned with compliance requirements
Cost controls via table-level retention and filtering
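
Table-level retention can be managed declaratively, as sketched below. The table names and day counts are illustrative assumptions, not compliance guidance. The resource updates tables that already exist in the workspace, so `ContainerLogV2` and `AKSAudit` must have been created by Container Insights and resource-specific diagnostic settings before this applies cleanly.

```hcl
locals {
  # Interactive retention versus total (interactive plus long-term) retention,
  # in days. Values are placeholders to be replaced by compliance requirements.
  table_retention = {
    ContainerLogV2 = { interactive_days = 30, total_days = 90 }
    AKSAudit       = { interactive_days = 90, total_days = 365 }
  }
}

resource "azurerm_log_analytics_workspace_table" "retention" {
  for_each = local.table_retention

  workspace_id            = azurerm_log_analytics_workspace.law.id
  name                    = each.key
  retention_in_days       = each.value.interactive_days
  total_retention_in_days = each.value.total_days
}
```

Keeping high-volume workload logs on a short interactive window while audit tables retain longer is one way to align cost with the evidentiary value of each table.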

8. Security Signals and Auditability

AKS audit logs retained and queryable
Azure Policy compliance logs captured
Entra ID sign-in and audit logs correlated
Support for incident reconstruction and forensics
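
Azure Policy evaluation events and control-plane operations on Azure resources surface in the subscription activity log, which can be routed to the same workspace. The sketch below assumes the `law` workspace from the Terraform section; the setting name and category selection are illustrative. Entra ID sign-in and audit logs are tenant-level and need their own export (for example via `azurerm_monitor_aad_diagnostic_setting`), which is not shown.

```hcl
data "azurerm_subscription" "current" {}

# Subscription-scope diagnostic setting: exports activity log categories
# so policy and administrative events can be joined with AKS audit logs.
resource "azurerm_monitor_diagnostic_setting" "activity_log" {
  name                       = "activity-to-law"
  target_resource_id         = data.azurerm_subscription.current.id
  log_analytics_workspace_id = azurerm_log_analytics_workspace.law.id

  enabled_log {
    category = "Administrative"
  }

  enabled_log {
    category = "Security"
  }

  enabled_log {
    category = "Policy"
  }
}
```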

9. Alerting and SLO-Based Monitoring

Alert on symptoms, not noise
SLO-driven thresholds for availability and latency
Action groups integrated with incident tooling
Avoid per-pod alerting in favor of service-level signals
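
A service-level alert of this kind might look like the sketch below: one rule on request failure rate for a service, routed to an action group that posts to incident tooling. The variables, names, KQL query, 1 percent threshold, and evaluation windows are all hypothetical and would be derived from the service's actual SLO.

```hcl
# Hypothetical inputs.
variable "application_insights_id" {
  type        = string
  description = "Resource ID of the Application Insights component for the service."
}

variable "incident_webhook_url" {
  type        = string
  sensitive   = true
  description = "Webhook endpoint of the incident management tool."
}

resource "azurerm_monitor_action_group" "oncall" {
  name                = "ag-aks-oncall"
  resource_group_name = azurerm_log_analytics_workspace.law.resource_group_name
  short_name          = "aksoncall"

  webhook_receiver {
    name                    = "incident-tool"
    service_uri             = var.incident_webhook_url
    use_common_alert_schema = true
  }
}

# Fires on a user-visible symptom (failed requests) for the whole service,
# rather than on the state of any individual pod.
resource "azurerm_monitor_scheduled_query_rules_alert_v2" "availability_slo" {
  name                = "slo-availability"
  resource_group_name = azurerm_log_analytics_workspace.law.resource_group_name
  location            = azurerm_log_analytics_workspace.law.location
  description         = "Request failure rate exceeds the availability SLO threshold."
  severity            = 1
  scopes              = [var.application_insights_id]

  evaluation_frequency = "PT5M"
  window_duration      = "PT15M"

  criteria {
    query                   = <<-KQL
      requests
      | summarize FailurePct = 100.0 * countif(success == false) / count()
    KQL
    metric_measure_column   = "FailurePct"
    time_aggregation_method = "Maximum"
    operator                = "GreaterThan"
    threshold               = 1.0 # illustrative: alert above 1% failed requests
  }

  action {
    action_groups = [azurerm_monitor_action_group.oncall.id]
  }
}
```

A latency rule follows the same shape with a percentile of `duration` in the query and a threshold taken from the latency objective.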

10. Azure Policy and Guardrails

Require diagnostic settings on AKS and dependencies
Audit missing Container Insights
Enforce Log Analytics workspace linkage
Deny clusters without monitoring enabled
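
As a starting point for these guardrails, a built-in policy can be assigned at subscription scope. The sketch below looks up a built-in definition by display name; the display name is given as it is understood to appear in the built-in catalog and should be verified in the target tenant. This particular definition audits rather than denies, so a deny guardrail would need a different or custom definition.

```hcl
data "azurerm_subscription" "current" {}

# Built-in definition looked up by display name (verify the exact name
# in your tenant before relying on it).
data "azurerm_policy_definition" "aks_resource_logs" {
  display_name = "Resource logs in Azure Kubernetes Service should be enabled"
}

# Flags AKS clusters that have no diagnostic settings for resource logs.
resource "azurerm_subscription_policy_assignment" "aks_resource_logs" {
  name                 = "aks-resource-logs" # hypothetical assignment name
  display_name         = "AKS clusters must emit resource logs"
  subscription_id      = data.azurerm_subscription.current.id
  policy_definition_id = data.azurerm_policy_definition.aks_resource_logs.id
}
```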

11. Terraform Implementation Patterns

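# Route AKS control plane logs and platform metrics to the central workspace.
resource "azurerm_monitor_diagnostic_setting" "aks" {
  name                       = "aks-diag"
  target_resource_id         = azurerm_kubernetes_cluster.aks.id
  log_analytics_workspace_id = azurerm_log_analytics_workspace.law.id

  # API server audit events. Further categories (for example
  # kube-apiserver or guard) take additional enabled_log blocks.
  enabled_log {
    category = "kube-audit"
  }

  # Platform metrics emitted by the cluster resource.
  metric {
    category = "AllMetrics"
    enabled  = true
  }
}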

12. Tradeoffs and Limitations

High log volume requires retention discipline
Container Insights has cost implications at scale
Application-level tracing requires developer buy-in

13. Conclusion

Effective observability in AKS requires deliberate architecture across metrics, logs, and audit signals. By standardizing telemetry collection, centralizing analysis, and enforcing monitoring via Terraform and Azure Policy, teams can operate reliable, secure, and supportable Kubernetes platforms in Azure Commercial environments.
