The AKS Agent extension provides the "az aks agent" command, an AI-powered assistant that helps analyze and troubleshoot Azure Kubernetes Service (AKS) clusters using Large Language Models (LLMs). The agent combines cluster context, configurable toolsets, and LLMs to answer natural-language questions about your cluster (for example, "Why are my pods not starting?") and can investigate issues in both interactive and non-interactive (batch) modes.

New in this version: **az aks agent-init** command for containerized agent deployment!

The ``az aks agent-init`` command deploys the AKS agent as a Helm chart directly in your AKS cluster with enterprise-grade security:

- **Kubernetes RBAC**: Uses cluster roles to access Kubernetes resources with least-privilege principles
- **Workload Identity**: Leverages Azure workload identity for secure, keyless access to Azure resources
- **Interactive LLM Configuration**: Guides you through setting up LLM models, with credentials stored securely in Kubernetes secrets

When asking questions with ``az aks agent``:

- The agent automatically uses the last configured model
- Use ``--model`` to select a specific model when you have multiple models configured, as shown in the example below

This architecture provides better security, scalability, and manageability for production AKS troubleshooting workflows.
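
For example, with an Azure OpenAI deployment configured (the deployment name below matches the examples later in this document):

.. code-block:: bash

    # Uses the last configured model by default
    az aks agent "Why are my pods not starting?" --name MyManagedCluster --resource-group MyResourceGroup

    # Explicitly select one of your configured models for this question
    az aks agent "Why are my pods not starting?" --name MyManagedCluster --resource-group MyResourceGroup --model azure/my-gpt4.1-deployment
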
Key capabilities
----------------

- **Containerized Deployment**: Agent runs as a Helm chart in your AKS cluster with ``az aks agent-init``.
- **Secure Access**: Uses Kubernetes RBAC for cluster resources and Azure workload identity for Azure resources.
- **LLM Configuration**: Interactively configure LLM models with credentials stored securely in Kubernetes secrets.
- Support for multiple LLM providers (Azure OpenAI, OpenAI, Anthropic, Gemini, etc.).
- Automatically uses the last configured model by default.
- Optionally use ``--model`` to select a specific model when you have multiple models configured.
- Interactive and non-interactive modes (use ``--no-interactive`` for batch runs).
- Control echo and tool output visibility with ``--no-echo-request`` and ``--show-tool-output``; see the example below.
- Refresh the available toolsets with ``--refresh-toolsets``.
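
For example, a single run that suppresses echoing the request, shows the tool output, and refreshes the available toolsets first (this particular flag combination is only an illustration):

.. code-block:: bash

    az aks agent "Why is my deployment crash-looping?" \
        --name MyManagedCluster --resource-group MyResourceGroup \
        --no-echo-request --show-tool-output --refresh-toolsets
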
Prerequisites
-------------
For more details about supported model providers and required environment variables, see the LLM provider configuration reference below.

LLM provider configuration
--------------------------

The AKS Agent uses YAML configuration files to define LLM connections. Each configuration contains a provider specification and the required environment variables for that provider.

The LiteLLM provider route determines which LLM service to use. This follows the LiteLLM provider specification from https://docs.litellm.ai/docs/providers.

Common values:

* ``azure`` - Azure OpenAI Service
* ``openai`` - OpenAI API and OpenAI-compatible APIs (e.g., local models, other services)
* ``anthropic`` - Anthropic Claude
* ``gemini`` - Google's Gemini
* ``openai_compatible`` - OpenAI-compatible APIs (e.g., local models, other services)

**MODEL_NAME**

The specific model or deployment name to use. This varies by provider:

* For Azure OpenAI: Your deployment name (e.g., ``gpt-4.1``, ``gpt-35-turbo``)
* For OpenAI: Model name (e.g., ``gpt-4``, ``gpt-3.5-turbo``)
* For other providers: Check the specific model names in the LiteLLM documentation

**Environment Variables by Provider**

The remaining fields are environment variables required by each provider. These correspond to the authentication and configuration requirements of each LLM service:

**Azure OpenAI (provider: azure)**

* ``AZURE_API_KEY`` - Your Azure OpenAI API key
* ``AZURE_API_BASE`` - Your Azure OpenAI endpoint URL (e.g., https://your-resource.openai.azure.com/)
* ``AZURE_API_VERSION`` - API version (e.g., 2024-02-01, 2025-04-01-preview)

**OpenAI (provider: openai)**

* ``OPENAI_API_KEY`` - Your OpenAI API key (starts with sk-)

**Gemini (provider: gemini)**

* ``GOOGLE_API_KEY`` - Your Google Cloud API key
* ``GOOGLE_API_ENDPOINT`` - Base URL for the Gemini API endpoint
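
As a rough illustration of how these fields fit together, an Azure OpenAI entry combines the provider route, the deployment name, and the provider's environment variables. The overall file layout and the key name for the provider route below are assumptions for illustration only; the environment variable names are the ones listed above, and ``az aks agent-init`` collects the same values interactively.

.. code-block:: bash

    # Illustrative sketch only: the exact schema consumed by the agent may differ.
    cat > model_config.yaml <<'EOF'
    provider: azure                    # LiteLLM provider route
    MODEL_NAME: gpt-4.1                # Azure OpenAI deployment name
    AZURE_API_KEY: <your-azure-openai-api-key>
    AZURE_API_BASE: https://your-resource.openai.azure.com/
    AZURE_API_VERSION: 2024-02-01
    EOF
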
Quick start and examples
=========================

Install the extension
---------------------

.. code-block:: bash

    az extension add --name aks-agent
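
To confirm the extension is installed, or to update it later, the standard ``az extension`` commands apply:

.. code-block:: bash

    az extension show --name aks-agent --output table
    az extension update --name aks-agent
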
Initialize and configure the AKS agent
---------------------------------------

.. code-block:: bash

    az aks agent-init --resource-group MyResourceGroup --name MyManagedCluster

This command will:

1. Guide you through LLM model configuration, with credentials stored securely in Kubernetes secrets
2. Deploy the AKS agent Helm chart in your cluster
3. Configure Kubernetes RBAC for secure cluster resource access
4. Optionally configure Azure workload identity for Azure resource access

You can run it multiple times to update configurations or add more models.
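
If you want to verify what ``az aks agent-init`` deployed, you can inspect the cluster directly. The Helm release name and namespace are not documented here, so the ``grep`` filters below are assumptions:

.. code-block:: bash

    # Look for the agent chart among Helm releases in all namespaces
    helm list --all-namespaces | grep -i agent

    # The LLM configuration is stored in Kubernetes secrets in the agent's namespace
    kubectl get secrets --all-namespaces | grep -i agent
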
Run the agent (Azure OpenAI example)
------------------------------------

.. code-block:: bash

    az aks agent "Why are my pods not starting?" --name MyManagedCluster --resource-group MyResourceGroup --model azure/my-gpt4.1-deployment

Run the agent (OpenAI example)
------------------------------

.. code-block:: bash

    az aks agent "Why are my pods not starting?" --name MyManagedCluster --resource-group MyResourceGroup --model gpt-4o

Run in non-interactive batch mode
---------------------------------

.. code-block:: bash

    az aks agent "Diagnose networking issues" --no-interactive --max-steps 15 --model azure/my-gpt4.1-deployment
Clean up the AKS agent
-----------------------

To uninstall the AKS agent and clean up all Kubernetes resources:

.. code-block:: bash

    az aks agent-cleanup --resource-group MyResourceGroup --name MyManagedCluster

This command will:

1. Uninstall the AKS agent Helm chart from your cluster
2. Clean up all Kubernetes resources created for the agent
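
The ``agent-cleanup`` command removes the in-cluster components; if you also want to remove the CLI extension itself, that is a separate step:

.. code-block:: bash

    az extension remove --name aks-agent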