The AKS Agent extension provides the "az aks agent" command, an AI-powered assistant that helps analyze and troubleshoot Azure Kubernetes Service (AKS) clusters using Large Language Models (LLMs). The agent combines cluster context, configurable toolsets, and LLMs to answer natural-language questions about your cluster (for example, "Why are my pods not starting?") and can investigate issues in both interactive and non-interactive (batch) modes.

New in this version: **az aks agent-init** command for containerized agent deployment!

The ``az aks agent-init`` command deploys the AKS agent as a Helm chart directly in your AKS cluster with enterprise-grade security:

- **Kubernetes RBAC**: Uses cluster roles to access Kubernetes resources with least-privilege principles
- **Workload Identity**: Leverages Azure workload identity for secure, keyless access to Azure resources
- **Interactive LLM Configuration**: Guides you through setting up LLM models, with credentials stored securely in Kubernetes secrets

When asking questions with ``az aks agent``:

- The agent automatically uses the last configured model
- Use ``--model`` to select a specific model when you have multiple models configured, as shown in the example below

This architecture provides better security, scalability, and manageability for production AKS troubleshooting workflows.
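
For example, with an Azure OpenAI deployment configured (the deployment name below matches the examples later in this document):

.. code-block:: bash

    # Uses the last configured model by default
    az aks agent "Why are my pods not starting?" --name MyManagedCluster --resource-group MyResourceGroup

    # Explicitly select one of your configured models for this question
    az aks agent "Why are my pods not starting?" --name MyManagedCluster --resource-group MyResourceGroup --model azure/my-gpt4.1-deployment
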
Key capabilities
----------------

- **Containerized Deployment**: Agent runs as a Helm chart in your AKS cluster with ``az aks agent-init``.
- **Secure Access**: Uses Kubernetes RBAC for cluster resources and Azure workload identity for Azure resources.
- **LLM Configuration**: Interactively configure LLM models with credentials stored securely in Kubernetes secrets.
- Support for multiple LLM providers (Azure OpenAI, OpenAI, Anthropic, Gemini, etc.).
- Automatically uses the last configured model by default.
- Optionally use ``--model`` to select a specific model when you have multiple models configured.
- Interactive and non-interactive modes (use ``--no-interactive`` for batch runs).
- Control echo and tool output visibility with ``--no-echo-request`` and ``--show-tool-output``; see the example below.
- Refresh the available toolsets with ``--refresh-toolsets``.
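
For example, a single run that suppresses echoing the request, shows the tool output, and refreshes the available toolsets first (this particular flag combination is only an illustration):

.. code-block:: bash

    az aks agent "Why is my deployment crash-looping?" \
        --name MyManagedCluster --resource-group MyResourceGroup \
        --no-echo-request --show-tool-output --refresh-toolsets
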
Prerequisites
-------------
For more details about supported model providers and required environment variables, see the LLM provider configuration reference below.

LLM provider configuration
--------------------------

The AKS Agent uses YAML configuration files to define LLM connections. Each configuration contains a provider specification and the required environment variables for that provider.

The LiteLLM provider route determines which LLM service to use. This follows the LiteLLM provider specification from https://docs.litellm.ai/docs/providers.

Common values:

* ``azure`` - Azure OpenAI Service
* ``openai`` - OpenAI API and OpenAI-compatible APIs (e.g., local models, other services)
* ``anthropic`` - Anthropic Claude
* ``gemini`` - Google's Gemini
* ``openai_compatible`` - OpenAI-compatible APIs (e.g., local models, other services)

**MODEL_NAME**

The specific model or deployment name to use. This varies by provider:

* For Azure OpenAI: Your deployment name (e.g., ``gpt-4.1``, ``gpt-35-turbo``)
* For OpenAI: Model name (e.g., ``gpt-4``, ``gpt-3.5-turbo``)
* For other providers: Check the specific model names in the LiteLLM documentation

**Environment Variables by Provider**

The remaining fields are environment variables required by each provider. These correspond to the authentication and configuration requirements of each LLM service:

**Azure OpenAI (provider: azure)**

* ``AZURE_API_KEY`` - Your Azure OpenAI API key
* ``AZURE_API_BASE`` - Your Azure OpenAI endpoint URL (e.g., https://your-resource.openai.azure.com/)
* ``AZURE_API_VERSION`` - API version (e.g., 2024-02-01, 2025-04-01-preview)

**OpenAI (provider: openai)**

* ``OPENAI_API_KEY`` - Your OpenAI API key (starts with sk-)

**Gemini (provider: gemini)**

* ``GOOGLE_API_KEY`` - Your Google Cloud API key
* ``GOOGLE_API_ENDPOINT`` - Base URL for the Gemini API endpoint
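
As a rough illustration of how these fields fit together, an Azure OpenAI entry combines the provider route, the deployment name, and the provider's environment variables. The overall file layout and the key name for the provider route below are assumptions for illustration only; the environment variable names are the ones listed above, and ``az aks agent-init`` collects the same values interactively.

.. code-block:: bash

    # Illustrative sketch only: the exact schema consumed by the agent may differ.
    cat > model_config.yaml <<'EOF'
    provider: azure                    # LiteLLM provider route
    MODEL_NAME: gpt-4.1                # Azure OpenAI deployment name
    AZURE_API_KEY: <your-azure-openai-api-key>
    AZURE_API_BASE: https://your-resource.openai.azure.com/
    AZURE_API_VERSION: 2024-02-01
    EOF
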
Quick start and examples
=========================

Install the extension
---------------------

.. code-block:: bash

    az extension add --name aks-agent
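
To confirm the extension is installed, or to update it later, the standard ``az extension`` commands apply:

.. code-block:: bash

    az extension show --name aks-agent --output table
    az extension update --name aks-agent
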
Initialize and configure the AKS agent
---------------------------------------

.. code-block:: bash

    az aks agent-init --resource-group MyResourceGroup --name MyManagedCluster

This command will:

1. Guide you through LLM model configuration, with credentials stored securely in Kubernetes secrets
2. Deploy the AKS agent Helm chart in your cluster
3. Configure Kubernetes RBAC for secure cluster resource access
4. Optionally configure Azure workload identity for Azure resource access

You can run it multiple times to update configurations or add more models.
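
If you want to verify what ``az aks agent-init`` deployed, you can inspect the cluster directly. The Helm release name and namespace are not documented here, so the ``grep`` filters below are assumptions:

.. code-block:: bash

    # Look for the agent chart among Helm releases in all namespaces
    helm list --all-namespaces | grep -i agent

    # The LLM configuration is stored in Kubernetes secrets in the agent's namespace
    kubectl get secrets --all-namespaces | grep -i agent
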
Run the agent (Azure OpenAI example)
------------------------------------

.. code-block:: bash

    az aks agent "Why are my pods not starting?" --name MyManagedCluster --resource-group MyResourceGroup --model azure/my-gpt4.1-deployment

Run the agent (OpenAI example)
------------------------------

.. code-block:: bash

    az aks agent "Why are my pods not starting?" --name MyManagedCluster --resource-group MyResourceGroup --model gpt-4o

Run in non-interactive batch mode
---------------------------------

.. code-block:: bash

    az aks agent "Diagnose networking issues" --no-interactive --max-steps 15 --model azure/my-gpt4.1-deployment
Clean up the AKS agent
-----------------------

To uninstall the AKS agent and clean up all Kubernetes resources:

.. code-block:: bash

    az aks agent-cleanup --resource-group MyResourceGroup --name MyManagedCluster

This command will:

1. Uninstall the AKS agent Helm chart from your cluster
2. Clean up all Kubernetes resources created for the agent
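
The ``agent-cleanup`` command removes the in-cluster components; if you also want to remove the CLI extension itself, that is a separate step:

.. code-block:: bash

    az extension remove --name aks-agent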