Azure · yanzhudd · Sep 1, 2025 · Aug 29, 2025
@@ -0,0 +1,17 @@
+.. :changelog:
+
+Release History
+===============
+
+Guidance
+++++++++
+If there is no rush to release a new version, please just add a description of the modification under the *Pending* section.
+
+To release a new version, select a new version number (usually the last patch version plus 1; X.Y.Z means Major.Minor.Patch, see `\doc <https://semver.org/>`_ for details), then add a new section to this file named after that version number. The section should include the new modifications and everything from the *Pending* section. Finally, update the `VERSION` variable in `setup.py` to the new version number.
+
+Pending
++++++++
+
+1.0.0b1
++++++++
+* Add interactive AI-powered debugging tool `az aks agent`.
@@ -0,0 +1,80 @@
+Azure CLI AKS Agent Extension
+===============================
+
+Introduction
+============
+
+The AKS Agent extension provides the "az aks agent" command, an AI-powered assistant that
+helps analyze and troubleshoot Azure Kubernetes Service (AKS) clusters using Large Language
+Models (LLMs). The agent combines cluster context, configurable toolsets, and LLMs to answer
+natural-language questions about your cluster (for example, "Why are my pods not starting?")
+and can investigate issues in both interactive and non-interactive (batch) modes.
+
+Key capabilities
+----------------
+
+- Interactive and non-interactive modes (use --no-interactive for batch runs).
+- Support for multiple LLM providers (Azure OpenAI, OpenAI, etc.) via environment variables.
+- Configurable via a JSON/YAML config file provided with --config-file.
+- Control echo and tool output visibility with --no-echo-request and --show-tool-output.
+- Refresh the available toolsets with --refresh-toolsets.
+
+Prerequisites
+-------------
+
+Before using the agent, make sure provider-specific environment variables are set. For
+example, Azure OpenAI typically requires AZURE_API_BASE, AZURE_API_VERSION, and AZURE_API_KEY,
+while OpenAI requires OPENAI_API_KEY. For more details about supported providers and required
+variables, see: https://docs.litellm.ai/docs/providers
+
+Quick start and examples
+========================
+
+Install the extension
+---------------------
+
+.. code-block:: bash
+
+    az extension add --name aks-agent
+
+Run the agent (Azure OpenAI example)
+------------------------------------
+
+.. code-block:: bash
+
+    export AZURE_API_BASE="https://my-azureopenai-service.openai.azure.com/"
+    export AZURE_API_VERSION="2025-01-01-preview"
+    export AZURE_API_KEY="sk-xxx"
+
+    az aks agent "Why are my pods not starting?" --name MyManagedCluster --resource-group MyResourceGroup --model azure/my-gpt4.1-deployment
+
+Run the agent (OpenAI example)
+------------------------------
+
+.. code-block:: bash
+
+    export OPENAI_API_KEY="sk-xxx"
+    az aks agent "Why are my pods not starting?" --name MyManagedCluster --resource-group MyResourceGroup --model gpt-4o
+
+Run in non-interactive batch mode
+---------------------------------
+
+.. code-block:: bash
+
+    az aks agent "Diagnose networking issues" --no-interactive --max-steps 15 --model azure/my-gpt4.1-deployment
+
+Using a configuration file
+--------------------------
+
+Pass a config file with --config-file to predefine model, credentials, and toolsets. See
+the example config and more detailed examples in the help definition at
+`src/aks-agent/azext_aks_agent/_help.py`.
+
+More help
+---------
+
+For a complete list of parameters, detailed examples and help text, run:
+
+.. code-block:: bash
+
+    az aks agent -h
@@ -0,0 +1,36 @@
+# --------------------------------------------------------------------------------------------
+# Copyright (c) Microsoft Corporation. All rights reserved.
+# Licensed under the MIT License. See License.txt in the project root for license information.
+# --------------------------------------------------------------------------------------------
+
+
+from azure.cli.core import AzCommandsLoader
+
+# pylint: disable=unused-import
+import azext_aks_agent._help
+
+
+class ContainerServiceCommandsLoader(AzCommandsLoader):
+
+    def __init__(self, cli_ctx=None):
+        from azure.cli.core.commands import CliCommandType
+
+        aks_agent_custom = CliCommandType(operations_tmpl='azext_aks_agent.custom#{}')
+        super().__init__(
+            cli_ctx=cli_ctx,
+            custom_command_type=aks_agent_custom,
+        )
+
+    def load_command_table(self, args):
+        super().load_command_table(args)
+        from azext_aks_agent.commands import load_command_table
+        load_command_table(self, args)
+        return self.command_table
+
+    def load_arguments(self, command):
+        super().load_arguments(command)
+        from azext_aks_agent._params import load_arguments
+        load_arguments(self, command)
+
+
+COMMAND_LOADER_CLS = ContainerServiceCommandsLoader
@@ -0,0 +1,10 @@
+# --------------------------------------------------------------------------------------------
+# Copyright (c) Microsoft Corporation. All rights reserved.
+# Licensed under the MIT License. See License.txt in the project root for license information.
+# --------------------------------------------------------------------------------------------
+
+# aks agent constants
+CONST_AGENT_CONFIG_PATH_DIR_ENV_KEY = "HOLMES_CONFIGPATH_DIR"
+CONST_AGENT_NAME = "AKS AGENT"
+CONST_AGENT_NAME_ENV_KEY = "AGENT_NAME"
+CONST_AGENT_CONFIG_FILE_NAME = "aksAgent.yaml"
@@ -0,0 +1,106 @@
+# coding=utf-8
+# --------------------------------------------------------------------------------------------
+# Copyright (c) Microsoft Corporation. All rights reserved.
+# Licensed under the MIT License. See License.txt in the project root for license information.
+# --------------------------------------------------------------------------------------------
+
+# pylint: disable=too-many-lines
+
+from knack.help_files import helps
+
+
+helps[
+    "aks agent"
+] = """
+    type: command
+    short-summary: Run AI assistant to analyze and troubleshoot Kubernetes clusters.
+    long-summary: |-
+      This command allows you to ask questions about your Azure Kubernetes cluster and get answers using AI models.
+      Environment variables must be set to use the AI model. Refer to https://docs.litellm.ai/docs/providers to learn more about supported AI providers, models, and required environment variables.
+    parameters:
+        - name: --name -n
+          type: string
+          short-summary: Name of the managed cluster.
+        - name: --resource-group -g
+          type: string
+          short-summary: Name of the resource group.
+        - name: --model
+          type: string
+          short-summary: Model to use for the LLM.
+        - name: --api-key
+          type: string
+          short-summary: API key to use for the LLM (if not given, uses environment variables AZURE_API_KEY, OPENAI_API_KEY).
+        - name: --config-file
+          type: string
+          short-summary: Path to configuration file.
+        - name: --max-steps
+          type: int
+          short-summary: Maximum number of steps the LLM can take to investigate the issue.
+        - name: --no-interactive
+          type: bool
+          short-summary: Disable interactive mode. When set, the agent will not prompt for input and will run in batch mode.
+        - name: --no-echo-request
+          type: bool
+          short-summary: Disable echoing back the question provided to AKS Agent in the output.
+        - name: --show-tool-output
+          type: bool
+          short-summary: Show the output of each tool that was called during the analysis.
+        - name: --refresh-toolsets
+          type: bool
+          short-summary: Refresh the toolsets status.
+
+    examples:
+        - name: Ask about pod issues in the cluster with Azure OpenAI
+          text: |-
+            export AZURE_API_BASE="https://my-azureopenai-service.openai.azure.com/"
+            export AZURE_API_VERSION="2025-01-01-preview"
+            export AZURE_API_KEY="sk-xxx"
+            az aks agent "Why are my pods not starting?" --name MyManagedCluster --resource-group MyResourceGroup --model azure/my-gpt4.1-deployment
+        - name: Ask about pod issues in the cluster with OpenAI
+          text: |-
+            export OPENAI_API_KEY="sk-xxx"
+            az aks agent "Why are my pods not starting?" --name MyManagedCluster --resource-group MyResourceGroup --model gpt-4o
+        - name: Run in interactive mode with the API key passed on the command line
+          text: az aks agent "Check the pod status in my cluster" --name MyManagedCluster --resource-group MyResourceGroup --model azure/my-gpt4.1-deployment --api-key "sk-xxx"
+        - name: Run in non-interactive batch mode
+          text: az aks agent "Diagnose networking issues" --no-interactive --max-steps 15 --model azure/my-gpt4.1-deployment
+        - name: Show detailed tool output during analysis
+          text: az aks agent "Why is my service workload unavailable in namespace workload-ns?" --show-tool-output --model azure/my-gpt4.1-deployment
+        - name: Use custom configuration file
+          text: az aks agent "Check kubernetes pod resource usage" --config-file /path/to/custom.yaml --model azure/my-gpt4.1-deployment
+        - name: Run agent with no echo of the original question
+          text: az aks agent "What is the status of my cluster?" --no-echo-request --model azure/my-gpt4.1-deployment
+        - name: Refresh toolsets to get the latest available tools
+          text: az aks agent "What is the status of my cluster?" --refresh-toolsets --model azure/my-gpt4.1-deployment
+        - name: Run agent with config file
+          text: |
+            az aks agent "Check kubernetes pod resource usage" --config-file /path/to/custom.yaml
+            Here is an example of config file:
+            ```yaml
+            model: "gpt-4o"
+            api_key: "..."
+            # define a list of MCP servers
+            mcp_servers:
+              aks_mcp:
+                description: "The AKS-MCP is a Model Context Protocol (MCP) server that enables AI assistants to interact with Azure Kubernetes Service (AKS) clusters"
+                url: "http://localhost:8003/sse"
+
+            # try adding your own tools or toggle the built-in toolsets here
+            # e.g. query company-specific data, fetch logs from your existing observability tools, etc
+            # To check how to add a customized toolset, please refer to https://docs.robusta.dev/master/configuration/holmesgpt/custom_toolsets.html#custom-toolsets
+            # To find all built-in toolsets, please refer to https://docs.robusta.dev/master/configuration/holmesgpt/builtin_toolsets.html
+            toolsets:
+              # add a new json processor toolset
+              json_processor:
+                description: "A toolset for processing JSON data using jq"
+                prerequisites:
+                  - command: "jq --version"  # Ensure jq is installed
+                tools:
+                  - name: "process_json"
+                    description: "A tool that uses jq to process JSON input"
+                    command: "echo '{{ json_input }}' | jq '.'"  # Example jq command to format JSON
+              # disable a built-in toolset
+              aks/core:
+                enabled: false
+            ```
+"""
@@ -0,0 +1,79 @@
+# --------------------------------------------------------------------------------------------
+# Copyright (c) Microsoft Corporation. All rights reserved.
+# Licensed under the MIT License. See License.txt in the project root for license information.
+# --------------------------------------------------------------------------------------------
+
+# pylint: disable=too-many-statements,too-many-lines
+import os.path
+
+from azure.cli.core.api import get_config_dir
+
+from azext_aks_agent._consts import CONST_AGENT_CONFIG_FILE_NAME
+
+from azext_aks_agent._validators import validate_agent_config_file
+
+
+def load_arguments(self, _):
+    with self.argument_context("aks agent") as c:
+        c.positional(
+            "prompt",
+            help="Question to ask; the agent answers it using the available tools.",
+        )
+        c.argument(
+            "resource_group_name",
+            options_list=["--resource-group", "-g"],
+            help="Name of resource group.",
+            required=False,
+        )
+        c.argument(
+            "name",
+            options_list=["--name", "-n"],
+            help="Name of the managed cluster.",
+            required=False,
+        )
+        c.argument(
+            "max_steps",
+            type=int,
+            default=10,
+            required=False,
+            help="Maximum number of steps the LLM can take to investigate the issue.",
+        )
+        c.argument(
+            "config_file",
+            default=os.path.join(get_config_dir(), CONST_AGENT_CONFIG_FILE_NAME),
+            validator=validate_agent_config_file,
+            required=False,
+            help="Path to the config file.",
+        )
+        c.argument(
+            "model",
+            help="The model to use for the LLM.",
+            required=False,
+            type=str,
+        )
+        c.argument(
+            "api_key",
+            help="API key to use for the LLM (if not given, uses environment variables AZURE_API_KEY, OPENAI_API_KEY)",
+            required=False,
+            type=str,
+        )
+        c.argument(
+            "no_interactive",
+            help="Disable interactive mode. When set, the agent will not prompt for input and will run in batch mode.",
+            action="store_true",
+        )
+        c.argument(
+            "no_echo_request",
+            help="Disable echoing back the question provided to AKS Agent in the output.",
+            action="store_true",
+        )
+        c.argument(
+            "show_tool_output",
+            help="Show the output of each tool that was called.",
+            action="store_true",
+        )
+        c.argument(
+            "refresh_toolsets",
+            help="Refresh the toolsets status.",
+            action="store_true",
+        )
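The flag definitions above can be pictured with a plain `argparse` analogy. This is only a sketch: the parser below is a hypothetical stand-in built with the standard library, not the Azure CLI argument machinery, and it mirrors just four of the arguments (`prompt`, `--max-steps` with its default of 10, and two of the `store_true` switches).

```python
import argparse

# Hypothetical stand-in parser; the real extension registers its arguments
# through the Azure CLI argument_context shown in the diff above.
parser = argparse.ArgumentParser(prog="az aks agent")
parser.add_argument("prompt")
parser.add_argument("--max-steps", type=int, default=10)
parser.add_argument("--no-interactive", action="store_true")
parser.add_argument("--show-tool-output", action="store_true")

# Batch-mode invocation from the README example.
batch = parser.parse_args(
    ["Diagnose networking issues", "--no-interactive", "--max-steps", "15"]
)

# Invocation with only a question: every switch keeps its default.
plain = parser.parse_args(["Why are my pods not starting?"])

print(batch.max_steps, batch.no_interactive, batch.show_tool_output)
print(plain.max_steps, plain.no_interactive, plain.show_tool_output)
```

Because the switches use `store_true`, they default to `False` and take no value on the command line, which is why the help entries can describe them as simple on/off options.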
@@ -0,0 +1,53 @@
+# --------------------------------------------------------------------------------------------
+# Copyright (c) Microsoft Corporation. All rights reserved.
+# Licensed under the MIT License. See License.txt in the project root for license information.
+# --------------------------------------------------------------------------------------------
+
+from __future__ import unicode_literals
+
+import os
+import os.path
+
+import yaml
+from azext_aks_agent._consts import CONST_AGENT_CONFIG_FILE_NAME
+from azure.cli.core.api import get_config_dir
+from azure.cli.core.azclierror import InvalidArgumentValueError
+from knack.log import get_logger
+
+logger = get_logger(__name__)
+
+
+def _validate_param_yaml_file(yaml_path, param_name):
+    if not yaml_path:
+        return
+    if not os.path.exists(yaml_path):
+        raise InvalidArgumentValueError(
+            f"--{param_name}={yaml_path}: file is not found."
+        )
+    if not os.access(yaml_path, os.R_OK):
+        raise InvalidArgumentValueError(
+            f"--{param_name}={yaml_path}: file is not readable."
+        )
+    try:
+        with open(yaml_path, "r", encoding="utf-8") as file:
+            yaml.safe_load(file)
+    except yaml.YAMLError as e:
+        raise InvalidArgumentValueError(
+            f"--{param_name}={yaml_path}: file is not a valid YAML file: {e}"
+        ) from e
+    except Exception as e:
+        raise InvalidArgumentValueError(
+            f"--{param_name}={yaml_path}: An error occurred while reading the config file: {e}"
+        ) from e
+
+
+def validate_agent_config_file(namespace):
+    config_file = namespace.config_file
+    if not config_file:
+        return
+    # the default config file is optional: skip validation when it does not exist
+    default_config_path = os.path.join(get_config_dir(), CONST_AGENT_CONFIG_FILE_NAME)
+    if config_file == default_config_path and not os.path.exists(config_file):
+        return
+
+    _validate_param_yaml_file(config_file, "config-file")
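The early returns in `validate_agent_config_file` amount to a small decision rule: an empty value or a missing file at the default location is accepted silently, while any other path goes on to the file checks. A minimal sketch of that rule follows. The helper name `needs_validation` and the temporary paths are hypothetical and exist only for illustration; the sketch leaves out the YAML parsing step entirely.

```python
import os
import tempfile


def needs_validation(config_file, default_path):
    """Return True when the config file should be checked on disk.

    Hypothetical helper mirroring the early returns in
    validate_agent_config_file; it is not part of the extension.
    """
    if not config_file:
        return False
    # A missing file at the default location is allowed.
    if config_file == default_path and not os.path.exists(config_file):
        return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    default_path = os.path.join(tmp, "aksAgent.yaml")
    custom_path = os.path.join(tmp, "custom.yaml")

    results = {
        "empty": needs_validation("", default_path),
        "missing_default": needs_validation(default_path, default_path),
        "missing_custom": needs_validation(custom_path, default_path),
    }

    # Once the default file exists it is validated like any other path.
    with open(default_path, "w", encoding="utf-8") as f:
        f.write("model: gpt-4o\n")
    results["existing_default"] = needs_validation(default_path, default_path)

print(results)
```

Under this rule a missing custom path still reaches the existence check, so a mistyped `--config-file` value is reported as "file is not found" rather than ignored.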