Merged
17 changes: 17 additions & 0 deletions src/aks-agent/HISTORY.rst
@@ -0,0 +1,17 @@
.. :changelog:

Release History
===============

Guidance
++++++++
If there is no rush to release a new version, please just add a description of the modification under the *Pending* section.

To release a new version, please select a new version number (usually plus 1 to last patch version, X.Y.Z -> Major.Minor.Patch, more details in `\doc <https://semver.org/>`_), and then add a new section named as the new version number in this file, the content should include the new modifications and everything from the *Pending* section. Finally, update the `VERSION` variable in `setup.py` with this new version number.

Pending
+++++++

1.0.0b1
+++++++
* Add interactive AI-powered debugging tool `az aks agent`.
80 changes: 80 additions & 0 deletions src/aks-agent/README.rst
@@ -0,0 +1,80 @@
Azure CLI AKS Agent Extension
===============================

Introduction
============

The AKS Agent extension provides the "az aks agent" command, an AI-powered assistant that
helps analyze and troubleshoot Azure Kubernetes Service (AKS) clusters using Large Language
Models (LLMs). The agent combines cluster context, configurable toolsets, and LLMs to answer
natural-language questions about your cluster (for example, "Why are my pods not starting?")
and can investigate issues in both interactive and non-interactive (batch) modes.

Key capabilities
----------------

- Interactive and non-interactive modes (use --no-interactive for batch runs).
- Support for multiple LLM providers (Azure OpenAI, OpenAI, etc.) via environment variables.
- Configurable via a JSON/YAML config file provided with --config-file.
- Control echo and tool output visibility with --no-echo-request and --show-tool-output.
- Refresh the available toolsets with --refresh-toolsets.

Prerequisites
-------------

Before using the agent, make sure provider-specific environment variables are set. For
example, Azure OpenAI typically requires AZURE_API_BASE, AZURE_API_VERSION, and AZURE_API_KEY,
while OpenAI requires OPENAI_API_KEY. For more details about supported providers and required
variables, see: https://docs.litellm.ai/docs/providers

Quick start and examples
========================

Install the extension
---------------------

.. code-block:: bash

    az extension add --name aks-agent

Run the agent (Azure OpenAI example)
------------------------------------

.. code-block:: bash

    export AZURE_API_BASE="https://my-azureopenai-service.openai.azure.com/"
    export AZURE_API_VERSION="2025-01-01-preview"
    export AZURE_API_KEY="sk-xxx"

    az aks agent "Why are my pods not starting?" --name MyManagedCluster --resource-group MyResourceGroup --model azure/my-gpt4.1-deployment

Run the agent (OpenAI example)
------------------------------

.. code-block:: bash

    export OPENAI_API_KEY="sk-xxx"
    az aks agent "Why are my pods not starting?" --name MyManagedCluster --resource-group MyResourceGroup --model gpt-4o

Run in non-interactive batch mode
---------------------------------

.. code-block:: bash

    az aks agent "Diagnose networking issues" --no-interactive --max-steps 15 --model azure/my-gpt4.1-deployment

Using a configuration file
--------------------------

Pass a config file with --config-file to predefine model, credentials, and toolsets. See
the example config and more detailed examples in the help definition at
`src/aks-agent/azext_aks_agent/_help.py`.

More help
---------

For a complete list of parameters, detailed examples and help text, run:

.. code-block:: bash

    az aks agent -h
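
Reviewer's sketch (not part of this PR): the Prerequisites section of the README lists the environment variables each provider needs. A small pre-flight check along these lines could report which ones are missing before the command is run. The `REQUIRED_VARS` table and the `missing_provider_vars` helper are hypothetical names; only the variable names come from the README, and other LiteLLM providers would need their own entries.

```python
import os

# Variable names taken from the README; the mapping itself is illustrative.
REQUIRED_VARS = {
    "azure": ("AZURE_API_BASE", "AZURE_API_VERSION", "AZURE_API_KEY"),
    "openai": ("OPENAI_API_KEY",),
}


def missing_provider_vars(provider, env=None):
    """Return the required variables for `provider` that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS[provider] if not env.get(name)]


if __name__ == "__main__":
    for provider in REQUIRED_VARS:
        missing = missing_provider_vars(provider)
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"{provider}: {status}")
```

Passing `env` explicitly keeps the helper testable without touching the real process environment.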
36 changes: 36 additions & 0 deletions src/aks-agent/azext_aks_agent/__init__.py
@@ -0,0 +1,36 @@
# --------------------------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License. See License.txt in the project root for license information.
# --------------------------------------------------------------------------------------------


from azure.cli.core import AzCommandsLoader

# pylint: disable=unused-import
import azext_aks_agent._help


class ContainerServiceCommandsLoader(AzCommandsLoader):

    def __init__(self, cli_ctx=None):
        from azure.cli.core.commands import CliCommandType

        aks_agent_custom = CliCommandType(operations_tmpl='azext_aks_agent.custom#{}')
        super().__init__(
            cli_ctx=cli_ctx,
            custom_command_type=aks_agent_custom,
        )

    def load_command_table(self, args):
        super().load_command_table(args)
        from azext_aks_agent.commands import load_command_table
        load_command_table(self, args)
        return self.command_table

    def load_arguments(self, command):
        super().load_arguments(command)
        from azext_aks_agent._params import load_arguments
        load_arguments(self, command)


COMMAND_LOADER_CLS = ContainerServiceCommandsLoader
10 changes: 10 additions & 0 deletions src/aks-agent/azext_aks_agent/_consts.py
@@ -0,0 +1,10 @@
# --------------------------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License. See License.txt in the project root for license information.
# --------------------------------------------------------------------------------------------

# aks agent constants
CONST_AGENT_CONFIG_PATH_DIR_ENV_KEY = "HOLMES_CONFIGPATH_DIR"
CONST_AGENT_NAME = "AKS AGENT"
CONST_AGENT_NAME_ENV_KEY = "AGENT_NAME"
CONST_AGENT_CONFIG_FILE_NAME = "aksAgent.yaml"
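
Reviewer's sketch (not part of this PR): the code that reads the two `*_ENV_KEY` constants is not among the files shown here, so the following is an assumption based only on their names. It supposes that the command exports the agent name and a config directory into the environment before the underlying engine starts. `export_agent_env` is a hypothetical helper.

```python
import os

# Values copied from _consts.py above.
CONST_AGENT_CONFIG_PATH_DIR_ENV_KEY = "HOLMES_CONFIGPATH_DIR"
CONST_AGENT_NAME = "AKS AGENT"
CONST_AGENT_NAME_ENV_KEY = "AGENT_NAME"


def export_agent_env(config_dir, env=None):
    """Set the agent name and config directory in `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    env[CONST_AGENT_CONFIG_PATH_DIR_ENV_KEY] = config_dir
    env[CONST_AGENT_NAME_ENV_KEY] = CONST_AGENT_NAME
    return env
```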
106 changes: 106 additions & 0 deletions src/aks-agent/azext_aks_agent/_help.py
@@ -0,0 +1,106 @@
# coding=utf-8
# --------------------------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License. See License.txt in the project root for license information.
# --------------------------------------------------------------------------------------------

# pylint: disable=too-many-lines

from knack.help_files import helps


helps[
"aks agent"
] = """
    type: command
    short-summary: Run an AI assistant to analyze and troubleshoot Kubernetes clusters.
    long-summary: |-
      This command allows you to ask questions about your Azure Kubernetes cluster and get answers using AI models.
      Environment variables must be set to use the AI model. Refer to https://docs.litellm.ai/docs/providers for the supported AI providers and models and the environment variables each one requires.
    parameters:
        - name: --name -n
          type: string
          short-summary: Name of the managed cluster.
        - name: --resource-group -g
          type: string
          short-summary: Name of the resource group.
        - name: --model
          type: string
          short-summary: Model to use for the LLM.
        - name: --api-key
          type: string
          short-summary: API key to use for the LLM (if not given, uses environment variables AZURE_API_KEY, OPENAI_API_KEY).
        - name: --config-file
          type: string
          short-summary: Path to configuration file.
        - name: --max-steps
          type: int
          short-summary: Maximum number of steps the LLM can take to investigate the issue.
        - name: --no-interactive
          type: bool
          short-summary: Disable interactive mode. When set, the agent will not prompt for input and will run in batch mode.
        - name: --no-echo-request
          type: bool
          short-summary: Disable echoing back the question provided to AKS Agent in the output.
        - name: --show-tool-output
          type: bool
          short-summary: Show the output of each tool that was called during the analysis.
        - name: --refresh-toolsets
          type: bool
          short-summary: Refresh the toolsets status.

    examples:
        - name: Ask about pod issues in the cluster with Azure OpenAI
          text: |-
            export AZURE_API_BASE="https://my-azureopenai-service.openai.azure.com/"
            export AZURE_API_VERSION="2025-01-01-preview"
            export AZURE_API_KEY="sk-xxx"
            az aks agent "Why are my pods not starting?" --name MyManagedCluster --resource-group MyResourceGroup --model azure/my-gpt4.1-deployment
        - name: Ask about pod issues in the cluster with OpenAI
          text: |-
            export OPENAI_API_KEY="sk-xxx"
            az aks agent "Why are my pods not starting?" --name MyManagedCluster --resource-group MyResourceGroup --model gpt-4o
        - name: Pass the API key on the command line instead of an environment variable
          text: az aks agent "Check the pod status in my cluster" --name MyManagedCluster --resource-group MyResourceGroup --model azure/my-gpt4.1-deployment --api-key "sk-xxx"
        - name: Run in non-interactive batch mode
          text: az aks agent "Diagnose networking issues" --no-interactive --max-steps 15 --model azure/my-gpt4.1-deployment
        - name: Show detailed tool output during analysis
          text: az aks agent "Why is my service workload unavailable in namespace workload-ns?" --show-tool-output --model azure/my-gpt4.1-deployment
        - name: Use custom configuration file
          text: az aks agent "Check kubernetes pod resource usage" --config-file /path/to/custom.yaml --model azure/my-gpt4.1-deployment
        - name: Run agent with no echo of the original question
          text: az aks agent "What is the status of my cluster?" --no-echo-request --model azure/my-gpt4.1-deployment
        - name: Refresh toolsets to get the latest available tools
          text: az aks agent "What is the status of my cluster?" --refresh-toolsets --model azure/my-gpt4.1-deployment
        - name: Run agent with config file
          text: |
            az aks agent "Check kubernetes pod resource usage" --config-file /path/to/custom.yaml
            Here is an example of a config file:
            ```yaml
            model: "gpt-4o"
            api_key: "..."
            # define a list of MCP servers
            mcp_servers:
              aks_mcp:
                description: "The AKS-MCP is a Model Context Protocol (MCP) server that enables AI assistants to interact with Azure Kubernetes Service (AKS) clusters"
                url: "http://localhost:8003/sse"

            # try adding your own tools or toggle the built-in toolsets here
            # e.g. query company-specific data, fetch logs from your existing observability tools, etc
            # To check how to add a customized toolset, please refer to https://docs.robusta.dev/master/configuration/holmesgpt/custom_toolsets.html#custom-toolsets
            # To find all built-in toolsets, please refer to https://docs.robusta.dev/master/configuration/holmesgpt/builtin_toolsets.html
            toolsets:
              # add a new json processor toolset
              json_processor:
                description: "A toolset for processing JSON data using jq"
                prerequisites:
                  - command: "jq --version"  # Ensure jq is installed
                tools:
                  - name: "process_json"
                    description: "A tool that uses jq to process JSON input"
                    command: "echo '{{ json_input }}' | jq '.'"  # Example jq command to format JSON
              # disable a built-in toolset
              aks/core:
                enabled: false
            ```
"""
79 changes: 79 additions & 0 deletions src/aks-agent/azext_aks_agent/_params.py
@@ -0,0 +1,79 @@
# --------------------------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License. See License.txt in the project root for license information.
# --------------------------------------------------------------------------------------------

# pylint: disable=too-many-statements,too-many-lines
import os.path

from azure.cli.core.api import get_config_dir

from azext_aks_agent._consts import CONST_AGENT_CONFIG_FILE_NAME

from azext_aks_agent._validators import validate_agent_config_file


def load_arguments(self, _):
    with self.argument_context("aks agent") as c:
        c.positional(
            "prompt",
            help="Ask any question and get an answer using available tools.",
        )
        c.argument(
            "resource_group_name",
            options_list=["--resource-group", "-g"],
            help="Name of the resource group.",
            required=False,
        )
        c.argument(
            "name",
            options_list=["--name", "-n"],
            help="Name of the managed cluster.",
            required=False,
        )
        c.argument(
            "max_steps",
            type=int,
            default=10,
            required=False,
            help="Maximum number of steps the LLM can take to investigate the issue.",
        )
        c.argument(
            "config_file",
            default=os.path.join(get_config_dir(), CONST_AGENT_CONFIG_FILE_NAME),
            validator=validate_agent_config_file,
            required=False,
            help="Path to the config file.",
        )
        c.argument(
            "model",
            help="The model to use for the LLM.",
            required=False,
            type=str,
        )
        # The argument name uses an underscore; the CLI option is still rendered as --api-key.
        c.argument(
            "api_key",
            help="API key to use for the LLM (if not given, uses environment variables AZURE_API_KEY, OPENAI_API_KEY)",
            required=False,
            type=str,
        )
        c.argument(
            "no_interactive",
            help="Disable interactive mode. When set, the agent will not prompt for input and will run in batch mode.",
            action="store_true",
        )
        c.argument(
            "no_echo_request",
            help="Disable echoing back the question provided to AKS Agent in the output.",
            action="store_true",
        )
        c.argument(
            "show_tool_output",
            help="Show the output of each tool that was called.",
            action="store_true",
        )
        c.argument(
            "refresh_toolsets",
            help="Refresh the toolsets status.",
            action="store_true",
        )
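
Reviewer's sketch (not part of this PR): to see the argument surface registered above without an Azure CLI installation, the same options can be approximated with plain `argparse`. This is not how knack or azure-cli register arguments; `build_parser` is a hypothetical helper, and the real default for `--config-file` (a path under the Azure CLI config directory) is left out here.

```python
import argparse


def build_parser():
    """Plain-argparse approximation of the `az aks agent` argument surface."""
    parser = argparse.ArgumentParser(prog="az aks agent")
    parser.add_argument("prompt")
    parser.add_argument("--resource-group", "-g", dest="resource_group_name")
    parser.add_argument("--name", "-n")
    parser.add_argument("--max-steps", type=int, default=10)
    parser.add_argument("--config-file")
    parser.add_argument("--model")
    parser.add_argument("--api-key")
    # Boolean switches: absent means False, present means True.
    for flag in ("--no-interactive", "--no-echo-request",
                 "--show-tool-output", "--refresh-toolsets"):
        parser.add_argument(flag, action="store_true")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["Diagnose networking issues", "--no-interactive", "--max-steps", "15"]
    )
    print(args.max_steps, args.no_interactive, args.show_tool_output)
```

As with the real registration, dashes in option names become underscores in the parsed attribute names (`--api-key` becomes `api_key`).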
53 changes: 53 additions & 0 deletions src/aks-agent/azext_aks_agent/_validators.py
@@ -0,0 +1,53 @@
# --------------------------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License. See License.txt in the project root for license information.
# --------------------------------------------------------------------------------------------

from __future__ import unicode_literals

import os
import os.path

import yaml
from azext_aks_agent._consts import CONST_AGENT_CONFIG_FILE_NAME
from azure.cli.core.api import get_config_dir
from azure.cli.core.azclierror import InvalidArgumentValueError
from knack.log import get_logger

logger = get_logger(__name__)


def _validate_param_yaml_file(yaml_path, param_name):
    if not yaml_path:
        return
    if not os.path.exists(yaml_path):
        raise InvalidArgumentValueError(
            f"--{param_name}={yaml_path}: file is not found."
        )
    if not os.access(yaml_path, os.R_OK):
        raise InvalidArgumentValueError(
            f"--{param_name}={yaml_path}: file is not readable."
        )
    try:
        with open(yaml_path, "r", encoding="utf-8") as file:
            yaml.safe_load(file)
    except yaml.YAMLError as e:
        raise InvalidArgumentValueError(
            f"--{param_name}={yaml_path}: file is not a valid YAML file: {e}"
        ) from e
    except Exception as e:
        raise InvalidArgumentValueError(
            f"--{param_name}={yaml_path}: An error occurred while reading the config file: {e}"
        ) from e


def validate_agent_config_file(namespace):
    config_file = namespace.config_file
    if not config_file:
        return
    # The default config file is optional: skip validation when it does not exist.
    default_config_path = os.path.join(get_config_dir(), CONST_AGENT_CONFIG_FILE_NAME)
    if config_file == default_config_path and not os.path.exists(config_file):
        return

    _validate_param_yaml_file(config_file, "config-file")
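
Reviewer's sketch (not part of this PR): the validator treats a missing file differently depending on whether the path is the default or one the user supplied. The standalone version below mirrors that control flow using only the standard library. It is a simplification under stated assumptions: `validate_config_file` is a hypothetical name, `ValueError` replaces `InvalidArgumentValueError`, the readability check is omitted, and `json.load` stands in for `yaml.safe_load` so that PyYAML is not required.

```python
import json
import os
import tempfile
from types import SimpleNamespace


def validate_config_file(namespace, default_path, load=json.load):
    """Mirror of validate_agent_config_file with the parser injected.

    `load` stands in for yaml.safe_load so the sketch needs only the stdlib.
    """
    path = namespace.config_file
    if not path:
        return
    # A missing file is tolerated only when it is the untouched default path.
    if path == default_path and not os.path.exists(path):
        return
    if not os.path.exists(path):
        raise ValueError(f"--config-file={path}: file is not found.")
    with open(path, "r", encoding="utf-8") as handle:
        try:
            load(handle)
        except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
            raise ValueError(f"--config-file={path}: file is not valid: {exc}") from exc


if __name__ == "__main__":
    default = os.path.join(tempfile.mkdtemp(), "aksAgent.yaml")
    # The default path does not exist yet, so validation is skipped silently.
    validate_config_file(SimpleNamespace(config_file=default), default)
    print("missing default config accepted")
```

Injecting the parser keeps the path-handling logic testable on its own, separate from the choice of file format.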
Empty file.