Commit 9434b1d
feat(agent): iterative tool-loop + loop-limit (#126)
* feat(agent): iterative tool-loop + loop-limit (`ll`) (default 3). Execute model-requested tool calls iteratively; configurable via `loop-limit` (`ll`) to avoid infinite loops.
* feat: add `loop-limit` to the autocomplete dict, now alphabetically ordered
* chore(ux): improve some messages
* docs(agent): feature Agent Mode in the main README.md
* fix: add missing available-tools argument to the follow-up chat params
* docs(agent): add Agent Mode quick demo made with asciicast
* chore(release): bump version to 0.22.0
1 parent d33a9e6 commit 9434b1d

File tree

9 files changed: +160 −43 lines changed

README.md

Lines changed: 35 additions & 15 deletions
@@ -32,6 +32,8 @@
 - [Usage](#usage)
 - [Command-line Arguments](#command-line-arguments)
 - [Usage Examples](#usage-examples)
+- [How Tool Calls Work](#how-tool-calls-work)
+- **NEW** [Agent Mode](#agent-mode)
 - [Interactive Commands](#interactive-commands)
 - [Tool and Server Selection](#tool-and-server-selection)
 - [Model Selection](#model-selection)
@@ -56,6 +58,7 @@ MCP Client for Ollama (`ollmcp`) is a modern, interactive terminal application (
 
 ## Features
 
+- 🤖 **Agent Mode**: Iterative tool execution when models request multiple tool calls, with a configurable loop limit to prevent infinite loops
 - 🌐 **Multi-Server Support**: Connect to multiple MCP servers simultaneously
 - 🚀 **Multiple Transport Types**: Supports STDIO, SSE, and Streamable HTTP server connections
 - ☁️ **Ollama Cloud Support**: Works seamlessly with Ollama Cloud models for tool calling, enabling access to powerful cloud-hosted models while using local MCP tools
@@ -244,24 +247,24 @@ During chat, use these commands:
 
 | Command | Shortcut | Description |
 |------------------|------------------|-----------------------------------------------------|
+| `clear` | `cc` | Clear conversation history and context |
+| `cls` | `clear-screen` | Clear the terminal screen |
+| `context` | `c` | Toggle context retention |
+| `context-info` | `ci` | Display context statistics |
 | `help` | `h` | Display help and available commands |
-| `tools` | `t` | Open the tool selection interface |
+| `human-in-loop` | `hil` | Toggle Human-in-the-Loop confirmations for tool execution |
+| `load-config` | `lc` | Load tool and model configuration from a file |
+| `loop-limit` | `ll` | Set maximum iterative tool-loop iterations (Agent Mode). Default: 3 |
 | `model` | `m` | List and select a different Ollama model |
 | `model-config` | `mc` | Configure advanced model parameters and system prompt|
-| `context` | `c` | Toggle context retention |
-| `thinking-mode` | `tm` | Toggle thinking mode (e.g., gpt-oss, deepseek-r1, qwen3) |
+| `quit`, `exit`, `bye` | `q` or `Ctrl+D` | Exit the client |
+| `reload-servers` | `rs` | Reload all MCP servers with current configuration |
+| `reset-config` | `rc` | Reset configuration to defaults (all tools enabled) |
+| `save-config` | `sc` | Save current tool and model configuration to a file |
+| `show-metrics` | `sm` | Toggle performance metrics display |
 | `show-thinking` | `st` | Toggle thinking text visibility |
 | `show-tool-execution` | `ste` | Toggle tool execution display visibility |
-| `show-metrics` | `sm` | Toggle performance metrics display |
-| `human-in-loop` | `hil` | Toggle Human-in-the-Loop confirmations for tool execution |
-| `clear` | `cc` | Clear conversation history and context |
-| `context-info` | `ci` | Display context statistics |
-| `cls` | `clear-screen` | Clear the terminal screen |
-| `save-config` | `sc` | Save current tool and model configuration to a file |
-| `load-config` | `lc` | Load tool and model configuration from a file |
-| `reset-config` | `rc` | Reset configuration to defaults (all tools enabled) |
-| `reload-servers` | `rs` | Reload all MCP servers with current configuration |
-| `quit`, `exit`, `bye` | `q` or `Ctrl+D` | Exit the client |
+| `tools` | `t` | Open the tool selection interface |
 
 
 ### Tool and Server Selection
@@ -621,13 +624,30 @@ For more information about Ollama Cloud, visit the [Ollama Cloud documentation](
 1. The client sends your query to Ollama with a list of available tools
 2. If Ollama decides to use a tool, the client:
    - Displays the tool execution with formatted arguments and syntax highlighting
-   - **NEW**: Shows a Human-in-the-Loop confirmation prompt (if enabled) allowing you to review and approve the tool call
+   - Shows a Human-in-the-Loop confirmation prompt (if enabled) allowing you to review and approve the tool call
    - Extracts the tool name and arguments from the model response
    - Calls the appropriate MCP server with these arguments (only if approved or HIL is disabled)
    - Shows the tool response in a structured, easy-to-read format
-   - Sends the tool result back to Ollama for final processing
+   - Sends the tool result back to Ollama
+   - If in Agent Mode, repeats the process if the model requests more tool calls
+3. Finally, the client:
    - Displays the model's final response incorporating the tool results
 
+### Agent Mode
+
+Some models may request multiple tool calls in a single conversation. The client supports an **Agent Mode** that allows for iterative tool execution:
+- When the model requests a tool call, the client executes it and sends the result back to the model
+- This process repeats until the model provides a final answer or reaches the configured loop limit
+- You can set the maximum number of iterations using the `loop-limit` (`ll`) command
+- The default loop limit is `3` to prevent infinite loops
+
+> [!NOTE]
+> If you want to prevent using Agent Mode, simply set the loop limit to `1`.
+
+#### Agent Mode Quick Demo:
+
+[![asciicast](https://asciinema.org/a/476qpEamCX9TFQt4jNEXIgHxS.svg)](https://asciinema.org/a/476qpEamCX9TFQt4jNEXIgHxS)
+
 ## Where Can I Find More MCP Servers?
 
 You can explore a collection of MCP servers in the official [MCP Servers repository](https://github.com/modelcontextprotocol/servers).
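The Agent Mode flow described in the README changes above is easy to picture as a small loop. Below is a minimal Python sketch of that pattern; `chat` and `call_tool` are hypothetical stand-ins for the Ollama chat call and an MCP tool invocation, not this client's actual API — only the loop-limit behaviour mirrors what the commit implements.

```python
# A minimal sketch of the Agent Mode loop, assuming caller-supplied `chat`
# and `call_tool` callables. These names are illustrative, not the client's API.

def agent_loop(chat, call_tool, messages, tools, loop_limit=3):
    """Run model-requested tool calls iteratively, up to loop_limit rounds."""
    response = chat(messages=messages, tools=tools)
    loops = 0
    while response.tool_calls:
        if loops >= loop_limit:
            print(f"Loop limit ({loop_limit}) reached; skipping additional tool calls.")
            break
        loops += 1
        for call in response.tool_calls:
            result = call_tool(call.name, call.arguments)  # executed by the MCP server
            messages.append({"role": "tool", "name": call.name, "content": result})
        # Tools must be passed again, or the model cannot request another round
        response = chat(messages=messages, tools=tools)
    return response.content
```

With `loop_limit=1`, the first round of tool calls still executes, but any further requests are skipped — which is why the README note suggests a limit of `1` to effectively disable Agent Mode.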

cli-package/pyproject.toml

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
 [project]
 name = "ollmcp"
-version = "0.21.0"
+version = "0.22.0"
 description = "CLI for MCP Client for Ollama - An easy-to-use command for interacting with Ollama through MCP"
 readme = "README.md"
 requires-python = ">=3.10"
@@ -9,7 +9,7 @@ authors = [
     {name = "Jonathan Löwenstern"}
 ]
 dependencies = [
-    "mcp-client-for-ollama==0.21.0"
+    "mcp-client-for-ollama==0.22.0"
 ]
 
 [project.scripts]

mcp_client_for_ollama/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -1,3 +1,3 @@
 """MCP Client for Ollama package."""
 
-__version__ = "0.21.0"
+__version__ = "0.22.0"

mcp_client_for_ollama/client.py

Lines changed: 90 additions & 6 deletions
@@ -69,6 +69,8 @@ def __init__(self, model: str = DEFAULT_MODEL, host: str = DEFAULT_OLLAMA_HOST):
         self.show_tool_execution = True # By default, show tool execution displays
         # Metrics display settings
         self.show_metrics = False # By default, don't show metrics after each query
+        # Agent mode settings
+        self.loop_limit = 3 # Maximum follow-up tool loops per query
         self.default_configuration_status = False # Track if default configuration was loaded successfully
 
         # Store server connection parameters for reloading
@@ -259,7 +261,8 @@ async def process_query(self, query: str) -> str:
         }
 
         # Add thinking parameter if thinking mode is enabled and model supports it
-        if await self.supports_thinking_mode():
+        supports_thinking = await self.supports_thinking_mode()
+        if supports_thinking:
             chat_params["think"] = self.thinking_mode
 
         # Initial Ollama API call with the query and available tools
@@ -286,9 +289,26 @@ async def process_query(self, query: str) -> str:
         # Update actual token count from metrics if available
         if metrics and metrics.get('eval_count'):
             self.actual_token_count += metrics['eval_count']
-        # Check if there are any tool calls in the response
-        if len(tool_calls) > 0 and self.tool_manager.get_enabled_tool_objects():
-            for tool in tool_calls:
+
+        enabled_tools = self.tool_manager.get_enabled_tool_objects()
+
+        loop_count = 0
+        pending_tool_calls = tool_calls
+
+        # Keep looping while the model requests tools and we have capacity
+        while pending_tool_calls and enabled_tools:
+            if loop_count >= self.loop_limit:
+                self.console.print(Panel(
+                    f"[yellow]Your current loop limit is set to [bold]{self.loop_limit}[/bold] and has been reached. Skipping additional tool calls.[/yellow]\n"
+                    f"You will probably want to increase this limit if your model requires more tool interactions to complete tasks.\n"
+                    f"You can change the loop limit with the [bold cyan]loop-limit[/bold cyan] command.",
+                    title="[bold]Loop Limit Reached[/bold]", border_style="yellow", expand=False
+                ))
+                break
+
+            loop_count += 1
+
+            for tool in pending_tool_calls:
                 tool_name = tool.function.name
                 tool_args = tool.function.arguments
 
@@ -338,27 +358,39 @@ async def process_query(self, query: str) -> str:
                 "model": model,
                 "messages": messages,
                 "stream": True,
+                "tools": available_tools,
                 "options": model_options
             }
 
             # Add thinking parameter if thinking mode is enabled and model supports it
-            if await self.supports_thinking_mode():
+            if supports_thinking:
                 chat_params_followup["think"] = self.thinking_mode
 
             stream = await self.ollama.chat(**chat_params_followup)
 
             # Process the streaming response with thinking mode support
-            response_text, _, followup_metrics = await self.streaming_manager.process_streaming_response(
+            followup_response, pending_tool_calls, followup_metrics = await self.streaming_manager.process_streaming_response(
                 stream,
                 thinking_mode=self.thinking_mode,
                 show_thinking=self.show_thinking,
                 show_metrics=self.show_metrics
             )
 
+            messages.append({
+                "role": "assistant",
+                "content": followup_response,
+                "tool_calls": pending_tool_calls
+            })
+
             # Update actual token count from followup metrics if available
             if followup_metrics and followup_metrics.get('eval_count'):
                 self.actual_token_count += followup_metrics['eval_count']
 
+            if followup_response:
+                response_text = followup_response
+
+            enabled_tools = self.tool_manager.get_enabled_tool_objects()
+
         if not response_text:
             self.console.print("[red]No content response received.[/red]")
             response_text = ""
@@ -458,6 +490,10 @@ async def chat_loop(self):
                     await self.toggle_show_thinking()
                     continue
 
+                if query.lower() in ['loop-limit', 'll']:
+                    await self.set_loop_limit()
+                    continue
+
                 if query.lower() in ['show-tool-execution', 'ste']:
                     self.toggle_show_tool_execution()
                     continue
@@ -565,6 +601,9 @@ def print_help(self):
            "• Type [bold]show-thinking[/bold] or [bold]st[/bold] to toggle thinking text visibility\n"
            "• Type [bold]show-metrics[/bold] or [bold]sm[/bold] to toggle performance metrics display\n\n"
 
+           "[bold cyan]Agent Mode:[/bold cyan] [bold magenta](New!)[/bold magenta]\n"
+           "• Type [bold]loop-limit[/bold] or [bold]ll[/bold] to set the maximum tool loop iterations\n\n"
+
            "[bold cyan]MCP Servers and Tools:[/bold cyan]\n"
            "• Type [bold]tools[/bold] or [bold]t[/bold] to configure tools\n"
            "• Type [bold]show-tool-execution[/bold] or [bold]ste[/bold] to toggle tool execution display\n"
@@ -671,6 +710,28 @@ def toggle_show_metrics(self):
         else:
             self.console.print("[cyan]🔇 Performance metrics will be hidden for a cleaner output.[/cyan]")
 
+    async def set_loop_limit(self):
+        """Configure the maximum number of follow-up tool loops per query."""
+        user_input = await self.get_user_input(f"Loop limit (current: {self.loop_limit})")
+
+        if user_input is None:
+            return
+
+        value = user_input.strip()
+
+        if not value:
+            self.console.print("[yellow]Loop limit unchanged.[/yellow]")
+            return
+
+        try:
+            new_limit = int(value)
+            if new_limit < 1:
+                raise ValueError
+            self.loop_limit = new_limit
+            self.console.print(f"[green]🤖 Agent loop limit set to {self.loop_limit}![/green]")
+        except ValueError:
+            self.console.print("[red]Invalid loop limit. Please enter a positive integer.[/red]")
+
     def clear_context(self):
         """Clear conversation history and token count"""
         original_history_length = len(self.chat_history)
@@ -695,6 +756,7 @@ def display_context_stats(self):
             f"{thinking_status}"
             f"Tool execution display: [{'green' if self.show_tool_execution else 'red'}]{'Enabled' if self.show_tool_execution else 'Disabled'}[/{'green' if self.show_tool_execution else 'red'}]\n"
             f"Performance metrics: [{'green' if self.show_metrics else 'red'}]{'Enabled' if self.show_metrics else 'Disabled'}[/{'green' if self.show_metrics else 'red'}]\n"
+            f"Agent loop limit: [cyan]{self.loop_limit}[/cyan]\n"
             f"Human-in-the-Loop confirmations: [{'green' if self.hil_manager.is_enabled() else 'red'}]{'Enabled' if self.hil_manager.is_enabled() else 'Disabled'}[/{'green' if self.hil_manager.is_enabled() else 'red'}]\n"
             f"Conversation entries: {history_count}\n"
             f"Total tokens generated: {self.actual_token_count:,}",
@@ -730,6 +792,9 @@ def save_configuration(self, config_name=None):
                 "thinkingMode": self.thinking_mode,
                 "showThinking": self.show_thinking
             },
+            "agentSettings": {
+                "loopLimit": self.loop_limit
+            },
             "modelConfig": self.model_config_manager.get_config(),
             "displaySettings": {
                 "showToolExecution": self.show_tool_execution,
@@ -787,6 +852,14 @@ def load_configuration(self, config_name=None):
             if "showThinking" in config_data["modelSettings"]:
                 self.show_thinking = config_data["modelSettings"]["showThinking"]
 
+        if "agentSettings" in config_data:
+            if "loopLimit" in config_data["agentSettings"]:
+                try:
+                    loop_limit = int(config_data["agentSettings"]["loopLimit"])
+                    self.loop_limit = max(1, loop_limit)
+                except (TypeError, ValueError):
+                    pass
+
         # Load model configuration if specified
         if "modelConfig" in config_data:
             self.model_config_manager.set_config(config_data["modelConfig"])
@@ -833,6 +906,17 @@ def reset_configuration(self):
             # Default show thinking to True if not specified
             self.show_thinking = True
 
+        if "agentSettings" in config_data:
+            if "loopLimit" in config_data["agentSettings"]:
+                try:
+                    self.loop_limit = max(1, int(config_data["agentSettings"]["loopLimit"]))
+                except (TypeError, ValueError):
+                    self.loop_limit = 3
+            else:
+                self.loop_limit = 3
+        else:
+            self.loop_limit = 3
+
         # Reset display settings from the default configuration
         if "displaySettings" in config_data:
             if "showToolExecution" in config_data["displaySettings"]:

mcp_client_for_ollama/config/defaults.py

Lines changed: 3 additions & 0 deletions
@@ -23,6 +23,9 @@ def default_config() -> dict:
         "thinkingMode": True,
         "showThinking": False
     },
+    "agentSettings": {
+        "loopLimit": 3
+    },
     "modelConfig": {
         "system_prompt": "",
         "num_keep": None,

mcp_client_for_ollama/config/manager.py

Lines changed: 10 additions & 1 deletion
@@ -150,7 +150,8 @@ def reset_configuration(self) -> Dict[str, Any]:
             "• All tools enabled\n"
             "• Context retention enabled\n"
             "• Thinking mode enabled\n"
-            "• Thinking text hidden",
+            "• Thinking text hidden\n"
+            "• Agent loop limit reset",
             title="Config Reset", border_style="green", expand=False
         ))
 
@@ -211,6 +212,14 @@ def _validate_config(self, config_data: Dict[str, Any]) -> Dict[str, Any]:
             if "showThinking" in config_data["modelSettings"]:
                 validated["modelSettings"]["showThinking"] = bool(config_data["modelSettings"]["showThinking"])
 
+        if "agentSettings" in config_data and isinstance(config_data["agentSettings"], dict):
+            if "loopLimit" in config_data["agentSettings"]:
+                try:
+                    loop_limit = int(config_data["agentSettings"]["loopLimit"])
+                    validated["agentSettings"]["loopLimit"] = max(1, loop_limit)
+                except (TypeError, ValueError):
+                    pass
+
         if "modelConfig" in config_data and isinstance(config_data["modelConfig"], dict):
             model_config = config_data["modelConfig"]
             if "system_prompt" in model_config:

mcp_client_for_ollama/utils/constants.py

Lines changed: 17 additions & 16 deletions
@@ -27,26 +27,27 @@
 
 # Interactive commands and their descriptions for autocomplete
 INTERACTIVE_COMMANDS = {
-    'tools': 'Configure available tools',
-    'help': 'Show help information',
-    'model': 'Select Ollama model',
-    'model-config': 'Configure model parameters',
-    'context': 'Toggle context retention',
-    'thinking-mode': 'Toggle thinking mode',
-    'show-thinking': 'Toggle thinking visibility',
-    'show-tool-execution': 'Toggle tool execution display',
-    'show-metrics': 'Toggle performance metrics display',
+    'bye': 'Exit the application',
+    'clear-screen': 'Clear terminal screen',
     'clear': 'Clear conversation context',
     'context-info': 'Show context information',
-    'clear-screen': 'Clear terminal screen',
-    'save-config': 'Save current configuration',
-    'load-config': 'Load saved configuration',
-    'reset-config': 'Reset to default config',
-    'reload-servers': 'Reload MCP servers',
+    'context': 'Toggle context retention',
+    'exit': 'Exit the application',
+    'help': 'Show help information',
     'human-in-the-loop': 'Toggle HIL confirmations',
+    'load-config': 'Load saved configuration',
+    'loop-limit': 'Set agent max loop limit',
+    'model-config': 'Configure model parameters',
+    'model': 'Select Ollama model',
     'quit': 'Exit the application',
-    'exit': 'Exit the application',
-    'bye': 'Exit the application'
+    'reload-servers': 'Reload MCP servers',
+    'reset-config': 'Reset to default config',
+    'save-config': 'Save current configuration',
+    'show-metrics': 'Toggle performance metrics display',
+    'show-thinking': 'Toggle thinking visibility',
+    'show-tool-execution': 'Toggle tool execution display',
+    'thinking-mode': 'Toggle thinking mode',
+    'tools': 'Configure available tools'
 }
 
 # Default completion menu style (used by prompt_toolkit in interactive mode)

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 [project]
 name = "mcp-client-for-ollama"
-version = "0.21.0"
+version = "0.22.0"
 description = "MCP Client for Ollama - A client for connecting to Model Context Protocol servers using Ollama"
 readme = "README.md"
 requires-python = ">=3.10"

uv.lock

Lines changed: 1 addition & 1 deletion
(Generated lockfile; diff not rendered.)
