# MCTS OpenAI API Wrapper

[GitHub Container Registry](https://github.com/bearlike/mcts-openai-api/pkgs/container/mcts-openai-api)

<p align="center">
  <img src="docs/screenshot_1.png" alt="Comparison of Response" style="height: 512px;">
</p>

Monte Carlo Tree Search (MCTS) is a heuristic search algorithm that systematically explores a tree of candidate outputs to refine language model responses. Upon receiving a request, the pipeline generates multiple candidate answers through iterative simulations; in each iteration it evaluates and updates the candidates based on feedback, propagating the best scores up the tree. This trades extra inference-time compute for answer quality: every incoming request is wrapped in an MCTS pipeline that iteratively refines the model's output and returns the best candidate.
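
As a conceptual sketch only (the function names and scoring below are illustrative stand-ins, not the server's actual implementation), the loop looks roughly like this:

```python
import random
from typing import Callable

def mcts_refine(
    prompt: str,
    generate: Callable[[str], str],
    score: Callable[[str, str], float],
    iterations: int = 2,
    simulations: int = 2,
) -> str:
    """Toy MCTS-style loop: generate candidates, score them, expand the best."""
    best_score, best_answer = float("-inf"), ""
    candidates = [generate(prompt) for _ in range(simulations)]
    for _ in range(iterations):
        # Evaluate every candidate and keep the best answer seen so far.
        scored = sorted(((score(prompt, c), c) for c in candidates), reverse=True)
        if scored[0][0] > best_score:
            best_score, best_answer = scored[0]
        # Expand the strongest candidates into refined children for the next round.
        candidates = [generate(f"{prompt}\nImprove this draft:\n{c}") for _, c in scored[:simulations]]
    return best_answer

# Toy usage with stand-in functions in place of real LLM calls.
print(mcts_refine(
    "What is 2 + 2?",
    generate=lambda p: random.choice(["4", "about 4", "maybe 5"]),
    score=lambda p, c: 1.0 if c == "4" else 0.0,
))
```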

## Overview

This FastAPI server exposes two endpoints:

| Method | Endpoint               | Description                                                                    |
| ------ | ---------------------- | ------------------------------------------------------------------------------ |
| POST   | `/v1/chat/completions` | Accepts chat completion requests. Each call is wrapped with MCTS refinement.   |
| GET    | `/v1/models`           | Proxies the request to the underlying LLM provider's models endpoint.          |

During a chat completion call, the server runs an MCTS pipeline that produces iterative updates; each update includes a dynamic Mermaid diagram and detailed logs of that iteration. All intermediate responses are combined into a single `<details>` block, and the final answer is then appended using a consistent, structured markdown template.
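
Illustratively (the exact template is defined by the server, so this is only a sketch of the shape), the returned content resembles:

```markdown
<details>
<summary>MCTS iterations: Mermaid diagram and per-iteration logs</summary>
... intermediate updates from each iteration ...
</details>

The final answer appears here.
```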

## Getting Started

### Deploy using Docker (Recommended) 🐳

1. Create a `secrets.env` file with the environment variables referenced in the `docker-compose.yml` file (see the sketch after this list).
2. Pull the image and deploy the application with Docker Compose:

   ```bash
   docker pull ghcr.io/bearlike/mcts-openai-api:latest
   docker compose --env-file secrets.env up -d

   # Go to http://hostname:8426/docs for Swagger API docs and test the endpoints.
   ```

3. Use `http://hostname:8426/v1` as the OpenAI Base URL with any API key in any compatible application.
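
A hypothetical `secrets.env` sketch: `OPENAI_API_BASE_URL` is referenced later in this README, while the API-key variable name is an assumption, so treat `docker-compose.yml` as the authoritative list:

```bash
# Upstream provider the wrapper forwards requests to (variable name used in this README).
OPENAI_API_BASE_URL=https://api.openai.com/v1
# Assumed variable name for the provider API key; verify against docker-compose.yml.
OPENAI_API_KEY=sk-your-provider-key
```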

---

<details>
<summary>Expand to view <code>Manual Installation</code></summary>

### Manual Installation

#### Prerequisites

- Python 3.13+
- [Poetry](https://python-poetry.org) for dependency management

#### Setup

1. **Clone the repository:**

   ```bash
   git clone https://github.com/bearlike/mcts-openai-api.git
   cd mcts-openai-api
   ```

2. **Install dependencies with Poetry:**

   ```bash
   poetry install
   ```

3. **Start the FastAPI server with Uvicorn:**

   ```bash
   # Visit http://mcts-server:8000/docs to view the Swagger API documentation
   uvicorn main:app --reload
   ```

</details>

---

## Testing the Server

You can test the server using `curl` or any HTTP client.
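
For example, a minimal sketch using Python's `requests` library, with the host and port assumed from the Docker setup above:

```python
import requests

# Assumed host/port from the Docker Compose setup; adjust for your deployment.
BASE_URL = "http://hostname:8426/v1"

payload = {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
    "reasoning_effort": "normal",
}

# The wrapper accepts any API key, so a placeholder works.
response = requests.post(
    f"{BASE_URL}/chat/completions",
    headers={"Authorization": "Bearer sk-placeholder"},
    json=payload,
    timeout=300,  # MCTS runs several LLM calls, so allow a generous timeout
)
print(response.json()["choices"][0]["message"]["content"])
```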

This request will return a JSON response with the aggregated intermediate responses and the final answer.

## Endpoints

### POST `/v1/chat/completions`

Wraps a chat completion request in an MCTS pipeline that refines the answer through intermediate updates before producing a final response.

| Parameter          | Data Type          | Default  | Description                                                                                                          |
| ------------------ | ------------------ | -------- | -------------------------------------------------------------------------------------------------------------------- |
| `model`            | string (required)  | N/A      | Model identifier, e.g., `gpt-4o-mini`.                                                                                |
| `messages`         | array (required)   | N/A      | Array of chat messages with `role` and `content`.                                                                     |
| `max_tokens`       | number (optional)  | N/A      | Maximum tokens allowed in each step response.                                                                         |
| `temperature`      | number (optional)  | `0.7`    | Controls the randomness of the output.                                                                                |
| `stream`           | boolean (optional) | `false`  | If `false`, aggregates streamed responses and returns them on completion. If `true`, streams intermediate responses.  |
| `reasoning_effort` | string (optional)  | `normal` | Controls the `MCTSAgent` search settings:<br>`normal` (default): 2 iterations, 2 simulations per iteration, and 2 child nodes per parent.<br>`medium`: 3 iterations, 3 simulations per iteration, and 3 child nodes per parent.<br>`high`: 4 iterations, 4 simulations per iteration, and 4 child nodes per parent. |
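
A minimal sketch using the official `openai` Python client; the base URL is assumed from the Docker setup above, any API key is accepted, and `reasoning_effort` is passed via `extra_body` because it is not a standard client argument:

```python
from openai import OpenAI

# Any API key works; the wrapper forwards requests to the configured provider.
client = OpenAI(base_url="http://hostname:8426/v1", api_key="sk-placeholder")

completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Explain MCTS in one paragraph."}],
    temperature=0.7,
    # Non-standard parameter, forwarded in the request body to the wrapper.
    extra_body={"reasoning_effort": "high"},
)
print(completion.choices[0].message.content)
```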

### GET `/v1/models`

Proxies requests to list available models from the underlying LLM provider using the `OPENAI_API_BASE_URL`.
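
A short sketch listing the proxied models, again with the assumed base URL and a placeholder key:

```python
from openai import OpenAI

client = OpenAI(base_url="http://hostname:8426/v1", api_key="sk-placeholder")

# Lists the models exposed by the provider behind OPENAI_API_BASE_URL.
for model in client.models.list():
    print(model.id)
```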

---

## License