
Commit b2e421d

Add Reasoning Effort to MCTS Wrapper (#1)
* Add reasoning effort to response model
* Configure MCTS parameters by reasoning effort: set iteration, simulation, and child limits in `MCTSAgent` per effort level, NORMAL (2-2-2), MEDIUM (3-3-3), and HIGH (4-4-4). This ensures that each query goes through the minimum required iterations even when the initial score is high. Update Dockerfile.
* Update docs
* Update prompts, screenshot_1.png, and README.md
* Add workflow to build; fix: remove unnecessary packages
1 parent 1377cce commit b2e421d
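The effort levels map directly onto the agent's search budget. As a minimal sketch of how that mapping might look inside `MCTSAgent` (the `EFFORT_SETTINGS` table, class, and attribute names here are hypothetical illustrations, not the repository's actual code):

```python
from enum import Enum


class ReasoningEffort(Enum):
    NORMAL = "normal"
    MEDIUM = "medium"
    HIGH = "high"


# Hypothetical table mirroring the NORMAL (2-2-2), MEDIUM (3-3-3),
# and HIGH (4-4-4) limits described in the commit message.
EFFORT_SETTINGS = {
    ReasoningEffort.NORMAL: (2, 2, 2),
    ReasoningEffort.MEDIUM: (3, 3, 3),
    ReasoningEffort.HIGH: (4, 4, 4),
}


class MCTSAgentSketch:
    """Sketch only: shows how effort could bound the search loop."""

    def __init__(self, reasoning_effort=ReasoningEffort.NORMAL):
        self.iterations, self.simulations, self.max_children = EFFORT_SETTINGS[
            reasoning_effort
        ]

    async def search(self) -> str:
        best_answer = ""
        # Always run the full iteration budget, so a high-scoring first
        # candidate still receives the minimum required refinement passes.
        for _ in range(self.iterations):
            for _ in range(self.simulations):
                # Expand up to self.max_children candidates per parent,
                # score them, and backpropagate the best score upward.
                pass
        return best_answer
```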

File tree

9 files changed (+196, -94 lines)
Lines changed: 62 additions & 0 deletions

```yaml
name: Build and Push Docker Image

on:
  release:
    types: [published]
  workflow_dispatch:
    inputs:
      branch:
        description: "Branch to build"
        required: true
        default: "main"
      tag:
        description: "Tag for the Docker image"
        required: true
        default: "latest"

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Determine checkout reference
        id: checkout-ref
        run: |
          if [ "${{ github.event_name }}" = "workflow_dispatch" ]; then
            echo "ref=${{ github.event.inputs.branch }}" >> $GITHUB_OUTPUT
          else
            echo "ref=${{ github.event.release.target_commitish }}" >> $GITHUB_OUTPUT
          fi

      - name: Checkout code
        uses: actions/checkout@v4
        with:
          ref: ${{ steps.checkout-ref.outputs.ref }}

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Log in to GitHub Container Registry
        uses: docker/login-action@v2
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Set up Docker image tags
        id: prep
        run: |
          if [ "${{ github.event_name }}" = "release" ]; then
            echo "image_tag=${{ github.event.release.tag_name }}" >> $GITHUB_OUTPUT
          else
            echo "image_tag=${{ github.event.inputs.tag }}" >> $GITHUB_OUTPUT
          fi

      - name: Build and push Docker image
        uses: docker/build-push-action@v6
        with:
          context: .
          push: true
          tags: ghcr.io/${{ github.repository_owner }}/mcts-openai-api:${{ steps.prep.outputs.image_tag }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
          platforms: linux/amd64,linux/arm64
```

.gitignore

Lines changed: 1 addition & 0 deletions

```diff
@@ -160,3 +160,4 @@ cython_debug/
 #.idea/
 .python-version
 output.md
+test.sh
```

Dockerfile

Lines changed: 5 additions & 31 deletions

```diff
@@ -1,48 +1,23 @@
 FROM python:3.13-slim

+ARG TARGETPLATFORM=linux/amd64
+ARG DEBIAN_FRONTEND=noninteractive
+ARG LANG=C.UTF-8
+
 # Install system dependencies
 RUN apt-get update && apt-get install -y \
     wget \
     netcat-traditional \
     gnupg \
     curl \
     unzip \
-    xvfb \
-    libgconf-2-4 \
-    libxss1 \
-    libnss3 \
-    libnspr4 \
-    libasound2 \
-    libatk1.0-0 \
-    libatk-bridge2.0-0 \
-    libcups2 \
-    libdbus-1-3 \
-    libdrm2 \
-    libgbm1 \
-    libgtk-3-0 \
-    libxcomposite1 \
-    libxdamage1 \
-    libxfixes3 \
-    libxrandr2 \
-    xdg-utils \
-    fonts-liberation \
-    dbus \
-    xauth \
-    xvfb \
-    x11vnc \
-    tigervnc-tools \
     supervisor \
     net-tools \
     procps \
     git \
-    python3-numpy \
-    fontconfig \
-    fonts-dejavu \
-    fonts-dejavu-core \
-    fonts-dejavu-extra
+    && rm -rf /var/lib/apt/lists/*

 # Set platform for ARM64 compatibility
-ARG TARGETPLATFORM=linux/amd64
 ENV OPENAI_BASE_URL="https://api.openai.com/v1"
 ENV OPENAI_API_KEY="sk-XXX"

@@ -54,7 +29,6 @@ COPY scripts/requirements.txt .
 RUN pip install --no-cache-dir -r requirements.txt

 COPY . .
-RUN rm -rf /var/lib/apt/lists/*

 # Set up supervisor configuration
 RUN mkdir -p /var/log/supervisor
```

README.md

Lines changed: 55 additions & 21 deletions

````diff
@@ -1,28 +1,53 @@
 # MCTS OpenAI API Wrapper

-![Comparison of Response](docs/screenshot_1.png)
+[![Packages](https://img.shields.io/badge/Docker-ghcr.io%2Fbearlike%2Fmcts%E2%80%94openai%E2%80%94api%3Alatest-blue?logo=docker)](https://github.com/bearlike/mcts-openai-api/pkgs/container/mcts-openai-api)

-Monte Carlo Tree Search (MCTS) is a method that uses extra compute to explore different candidate responses before selecting a final answer. It works by building a tree of options and running multiple iterations. This is similar in concept to inference scaling, but here a model generates several output candidates, reitereates and picks the best one. Every incoming request is wrapped with a MCTS pipeline to iteratively refine language model outputs.
+<p align="center">
+  <img src="docs/screenshot_1.png" alt="Comparison of Response" style="height: 512px;">
+</p>
+
+Monte Carlo Tree Search (MCTS) is a heuristic search algorithm that systematically explores a tree of candidate outputs to refine language model responses. Upon receiving an input, the MCTS pipeline generates multiple candidate answers through iterative simulations. In each iteration, the algorithm evaluates and updates these candidates based on feedback, propagating the best scores upward. This process enhances inference by scaling the model's reasoning capabilities, enabling the selection of the optimal response from multiple candidates.

 ## Overview

 This FastAPI server exposes two endpoints:

 | Method | Endpoint | Description |
-|--------|------------------------|-------------------------------------------------------------------------------|
+| ------ | ---------------------- | ----------------------------------------------------------------------------- |
 | POST | `/v1/chat/completions` | Accepts chat completion requests. The call is wrapped with an MCTS refinement |
-| GET | `/v1/models` | Proxies a request to the underlying LLM provider’s models endpoint |
+| GET | `/v1/models` | Proxies a request to the underlying LLM provider’s models endpoint |

-During a chat completion call, the server executes an MCTS pipeline that generates intermediate updates (including a Mermaid diagram and iteration details). All these intermediate responses are aggregated into a single `<details>` block, and the final answer is appended at the end, following a consistent and structured markdown template.
+During a chat completion call, the server runs an MCTS pipeline that produces iterative updates. Each update includes a dynamic Mermaid diagram and detailed logs of the iteration process. All intermediate responses are combined into a single `<details>` block. Finally, the final answer is appended at the end using a consistent, structured markdown template.

 ## Getting Started

-### Prerequisites
+### Deploy using Docker (Recommended) 🐳
+
+1. Create a `secrets.env` with the variables from the `docker-compose.yml` file.
+2. Use this command to pull the image and deploy the application with Docker Compose:
+
+```bash
+docker pull ghcr.io/bearlike/mcts-openai-api:latest
+docker compose --env-file secrets.env up -d
+
+# Go to http://hostname:8426/docs for Swagger API docs and test the endpoints.
+```
+
+3. Use `http://hostname:8426/v1` as the OpenAI Base URL with any API key in any compatible application.
+
+---
+
+<details>
+<summary>Expand to view <code>Manual Installation</code></summary>

-- Python 3.8+
+### Manual Installation
+
+#### Prerequisites
+
+- Python 3.13+
 - [Poetry](https://python-poetry.org) for dependency management

-### Setup
+#### Setup

 1. **Clone the repository:**

@@ -54,10 +79,15 @@
 Start the FastAPI server with Uvicorn:

 ```bash
-# Visit http://server-ip:8000/docs to view the Swagger API documentation
+# Visit http://mcts-server:8000/docs to view the Swagger API documentation
 uvicorn main:app --reload
 ```

+
+</details>
+
+---
+
 ## Testing the Server

 You can test the server using `curl` or any HTTP client.

@@ -88,23 +118,27 @@
 ## Endpoints

-### POST /v1/chat/completions
+### POST `/v1/chat/completions`

-- **Description:**
-  Wraps a chat completion request in an MCTS pipeline that refines the answer by generating intermediate updates and a final response.
+Wraps a chat completion request in an MCTS pipeline that refines the answer by generating intermediate updates and a final response.

-- **Request Body Parameters:**
+| Parameter | Data Type | Default | Description |
+| ---------------- | ------------------ | -------- | ------------------------------------------------------------------------------------------------ |
+| model | string (required) | N/A | e.g., `gpt-4o-mini`. |
+| messages | array (required) | N/A | Array of chat messages with `role` and `content`. |
+| max_tokens | number (optional) | N/A | Maximum tokens allowed in each step response. |
+| temperature | number (optional) | `0.7` | Controls the randomness of the output. |
+| stream | boolean (optional) | `false` | If false, aggregates streamed responses and returns on completion. If true, streams intermediate responses. |
+| reasoning_effort | string (optional) | `normal` | Controls the `MCTSAgent` search settings: |
+| => | => | => | **`normal`** - 2 iterations, 2 simulations per iteration, and 2 child nodes per parent (default). |
+| => | => | => | `medium` - 3 iterations, 3 simulations per iteration, and 3 child nodes per parent. |
+| => | => | => | `high` - 4 iterations, 4 simulations per iteration, and 4 child nodes per parent. |

-  - `model`: string (e.g., `"gpt-4o-mini"`)
-  - `messages`: an array of chat messages (with `role` and `content` properties)
-  - `max_tokens`: (optional) number
-  - `temperature`: (optional) number
-  - `stream`: (optional) boolean (if enabled, aggregates intermediate responses with the final answer in one JSON response)
+### GET `/v1/models`

-### GET /v1/models
+Proxies requests to list available models from the underlying LLM provider using the `OPENAI_API_BASE_URL`.

-- **Description:**
-  Proxies requests to list available models from the underlying LLM provider using the `OPENAI_API_BASE_URL`.
+---

 ## License
````
docker-compose.yml

Lines changed: 2 additions & 3 deletions

```diff
@@ -1,8 +1,7 @@
 services:
   mcts-api-server:
-    build:
-      context: .
-      dockerfile: Dockerfile
+    container_name: mcts-api-server
+    image: "ghcr.io/bearlike/mcts-openai-api:latest"
     ports:
       - "8336:8000" # Fast API Server
     environment:
```
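Since the service is now pulled rather than built locally, configuration arrives entirely through the env file the README mentions. A minimal `secrets.env` sketch, assuming the compose file forwards the two variables the Dockerfile declares (values are placeholders):

```env
# secrets.env -- placeholder values
OPENAI_BASE_URL=https://api.openai.com/v1
OPENAI_API_KEY=sk-XXX
```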

docs/screenshot_1.png

Binary file changed (-212 KB).

utils/classes.py

Lines changed: 10 additions & 2 deletions

```diff
@@ -1,9 +1,16 @@
 #!/usr/bin/env python3
 # Pydantic Models for Chat Completion API
+from enum import Enum
 from pydantic import BaseModel
 from typing import List, Optional


+class ReasoningEffort(Enum):
+    NORMAL = "normal"
+    MEDIUM = "medium"
+    HIGH = "high"
+
+
 class ChatMessage(BaseModel):
     role: str
     content: str
@@ -12,6 +19,7 @@ class ChatMessage(BaseModel):
 class ChatCompletionRequest(BaseModel):
     model: str  # e.g.: "gpt-4o-mini"
     messages: List[ChatMessage]
-    max_tokens: Optional[int] = 512
-    temperature: Optional[float] = 0.1
+    max_tokens: Optional[int] = None
+    temperature: Optional[float] = 0.7
     stream: Optional[bool] = False
+    reasoning_effort: Optional[ReasoningEffort] = ReasoningEffort.NORMAL
```
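Because `reasoning_effort` is declared as the `ReasoningEffort` enum, Pydantic validates and coerces the raw string from the JSON body automatically. A quick sketch using the models above (assuming the project root is importable):

```python
from utils.classes import ChatCompletionRequest, ReasoningEffort

# The plain string "high" is coerced into the enum member by value.
req = ChatCompletionRequest(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "hello"}],
    reasoning_effort="high",
)
assert req.reasoning_effort is ReasoningEffort.HIGH

# Omitting the field falls back to the declared default, NORMAL.
default_req = ChatCompletionRequest(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "hello"}],
)
assert default_req.reasoning_effort is ReasoningEffort.NORMAL

# An unknown value such as "extreme" would raise a ValidationError.
```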

utils/llm/pipeline.py

Lines changed: 2 additions & 0 deletions

```diff
@@ -11,6 +11,7 @@

 class Pipeline:
     """Pipeline: Wraps an incoming request into the MCTS process"""
+
     def __init__(self, *args, **kwargs):
         self.llm_client = LLMClient(*args, **kwargs)

@@ -42,6 +43,7 @@ async def run(
             llm_client=self.llm_client,
             question=question,
             event_emitter=emitter,
+            reasoning_effort=request_body.reasoning_effort,
             model=model,
         )
         final_answer = await mcts_agent.search()
```
