# MCTS OpenAI API Wrapper

[GitHub Container Registry](https://github.com/bearlike/mcts-openai-api/pkgs/container/mcts-openai-api)

<p align="center">
  <img src="docs/screenshot_1.png" alt="Comparison of Response" style="height: 512px;">
</p>

Monte Carlo Tree Search (MCTS) is a heuristic search algorithm that systematically explores a tree of candidate outputs to refine language model responses. Upon receiving a request, the pipeline generates multiple candidate answers through iterative simulations; in each iteration it evaluates and updates the candidates based on feedback, propagating the best scores up the tree. This trades extra inference-time compute for answer quality: every incoming request is wrapped in an MCTS pipeline that iteratively refines the model's output and returns the best candidate.
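
As a conceptual sketch only (the function names and scoring below are illustrative stand-ins, not the server's actual implementation), the loop looks roughly like this:

```python
import random
from typing import Callable

def mcts_refine(
    prompt: str,
    generate: Callable[[str], str],
    score: Callable[[str, str], float],
    iterations: int = 2,
    simulations: int = 2,
) -> str:
    """Toy MCTS-style loop: generate candidates, score them, expand the best."""
    best_score, best_answer = float("-inf"), ""
    candidates = [generate(prompt) for _ in range(simulations)]
    for _ in range(iterations):
        # Evaluate every candidate and keep the best answer seen so far.
        scored = sorted(((score(prompt, c), c) for c in candidates), reverse=True)
        if scored[0][0] > best_score:
            best_score, best_answer = scored[0]
        # Expand the strongest candidates into refined children for the next round.
        candidates = [generate(f"{prompt}\nImprove this draft:\n{c}") for _, c in scored[:simulations]]
    return best_answer

# Toy usage with stand-in functions in place of real LLM calls.
print(mcts_refine(
    "What is 2 + 2?",
    generate=lambda p: random.choice(["4", "about 4", "maybe 5"]),
    score=lambda p, c: 1.0 if c == "4" else 0.0,
))
```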

## Overview

This FastAPI server exposes two endpoints:

| Method | Endpoint               | Description                                                                    |
| ------ | ---------------------- | ------------------------------------------------------------------------------ |
| POST   | `/v1/chat/completions` | Accepts chat completion requests. Each call is wrapped with MCTS refinement.   |
| GET    | `/v1/models`           | Proxies the request to the underlying LLM provider's models endpoint.          |

During a chat completion call, the server runs an MCTS pipeline that produces iterative updates; each update includes a dynamic Mermaid diagram and detailed logs of that iteration. All intermediate responses are combined into a single `<details>` block, and the final answer is then appended using a consistent, structured markdown template.
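
Illustratively (the exact template is defined by the server, so this is only a sketch of the shape), the returned content resembles:

```markdown
<details>
<summary>MCTS iterations: Mermaid diagram and per-iteration logs</summary>
... intermediate updates from each iteration ...
</details>

The final answer appears here.
```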

## Getting Started

### Deploy using Docker (Recommended) 🐳

1. Create a `secrets.env` file with the environment variables referenced in the `docker-compose.yml` file (see the sketch after this list).
2. Pull the image and deploy the application with Docker Compose:

   ```bash
   docker pull ghcr.io/bearlike/mcts-openai-api:latest
   docker compose --env-file secrets.env up -d

   # Go to http://hostname:8426/docs for Swagger API docs and test the endpoints.
   ```

3. Use `http://hostname:8426/v1` as the OpenAI Base URL with any API key in any compatible application.
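
A hypothetical `secrets.env` sketch: `OPENAI_API_BASE_URL` is referenced later in this README, while the API-key variable name is an assumption, so treat `docker-compose.yml` as the authoritative list:

```bash
# Upstream provider the wrapper forwards requests to (variable name used in this README).
OPENAI_API_BASE_URL=https://api.openai.com/v1
# Assumed variable name for the provider API key; verify against docker-compose.yml.
OPENAI_API_KEY=sk-your-provider-key
```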

---

<details>
<summary>Expand to view <code>Manual Installation</code></summary>

### Manual Installation

#### Prerequisites

- Python 3.13+
- [Poetry](https://python-poetry.org) for dependency management

#### Setup

1. **Clone the repository:**

   ```bash
   git clone https://github.com/bearlike/mcts-openai-api.git
   cd mcts-openai-api
   ```

2. **Install dependencies with Poetry:**

   ```bash
   poetry install
   ```

3. **Start the FastAPI server with Uvicorn:**

   ```bash
   # Visit http://mcts-server:8000/docs to view the Swagger API documentation
   uvicorn main:app --reload
   ```

</details>

---

## Testing the Server

You can test the server using `curl` or any HTTP client.
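
For example, a minimal sketch using Python's `requests` library, with the host and port assumed from the Docker setup above:

```python
import requests

# Assumed host/port from the Docker Compose setup; adjust for your deployment.
BASE_URL = "http://hostname:8426/v1"

payload = {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
    "reasoning_effort": "normal",
}

# The wrapper accepts any API key, so a placeholder works.
response = requests.post(
    f"{BASE_URL}/chat/completions",
    headers={"Authorization": "Bearer sk-placeholder"},
    json=payload,
    timeout=300,  # MCTS runs several LLM calls, so allow a generous timeout
)
print(response.json()["choices"][0]["message"]["content"])
```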

This request will return a JSON response with the aggregated intermediate responses and the final answer.

## Endpoints

### POST `/v1/chat/completions`

Wraps a chat completion request in an MCTS pipeline that refines the answer through intermediate updates before producing a final response.

| Parameter          | Data Type          | Default  | Description                                                                                                          |
| ------------------ | ------------------ | -------- | -------------------------------------------------------------------------------------------------------------------- |
| `model`            | string (required)  | N/A      | Model identifier, e.g., `gpt-4o-mini`.                                                                                |
| `messages`         | array (required)   | N/A      | Array of chat messages with `role` and `content`.                                                                     |
| `max_tokens`       | number (optional)  | N/A      | Maximum tokens allowed in each step response.                                                                         |
| `temperature`      | number (optional)  | `0.7`    | Controls the randomness of the output.                                                                                |
| `stream`           | boolean (optional) | `false`  | If `false`, aggregates streamed responses and returns them on completion. If `true`, streams intermediate responses.  |
| `reasoning_effort` | string (optional)  | `normal` | Controls the `MCTSAgent` search settings:<br>`normal` (default): 2 iterations, 2 simulations per iteration, and 2 child nodes per parent.<br>`medium`: 3 iterations, 3 simulations per iteration, and 3 child nodes per parent.<br>`high`: 4 iterations, 4 simulations per iteration, and 4 child nodes per parent. |
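
A minimal sketch using the official `openai` Python client; the base URL is assumed from the Docker setup above, any API key is accepted, and `reasoning_effort` is passed via `extra_body` because it is not a standard client argument:

```python
from openai import OpenAI

# Any API key works; the wrapper forwards requests to the configured provider.
client = OpenAI(base_url="http://hostname:8426/v1", api_key="sk-placeholder")

completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Explain MCTS in one paragraph."}],
    temperature=0.7,
    # Non-standard parameter, forwarded in the request body to the wrapper.
    extra_body={"reasoning_effort": "high"},
)
print(completion.choices[0].message.content)
```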

### GET `/v1/models`

Proxies requests to list available models from the underlying LLM provider using the `OPENAI_API_BASE_URL`.
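
A short sketch listing the proxied models, again with the assumed base URL and a placeholder key:

```python
from openai import OpenAI

client = OpenAI(base_url="http://hostname:8426/v1", api_key="sk-placeholder")

# Lists the models exposed by the provider behind OPENAI_API_BASE_URL.
for model in client.models.list():
    print(model.id)
```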

---

## License