
Commit f929211

Fix streaming and mermaid parsing
+ Update docs
1 parent f4bd3c0 commit f929211

File tree

5 files changed: +217 −95 lines changed

.env.sample

Lines changed: 2 additions & 0 deletions

@@ -0,0 +1,2 @@
+OPENAI_API_BASE_URL=https://api.openai.com/v1
+OPENAI_API_KEY=sk-XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

.gitignore

Lines changed: 1 addition & 0 deletions

@@ -159,3 +159,4 @@ cython_debug/
 # option (not recommended) you can uncomment the following to ignore the entire idea folder.
 #.idea/
 .python-version
+output.md

README.md

Lines changed: 111 additions & 2 deletions
@@ -1,2 +1,111 @@
-# MCTS OpenAI
-Every incoming request is wrapped with a Monte Carlo Tree Search (MCTS) pipeline
# MCTS OpenAI API Wrapper

![Comparison of Response](docs/screenshot_1.png)

Monte Carlo Tree Search (MCTS) is a method that uses extra compute to explore different candidate responses before selecting a final answer. It works by building a tree of options and running multiple iterations. This is similar in concept to inference-time scaling, except that here the model generates several candidate outputs, iteratively refines them, and picks the best one. Every incoming request is wrapped with an MCTS pipeline that iteratively refines the language model's output.
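To make the idea concrete, here is a minimal sketch of such a generate-score-refine loop. It is not this repository's implementation: the `score` and `refine` helpers, the critic prompt, and the iteration counts are all hypothetical, and it assumes the official `openai` Python SDK (>= 1.0) with `OPENAI_API_KEY` set.

```python
# Hypothetical generate-score-refine loop illustrating the core idea
# (not the code in this repository).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def score(question: str, answer: str) -> float:
    """Ask the model to rate an answer from 0 to 10 (hypothetical critic prompt)."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": f"Rate this answer to '{question}' from 0 to 10. "
                       f"Reply with only the number.\n\n{answer}",
        }],
    )
    try:
        return float(resp.choices[0].message.content.strip())
    except ValueError:
        return 0.0

def refine(question: str, candidates: int = 3, iterations: int = 2) -> str:
    best_answer, best_score = "", float("-inf")
    for _ in range(iterations):
        for _ in range(candidates):
            messages = [{"role": "user", "content": question}]
            if best_answer:  # seed later iterations with the current best draft
                messages.append({"role": "user",
                                 "content": f"Improve this draft:\n{best_answer}"})
            resp = client.chat.completions.create(
                model="gpt-4o-mini", messages=messages, temperature=0.8)
            answer = resp.choices[0].message.content
            candidate_score = score(question, answer)
            if candidate_score > best_score:
                best_answer, best_score = answer, candidate_score
    return best_answer
```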
## Overview

This FastAPI server exposes two endpoints:

| Method | Endpoint               | Description                                                                    |
|--------|------------------------|--------------------------------------------------------------------------------|
| POST   | `/v1/chat/completions` | Accepts chat completion requests. The call is wrapped with an MCTS refinement  |
| GET    | `/v1/models`           | Proxies a request to the underlying LLM provider's models endpoint             |

During a chat completion call, the server executes an MCTS pipeline that generates intermediate updates (including a Mermaid diagram and iteration details). These intermediate responses are aggregated into a single `<details>` block, and the final answer is appended at the end, following a consistent and structured markdown template.
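To make the response shape concrete, a hypothetical assembly of that template might look like the sketch below; the function name and summary text are placeholders, and the real template lives in the server code.

```python
# Hypothetical sketch of the aggregated markdown response shape
# (placeholder names; the real template lives in the server code).
MERMAID_OPEN = "`" * 3 + "mermaid"  # built dynamically to avoid a literal fence here
FENCE_CLOSE = "`" * 3

def aggregate(mermaid_diagram: str, iteration_notes: list[str], final_answer: str) -> str:
    parts = [
        "<details>",
        "<summary>MCTS intermediate steps</summary>",
        "",
        MERMAID_OPEN,
        mermaid_diagram,
        FENCE_CLOSE,
        "",
    ]
    for i, note in enumerate(iteration_notes, start=1):
        parts.append(f"**Iteration {i}:** {note}")
    parts += ["", "</details>", "", final_answer]
    return "\n".join(parts)
```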
## Getting Started

### Prerequisites

- Python 3.8+
- [Poetry](https://python-poetry.org) for dependency management

### Setup
1. **Clone the repository:**

   ```bash
   git clone https://github.com/bearlike/mcts-openai-api.git
   cd mcts-openai-api
   ```
2. **Copy the Environment File:**

   Copy the sample environment file to `.env` and set your `OPENAI_API_KEY`:

   ```bash
   cp .env.sample .env
   ```

   Open the `.env` file and update the `OPENAI_API_KEY` (and other settings if needed); a sketch of how these values are typically loaded appears after these steps.
3. **Install Dependencies:**

   Use Poetry to install the required packages:

   ```bash
   poetry install
   ```
4. **Run the Server:**

   Start the FastAPI server with Uvicorn:

   ```bash
   # Visit http://server-ip:8000/docs to view the Swagger API documentation
   uvicorn main:app --reload
   ```
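As referenced in step 2, the values in `.env` are typically loaded at startup along these lines (a sketch assuming `python-dotenv`; the repository's actual startup code may differ):

```python
# Sketch: loading configuration from .env (assumes python-dotenv is installed).
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory

OPENAI_API_BASE_URL = os.getenv("OPENAI_API_BASE_URL", "https://api.openai.com/v1")
OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]  # fail fast if the key is missing
```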
## Testing the Server

You can test the server using `curl` or any HTTP client.

### Example Request
```bash
curl -X 'POST' \
  'http://192.168.1.198:8000/v1/chat/completions' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {
        "role": "user",
        "content": "How many R in STRAWBERRY?"
      }
    ],
    "max_tokens": 1024,
    "temperature": 0.5
  }' | jq -r '.choices[0].message.content'
```
This request will return a JSON response with the aggregated intermediate responses wrapped inside a single `<details>` block, followed by the final answer.
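Since the endpoint is OpenAI-compatible, the official `openai` Python SDK can also point at it. The host and port below mirror the curl example, and whether the wrapper checks the client-side `api_key` is an assumption here:

```python
# Querying the wrapper with the official openai SDK (>= 1.0).
from openai import OpenAI

client = OpenAI(
    base_url="http://192.168.1.198:8000/v1",  # your server's address
    api_key="not-needed-locally",  # assumption: the wrapper uses its own upstream key
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "How many R in STRAWBERRY?"}],
    max_tokens=1024,
    temperature=0.5,
)
print(response.choices[0].message.content)
```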
---

## Endpoints

### POST /v1/chat/completions

- **Description:**
  Wraps a chat completion request in an MCTS pipeline that refines the answer by generating intermediate updates and a final response.
- **Request Body Parameters:** (see the model sketch after this list)

  - `model`: string (e.g., `"gpt-4o-mini"`)
  - `messages`: an array of chat messages (with `role` and `content` properties)
  - `max_tokens`: (optional) number
  - `temperature`: (optional) number
  - `stream`: (optional) boolean (if enabled, aggregates intermediate responses with the final answer in one JSON response)
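A request model along these lines would match the parameters above (a hypothetical Pydantic sketch, not necessarily the repository's exact schema):

```python
# Hypothetical Pydantic models matching the documented request body.
from pydantic import BaseModel

class ChatMessage(BaseModel):
    role: str  # "system", "user", or "assistant"
    content: str

class ChatCompletionRequest(BaseModel):
    model: str  # e.g. "gpt-4o-mini"
    messages: list[ChatMessage]
    max_tokens: int | None = None
    temperature: float | None = None
    stream: bool | None = False
```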
### GET /v1/models

- **Description:**
  Proxies requests to list available models from the underlying LLM provider using the `OPENAI_API_BASE_URL`.
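Conceptually, the proxy can be as small as the following sketch (hypothetical handler, assuming `httpx`; the repository's code may differ):

```python
# Hypothetical /v1/models proxy handler (not the repository's actual code).
import os

import httpx
from fastapi import FastAPI

app = FastAPI()

@app.get("/v1/models")
async def list_models() -> dict:
    base_url = os.getenv("OPENAI_API_BASE_URL", "https://api.openai.com/v1")
    headers = {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"}
    async with httpx.AsyncClient() as client:
        upstream = await client.get(f"{base_url}/models", headers=headers)
    return upstream.json()
```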
## License

This project is licensed under the MIT License.
