
Commit c187d26

OriNachum and claude authored

docs: align documentation with code and fix inaccuracies (#54)
* docs: align documentation with code and fix inaccuracies

  Fix critical doc/code mismatches found during audit:

  - Implement LOG_LEVEL and LOG_FILE_PATH env vars (documented but missing)
  - Return HTTP 501 for non-streaming /responses requests (was silent)
  - Fix OPENAI_BASE_URL_INTERNAL default in README (was :11434, code is :8000)
  - Fix health endpoint response in architecture doc
  - Document 5 missing env vars, 3 MCP transport types, SSE heartbeat
  - Remove stale server.py/is_mcp_tool.py refs and 83 lines of dead code
  - Rewrite cli-local.md, update CLAUDE.md, expand .env.example

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address PR review feedback

  - Use os.makedirs(exist_ok=True) to avoid race in multi-worker startup
  - Gracefully handle unwritable LOG_FILE_PATH (warn + continue with stderr)
  - Remove empty duplicate "Pydantic Models Reference" heading

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
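The review-feedback fix above (race-free log directory creation plus a graceful fallback when `LOG_FILE_PATH` is unwritable) can be sketched as follows. This is an illustrative sketch, not the repo's actual code: the `setup_logging` name and signature are assumptions.

```python
import logging
import os
import sys

def setup_logging(level="INFO", file_path=None):
    """Configure root logging; fall back to stderr if the log file is unusable."""
    handlers = [logging.StreamHandler(sys.stderr)]
    if file_path:
        try:
            # exist_ok=True avoids a race when several uvicorn workers start at once
            os.makedirs(os.path.dirname(file_path) or ".", exist_ok=True)
            handlers.append(logging.FileHandler(file_path))
        except OSError as exc:
            # Warn and continue with stderr-only logging instead of crashing
            print(f"WARNING: cannot open log file {file_path}: {exc}", file=sys.stderr)
    logging.basicConfig(
        level=getattr(logging, level.upper(), logging.INFO),
        handlers=handlers,
        force=True,  # replace any handlers configured earlier
    )
```

The key design point from the review is that a bad `LOG_FILE_PATH` degrades to stderr-only logging rather than aborting startup.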
1 parent 3211aa9 commit c187d26

File tree

11 files changed: +161 −181 lines changed


.env.example

Lines changed: 10 additions & 2 deletions

@@ -7,10 +7,18 @@ OPENAI_API_KEY=sk-mockapikey123456789abcdefghijklmnopqrstuvwxyz
 API_ADAPTER_HOST=0.0.0.0
 API_ADAPTER_PORT=8080
 
+# MCP Configuration
+MCP_SERVERS_CONFIG_PATH=src/open_responses_server/servers_config.json
+MCP_TOOL_REFRESH_INTERVAL=10
+
+# Conversation and Tool Handling
+MAX_CONVERSATION_HISTORY=100
+MAX_TOOL_CALL_ITERATIONS=25
+
 # Streaming Configuration
 STREAM_TIMEOUT=120.0
 HEARTBEAT_INTERVAL=15.0
 
-# Logging Configuration (optional)
+# Logging Configuration
 LOG_LEVEL=INFO
-LOG_FILE_PATH=./log/api_adapter.log
+LOG_FILE_PATH=./log/api_adapter.log
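A plausible reading of how `common/config.py` consumes the variables above. Names and defaults are taken from this `.env.example`; the repo's actual parsing may differ, so treat this as a sketch.

```python
import os

# Defaults mirror .env.example above; the real common/config.py may differ.
MCP_SERVERS_CONFIG_PATH = os.getenv(
    "MCP_SERVERS_CONFIG_PATH", "src/open_responses_server/servers_config.json"
)
MCP_TOOL_REFRESH_INTERVAL = int(os.getenv("MCP_TOOL_REFRESH_INTERVAL", "10"))
MAX_CONVERSATION_HISTORY = int(os.getenv("MAX_CONVERSATION_HISTORY", "100"))
MAX_TOOL_CALL_ITERATIONS = int(os.getenv("MAX_TOOL_CALL_ITERATIONS", "25"))
STREAM_TIMEOUT = float(os.getenv("STREAM_TIMEOUT", "120.0"))
HEARTBEAT_INTERVAL = float(os.getenv("HEARTBEAT_INTERVAL", "15.0"))
LOG_LEVEL = os.getenv("LOG_LEVEL", "INFO")
LOG_FILE_PATH = os.getenv("LOG_FILE_PATH", "./log/api_adapter.log")
```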

CLAUDE.md

Lines changed: 5 additions & 3 deletions

@@ -73,7 +73,7 @@ api_controller.py -- FastAPI app with route definitions, CORS, startup/shutdown
 - `api_controller.py` - Route definitions. `server_entrypoint.py` is the uvicorn entry point that imports from `api_controller`.
 - `responses_service.py` - Converts Responses API requests to Chat Completions format (`convert_responses_to_chat_completions`), processes streaming Chat Completions responses back into Responses API SSE events (`process_chat_completions_stream`). Maintains in-memory `conversation_history` keyed by `previous_response_id`.
 - `chat_completions_service.py` - Handles `/v1/chat/completions` with MCP tool injection. Implements a tool-call loop (up to `MAX_TOOL_CALL_ITERATIONS`) for both streaming and non-streaming modes.
-- `common/mcp_manager.py` - `MCPManager` singleton manages MCP server lifecycle (stdio-based), tool discovery/caching with periodic refresh, and tool execution. `MCPServer` wraps individual server sessions.
+- `common/mcp_manager.py` - `MCPManager` singleton manages MCP server lifecycle (stdio, sse, streamable-http transports), tool discovery/caching with periodic refresh, and tool execution. `MCPServer` wraps individual server sessions.
 - `common/llm_client.py` - `LLMClient` singleton wrapping `httpx.AsyncClient`, pointed at `OPENAI_BASE_URL_INTERNAL`.
 - `common/config.py` - All configuration via environment variables (loaded from `.env` via python-dotenv). Key vars: `OPENAI_BASE_URL_INTERNAL`, `OPENAI_API_KEY`, `MCP_SERVERS_CONFIG_PATH`, `MAX_TOOL_CALL_ITERATIONS`.
 - `models/responses_models.py` - Pydantic models for Responses API request/response/streaming types.
@@ -82,7 +82,9 @@ api_controller.py -- FastAPI app with route definitions, CORS, startup/shutdown
 
 ## Configuration
 
-All config is via environment variables (see `common/config.py`). The CLI command `otc configure` writes a `.env` file interactively. MCP servers are configured in a JSON file pointed to by `MCP_SERVERS_CONFIG_PATH` (default: `src/open_responses_server/servers_config.json`).
+All config is via environment variables (see `common/config.py`). The CLI command `otc configure` writes a `.env` file interactively. MCP servers are configured in a JSON file pointed to by `MCP_SERVERS_CONFIG_PATH` (default: `src/open_responses_server/servers_config.json`). Note: this default path assumes running from the repo root; when installed via pip, set it to an absolute path.
+
+**Important:** The `/responses` endpoint only supports streaming (`stream=True`). Non-streaming requests return HTTP 501.
 
 ## Version & Releasing
 
@@ -96,7 +98,7 @@ Version lives in `src/open_responses_server/version.py` as `__version__` — the
 
 ## CLI Entry Point
 
-The `otc` command is defined in `pyproject.toml` pointing to `open_responses_server.cli:main`. Commands: `start`, `configure`, `help`.
+The `otc` command is defined in `pyproject.toml` pointing to `open_responses_server.cli:main`. Commands: `start`, `configure`, `help`. Also supports `--version` flag.
 
 ## PR Workflow
 

README.md

Lines changed: 16 additions & 4 deletions

@@ -81,6 +81,7 @@ docker run -p 8080:8080 \
   ghcr.io/teabranch/open-responses-server:latest
 ```
 
+Docker images are available for linux/amd64, linux/arm64, and linux/arm/v7 architectures.
 Works great with docker-compose.yaml for Codex + your own model.
 
@@ -90,7 +91,7 @@ Works great with docker-compose.yaml for Codex + your own model.
 Minimal config to connect your AI backend:
 
 ```
-OPENAI_BASE_URL_INTERNAL=http://localhost:11434  # Ollama, vLLM, Groq, etc.
+OPENAI_BASE_URL_INTERNAL=http://localhost:8000   # Your LLM backend (Ollama typically on :11434, vLLM on :8000)
 OPENAI_BASE_URL=http://localhost:8080            # This server's endpoint
 OPENAI_API_KEY=sk-mockapikey123456789            # Mock key tunneled to backend
 MCP_SERVERS_CONFIG_PATH=./mcps.json              # Path to mcps servers json file
@@ -101,10 +102,21 @@ Server binding:
 API_ADAPTER_HOST=0.0.0.0
 API_ADAPTER_PORT=8080
 ```
-Optional logging:
+Streaming and connection:
 ```
-LOG_LEVEL=INFO
-LOG_FILE_PATH=./log/api_adapter.log
+STREAM_TIMEOUT=120.0           # HTTP timeout (seconds) for streaming requests
+HEARTBEAT_INTERVAL=15.0        # SSE keepalive interval (seconds)
+```
+Conversation and tool handling:
+```
+MAX_CONVERSATION_HISTORY=100   # Max stored conversation entries
+MAX_TOOL_CALL_ITERATIONS=25    # Max tool-call loop iterations
+MCP_TOOL_REFRESH_INTERVAL=10   # Seconds between MCP tool cache refreshes
+```
+Logging:
+```
+LOG_LEVEL=INFO                 # DEBUG, INFO, WARNING, ERROR, CRITICAL
+LOG_FILE_PATH=./log/api_adapter.log  # Path to log file
 ```
 
 Configure with CLI tool:

docs/cli-local.md

Lines changed: 52 additions & 44 deletions

@@ -3,47 +3,55 @@ title: CLI Usage
 nav_order: 5
 ---
 
-# CLI Usage
-
-To run the `cli.py` script and use it to manage the `server.py`, follow these steps:
-
-1. **Install uv and dependences**
-   Assumed you installed dependencies already.
-
-2. **Run the CLI Script**:
-   You can execute the `cli.py` script directly using Python. For example:
-   ```bash
-   uv run src/open_responses_server/cli.py <command>
-   ```
-   Replace `<command>` with one of the available commands (`start`, `configure`, or `help`).
-
-3. **Available Commands**:
-   - `start`: Starts the FastAPI server defined in `server.py`.
-   - `configure`: Allows you to configure server settings like host, port, API URLs, and API key.
-   - `help`: Displays help information about the CLI.
-
-4. **Example Usage**:
-   - To start the server:
-     ```bash
-     python src/open_responses_server/cli.py start
-     ```
-   - To configure the server:
-     ```bash
-     python src/open_responses_server/cli.py configure
-     ```
-   - To display help:
-     ```bash
-     python src/open_responses_server/cli.py help
-     ```
-
-5. **Make the Script Executable (Optional)**:
-   If you want to run the script without explicitly calling Python, you can make it executable:
-   ```bash
-   chmod +x src/open_responses_server/cli.py
-   ```
-   Then, run it directly:
-   ```bash
-   ./src/open_responses_server/cli.py <command>
-   ```
-
-Let me know if you need further assistance!
+## Overview
+
+The `otc` command is the CLI entry point for Open Responses Server, defined in
+`pyproject.toml` pointing to `open_responses_server.cli:main`.
+
+## Commands
+
+| Command | Description |
+| --- | --- |
+| `otc start` | Start the FastAPI server |
+| `otc configure` | Interactive configuration wizard (saves to `.env`) |
+| `otc help` | Display help information |
+| `otc --version` | Show version information |
+
+## Running after installation
+
+```bash
+# After pip install or uv pip install
+otc start
+otc configure
+otc --version
+```
+
+## Running from source
+
+```bash
+# Using uv
+uv run src/open_responses_server/cli.py start
+
+# Or directly with Python (venv must be activated)
+python src/open_responses_server/cli.py start
+```
+
+## Start command
+
+Starts the FastAPI server via uvicorn. The server binds to the host and port
+defined by `API_ADAPTER_HOST` and `API_ADAPTER_PORT` environment variables
+(defaults: `0.0.0.0:8080`).
+
+```bash
+otc start
+```
+
+## Configure command
+
+Interactive wizard that prompts for host, port, backend URL, external URL, and
+API key. Saves the configuration to a `.env` file in the current directory,
+merging with any existing values.
+
+```bash
+otc configure
+```

docs/events-and-tool-handling.md

Lines changed: 13 additions & 1 deletion

@@ -275,7 +275,19 @@ When processing Responses API input with `function_call_output` items
 containing the tool name and arguments, then adds the tool response.
 This handles resuming from external tool execution.
 
-## Pydantic Models Reference
+## Connection Keepalive (Heartbeat)
+
+When the backend LLM is slow to respond, the server sends SSE comment lines
+(`: heartbeat\n\n`) at the interval configured by `HEARTBEAT_INTERVAL`
+(default: 15 seconds). This prevents proxies and load balancers from closing
+idle connections.
+
+Heartbeats are standard SSE comments and should be ignored by compliant clients.
+The mechanism is implemented by `_with_heartbeat()` in `api_controller.py`,
+which wraps the response stream and injects heartbeat sentinels during idle
+periods.
+
+## Pydantic Models
 
 Defined in `src/open_responses_server/models/responses_models.py`.
 
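The heartbeat wrapper this diff documents can be approximated with a plain asyncio generator. This is a sketch of the mechanism only, not the repo's `_with_heartbeat()` implementation; the function name and timeout handling here are assumptions.

```python
import asyncio
from typing import AsyncIterator

async def with_heartbeat(stream: AsyncIterator[str], interval: float = 15.0):
    """Yield SSE events from `stream`, injecting `: heartbeat` comments while idle."""
    it = stream.__aiter__()
    while True:
        nxt = asyncio.ensure_future(it.__anext__())
        while True:
            try:
                # shield() keeps `nxt` alive across wait_for timeouts
                event = await asyncio.wait_for(asyncio.shield(nxt), timeout=interval)
            except asyncio.TimeoutError:
                yield ": heartbeat\n\n"  # SSE comment line; compliant clients ignore it
                continue
            except StopAsyncIteration:
                return
            yield event
            break
```

Because heartbeats are comment lines (leading `:`), they keep the TCP connection warm without producing events on the client side.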

docs/open-responses-server.md

Lines changed: 25 additions & 10 deletions

@@ -27,8 +27,6 @@ a tool-call execution loop, plus a generic proxy for all other endpoints.
 | `server_entrypoint.py` | Uvicorn entry point (imports `app` from `api_controller`) |
 | `cli.py` | `otc` CLI: `start`, `configure`, `help` commands |
 | `version.py` | `__version__` string, read dynamically by setuptools |
-| `server.py` | Legacy duplicate of api_controller (not imported by active code) |
-| `is_mcp_tool.py` | Standalone utility, superseded by `MCPManager.is_mcp_tool()` |
 
 ## Request Routing
 
@@ -50,7 +48,7 @@ Client
 │ → Tool-call loop (up to MAX_TOOL_CALL_ITERATIONS)
 │ → Final response streamed or returned as JSON
 
-├─ GET /health → {"status": "ok"}
+├─ GET /health → {"status": "ok", "adapter": "running"}
 ├─ GET / → {"message": "Open Responses Server is running."}
 
 └─ GET/POST /{path} (catch-all proxy)
@@ -71,28 +69,45 @@ All configuration is via environment variables, loaded from `.env` via
 | `API_ADAPTER_HOST` | `0.0.0.0` | Server bind address |
 | `API_ADAPTER_PORT` | `8080` | Server port |
 | `MCP_TOOL_REFRESH_INTERVAL` | `10` | Seconds between MCP tool cache refreshes |
-| `MCP_SERVERS_CONFIG_PATH` | `src/open_responses_server/servers_config.json` | Path to MCP servers JSON config |
+| `MCP_SERVERS_CONFIG_PATH` | `src/open_responses_server/servers_config.json` | Path to MCP servers JSON config (use absolute path when pip-installed) |
 | `MAX_CONVERSATION_HISTORY` | `100` | Max stored conversation entries |
 | `MAX_TOOL_CALL_ITERATIONS` | `25` | Max tool-call loop iterations |
+| `STREAM_TIMEOUT` | `120.0` | HTTP timeout (seconds) for streaming requests |
+| `HEARTBEAT_INTERVAL` | `15.0` | SSE keepalive interval (seconds) |
+| `LOG_LEVEL` | `INFO` | Logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL) |
+| `LOG_FILE_PATH` | `./log/api_adapter.log` | Path to log file |
 
 ### MCP Server Configuration
 
-The JSON file at `MCP_SERVERS_CONFIG_PATH` defines MCP servers:
+The JSON file at `MCP_SERVERS_CONFIG_PATH` defines MCP servers. Three transport
+types are supported: `stdio` (default), `sse`, and `streamable-http`.
 
 ```json
 {
   "mcpServers": {
-    "server-name": {
-      "command": "executable",
-      "args": ["arg1", "arg2"],
+    "stdio-server": {
+      "type": "stdio",
+      "command": "npx",
+      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"],
       "env": {"KEY": "value"}
+    },
+    "sse-server": {
+      "type": "sse",
+      "url": "http://example.com/sse",
+      "headers": {"Authorization": "Bearer token"}
+    },
+    "http-server": {
+      "type": "streamable-http",
+      "url": "http://example.com/mcp",
+      "headers": {"Authorization": "Bearer token"}
     }
   }
 }
 ```
 
-Each server is started as a subprocess via `stdio_client` from the `mcp`
-library.
+The `type` field defaults to `stdio` if omitted. Stdio servers use `command`,
+`args`, and `env` fields. SSE and streamable-http servers use `url` and optional
+`headers` fields.
 
 ## Startup / Shutdown Lifecycle
 
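The transport rules described in this diff (dispatch on `type`, defaulting to `stdio`) can be sketched as a small loader. `classify_servers` is a hypothetical helper for illustration, not part of `MCPManager`; only the config-schema rules come from the docs above.

```python
import json

def classify_servers(config_path):
    """Group servers from an mcpServers config file by transport type."""
    with open(config_path) as f:
        cfg = json.load(f)
    result = {}
    for name, spec in cfg.get("mcpServers", {}).items():
        transport = spec.get("type", "stdio")  # `type` defaults to stdio per the docs
        if transport == "stdio":
            # stdio servers use command/args/env
            result[name] = ("stdio", spec["command"], spec.get("args", []))
        elif transport in ("sse", "streamable-http"):
            # network transports use url and optional headers
            result[name] = (transport, spec["url"], spec.get("headers", {}))
        else:
            raise ValueError(f"unknown MCP transport: {transport!r}")
    return result
```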

src/open_responses_server/api_controller.py

Lines changed: 7 additions & 86 deletions

@@ -326,99 +326,20 @@ async def stream_response():
 
         else:
             logger.info("Non-streaming response unsupported")
-
+            raise HTTPException(
+                status_code=501,
+                detail="Non-streaming responses are not supported on /responses. Set stream=True."
+            )
+
+    except HTTPException:
+        raise
     except Exception as e:
         logger.error(f"Error in create_response: {str(e)}")
         raise HTTPException(
             status_code=500,
             detail=f"Error processing request: {str(e)}"
         )
 
-# @app.post("/responses")
-# async def create_response(request: Request):
-#     """
-#     Endpoint for the custom /responses API.
-#     Converts the request, calls the chat completions endpoint, and streams the converted response.
-#     """
-#     try:
-#         request_data = await request.json()
-#
-#         # Log basic request information
-#         logger.info(f"Received request: model={request_data.get('model')}, stream={request_data.get('stream')}")
-#
-#         # Log input content for better visibility
-#         if "input" in request_data and request_data["input"]:
-#             logger.info("==== REQUEST CONTENT ====")
-#             for i, item in enumerate(request_data["input"]):
-#                 if isinstance(item, dict):
-#                     if item.get("type") == "message" and item.get("role") == "user":
-#                         if "content" in item and isinstance(item["content"], list):
-#                             for index, content_item in enumerate(item["content"]):
-#                                 if isinstance(content_item, dict):
-#                                     # Handle nested content structure like {"type": "input_text", "text": "actual message"}
-#                                     if content_item.get("type") == "input_text" and "text" in content_item:
-#                                         user_text = content_item.get("text", "")
-#                                         logger.info(f"USER INPUT: {user_text}")
-#                                     elif content_item.get("type") == "text" and "text" in content_item:
-#                                         user_text = content_item.get("text", "")
-#                                         logger.info(f"USER INPUT: {user_text}")
-#                                     # Handle other content types
-#                                     elif "type" in content_item:
-#                                         logger.info(f"USER INPUT ({content_item.get('type')}): {str(content_item)[:100]}...")
-#                                 elif isinstance(content_item, str):
-#                                     logger.info(f"USER INPUT: {content_item}")
-#                     elif item.get("type") == "function_call_output":
-#                         logger.info(f"FUNCTION RESULT: call_id={item.get('call_id')}, output={str(item.get('output', ''))[:100]}...")
-#                 elif isinstance(item, str):
-#                     logger.info(f"USER INPUT: {item}")
-#             logger.info("=======================")
-
-#         # Inject MCP tools into the request before conversion
-#         mcp_tools = mcp_manager.get_mcp_tools()
-#         if mcp_tools:
-#             # Start with user-provided tools, or an empty list
-#             final_tools = request_data.get("tools", [])
-
-#             # Get the names of the tools already in the list
-#             final_tool_names = {
-#                 tool.get("function", {}).get("name") if tool.get("function") else tool.get("name")
-#                 for tool in final_tools
-#                 if (tool.get("function") and tool.get("function").get("name")) or tool.get("name")
-#             }
-
-#             # Add only the new MCP tools that don't conflict
-#             for tool in mcp_tools:
-#                 if tool.get("name") not in final_tool_names:
-#                     final_tools.append({"type": "function", "function": tool})
-
-#             request_data["tools"] = final_tools
-#             logger.info(f"Injected {len(mcp_tools)} MCP tools into request")
-
-#         chat_request = convert_responses_to_chat_completions(request_data)
-
-#         client = await LLMClient.get_client()
-
-#         async def stream_response():
-#             try:
-#                 async with client.stream("POST", "/v1/chat/completions", json=chat_request, timeout=STREAM_TIMEOUT) as response:
-#                     if response.status_code != 200:
-#                         error_content = await response.aread()
-#                         logger.error(f"Error from LLM API: {error_content.decode()}")
-#                         yield f"data: {json.dumps({'error': 'LLM API Error'})}\n\n"
-#                         return
-
-#                     async for event in process_chat_completions_stream(response, chat_request):
-#                         yield event
-#             except Exception as e:
-#                 logger.error(f"Error in /responses stream: {e}")
-#                 yield f"data: {json.dumps({'error': str(e)})}\n\n"
-
-#         return StreamingResponse(stream_response(), media_type="text/event-stream")
-
-#     except Exception as e:
-#         logger.error(f"Error in create_response endpoint: {e}")
-#         raise HTTPException(status_code=500, detail=str(e))
-
 
 @app.post("/v1/chat/completions")
 async def chat_completions(request: Request):