
feat: add custom headers support and fix response/error handling#23

Merged
Nayjest merged 27 commits into Nayjest:main from manascb1344:feature/custom-headers
Feb 5, 2026
Conversation

@manascb1344
Contributor

@manascb1344 manascb1344 commented Jan 17, 2026

PR Description

Summary

This PR adds custom headers support for proxy connections and fixes multiple bugs in response and error handling that caused 500 Internal Server Errors.

Issues Addressed

  • #21
  • #22
Changes Made

Feature: Custom Headers Support

Added the ability to send custom HTTP headers to upstream LLM providers.

1. Header Utilities (lm_proxy/utils.py)

Added utility functions for header management:

  • SENSITIVE_HEADERS - Set of header names blocked for security reasons
  • filter_sensitive_headers() - Strips sensitive headers so callers cannot override them
  • merge_headers() - Merges base headers with override headers (override takes precedence)
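A minimal sketch of these utilities, assuming the names from the PR (SENSITIVE_HEADERS, filter_sensitive_headers, merge_headers); the bodies below are illustrative, not the actual lm_proxy/utils.py implementation:

```python
# Illustrative blocklist; the real set in the PR also covers HTTP/2
# pseudo-headers and Strict-Transport-Security (see Security section).
SENSITIVE_HEADERS = frozenset({
    "authorization", "proxy-authorization", "www-authenticate",
    "content-length", "content-type", "transfer-encoding",
    "host", "connection", "cache-control",
})

def filter_sensitive_headers(headers: dict[str, str]) -> dict[str, str]:
    """Drop headers whose (case-insensitive) name is in the blocklist."""
    return {k: v for k, v in headers.items() if k.lower() not in SENSITIVE_HEADERS}

def merge_headers(base: dict[str, str], override: dict[str, str]) -> dict[str, str]:
    """Merge two header dicts; overrides win, but sensitive overrides are dropped."""
    return {**base, **filter_sensitive_headers(override)}
```

With this shape, an attempt to override Authorization via request headers is silently discarded while benign overrides take precedence.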

2. Header Injection (lm_proxy/bootstrap.py)

  • Added create_llm_wrapper() function that wraps LLM calls to inject custom headers
  • Headers are injected via the extra_headers parameter passed to the OpenAI SDK
  • Config-level headers merged with request-level headers (request takes precedence)
  • Sensitive headers are automatically filtered
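The wrapping described above can be sketched as follows; create_llm_wrapper and the merge precedence come from the PR description, but this body is an assumption, not the actual lm_proxy/bootstrap.py code (sensitive-header filtering is omitted here for brevity):

```python
import asyncio

def create_llm_wrapper(llm_func, config_headers: dict[str, str]):
    """Return an async wrapper that injects merged headers via extra_headers."""
    async def wrapper(messages, **kwargs):
        request_headers = kwargs.pop("extra_headers", {})
        # Request-level headers take precedence over config-level ones.
        kwargs["extra_headers"] = {**config_headers, **request_headers}
        return await llm_func(messages, **kwargs)
    return wrapper

async def fake_llm(messages, **kwargs):
    # Stand-in for the real LLM call; echoes the headers it received.
    return kwargs["extra_headers"]

wrapped = create_llm_wrapper(fake_llm, {"X-Title": "MyApp"})
result = asyncio.run(wrapped([], extra_headers={"X-Custom": "v"}))
```

The OpenAI SDK accepts extra_headers per request, so a wrapper at this level keeps the connection config and the per-request overrides in one place.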

3. Usage

[connections.openai]
api_type = "open_ai"
api_base = "https://api.openai.com/v1"
api_key = "env:OPENAI_API_KEY"

[connections.openai.extra_headers]
X-Title = "MyApp"
X-Custom-Header = "value"

Bug Fixes: Response and Error Handling

Fixed multiple bugs in how responses and errors are handled.

Bug #1: Non-Streaming Response Reconstruction Failure

Problem: Non-streaming responses were incorrectly reconstructed using str(out), causing AttributeError: 'str' object has no attribute 'choices'.

Fix: Forward responses as-is by checking for model_dump(), dict(), or raw dict:

if hasattr(out, "model_dump"):
    return JSONResponse(out.model_dump())
elif hasattr(out, "dict"):
    return JSONResponse(out.dict())
elif isinstance(out, dict):
    return JSONResponse(out)
else:
    # Fallback for edge cases
    return JSONResponse({...})
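The dispatch above can be exercised without FastAPI by isolating the serialization step; serialize_response below is a hypothetical helper that returns plain dicts instead of wrapping them in JSONResponse, so the branching logic can be tested on its own:

```python
def serialize_response(out):
    """Mirror the fix's dispatch order: pydantic v2, then .dict(), then raw dict."""
    if hasattr(out, "model_dump"):                    # pydantic v2 models
        return out.model_dump()
    if hasattr(out, "dict") and callable(out.dict):   # pydantic v1 / microcore objects
        return out.dict()
    if isinstance(out, dict):                         # already a plain payload
        return out
    # A plain string reaching this point is exactly the old failure mode.
    raise ValueError(f"Unserializable upstream response: {type(out)!r}")

class FakeLLMResponse:
    """Stand-in for an object exposing only a .dict() method."""
    def dict(self):
        return {"choices": [{"message": {"content": "hi"}}]}
```

Raising ValueError for the string case matches the PR's strategy: the caller's except (AttributeError, ValueError) branch then turns it into a 502 instead of a 500.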

Bug #2: Invalid Upstream Responses Cause 500 Instead of 502

Problem: When upstream providers returned non-JSON responses (HTML, 503 errors), the proxy crashed with 500 Internal Server Error.

Fix: Catch upstream errors and return 502 Bad Gateway with clear error messages:

try:
    out = await async_llm_func(request.messages, **llm_params)
except (AttributeError, ValueError) as e:
    return JSONResponse(
        {"error": {"message": f"Upstream provider error: {e}", "type": "upstream_error"}},
        status_code=502
    )

Bug #3: Poor Error Logging

Problem: Error logs didn't provide enough information for debugging.

Fix: Enhanced logging with descriptive messages:

logging.error(
    "Upstream provider returned non-JSON response. "
    "Content-Type was likely 'text/html' instead of 'application/json'. "
    "This is a provider issue, not a proxy issue. "
    "Original error: %s",
    e
)

Bug #4: Microcore dict() Serialization Edge Case

Problem: Microcore's LLMResponse.dict() sometimes returns malformed data.

Fix: Added try/catch with fallback:

try:
    dict_out = out.dict()
    if isinstance(dict_out, dict):
        return JSONResponse(dict_out)
except (ValueError, TypeError, AttributeError) as e:
    logging.debug("Failed to serialize out.dict(): %s", e)

Files Changed

  • lm_proxy/utils.py (+79 lines) - Header utilities
  • lm_proxy/bootstrap.py (+94 lines) - Header injection wrapper + error handling
  • lm_proxy/core.py (+37 lines) - Response handling + error handling
  • tests/test_extra_headers.py (+283 lines) - 20 comprehensive tests

Testing

# Run new tests
uv run pytest tests/test_extra_headers.py -v  # 20 tests pass

# Run all tests
uv run pytest tests/ -v  # All pass (except pre-existing integration failures)

Breaking Changes

None. All changes are backward-compatible.


Security

Sensitive headers are automatically blocked:

  • Authorization, WWW-Authenticate
  • Content-Length, Content-Type, Transfer-Encoding
  • Host, Connection, Cache-Control
  • Proxy-Authorization, Strict-Transport-Security
  • HTTP/2 pseudo-headers (:method, :path, :scheme, :authority)

- Add extra_headers config option for connection definitions
- Add header injection wrapper for LLM function calls
- Add security filtering for sensitive headers (Authorization, Host, etc.)
- Add comprehensive tests for header utilities and wrapper

Example usage:
[connections.openai]
api_type = "open_ai"
api_key = "env:OPENAI_API_KEY"
extra_headers = { X-Title = "MyApp", X-Custom = "value" }

The non-streaming response was incorrectly reconstructed as a string,
causing 500 errors when the provider returned a proper JSON response.
Now checks for dict/model and forwards as-is, with a fallback to the
reconstructed format for backwards compatibility.
- Add error handling in wrapper to catch microcore parsing failures
- Return 502 Bad Gateway when upstream returns non-JSON (e.g., HTML)
- Provide clear error message instead of crashing with 500
@manascb1344
Contributor Author

Hey @Nayjest

This PR addresses both #21 and #22.

Testing:

Automated:

  • 36 tests pass (20 new + 16 existing)
  • No breaking changes

Manual:

  • Custom headers injected correctly (X-Title, User-Agent, etc.)
  • Non-streaming requests return proper JSON
  • Streaming requests work
  • Invalid upstream responses return 502 (not 500)
  • Error messages are clear

Changes are thoroughly tested. Please review when you have time.

Thanks!

@Nayjest Nayjest self-requested a review January 29, 2026 10:20
@Nayjest
Owner

Nayjest commented Jan 29, 2026

@manascb1344 Thanks for the great contribution!

I'm planning to review everything and include the necessary changes and tests in the next major release (v3), expected in February.

Questions to investigate:

-=[ 1 ]=- Theoretically, custom headers should work using the default_args parameter as follows:

[connections.openai]
api_type = "open_ai"
api_base = "https://api.openai.com/v1"
api_key = "env:OPENAI_API_KEY"
[connections.openai.default_args.extra_headers]
X-Title = "MyApp"
X-Custom-Header = "value"

See https://github.com/Nayjest/ai-microcore/blob/main/microcore/configuration.py#L213. default_args from the configuration is passed as kwargs to all LLM function calls, so extra_headers supported by the OpenAI SDK could be placed there.

That said, the tests, security checks, and logging improvements look very useful.

-=[ 2 ]=- I will likely move some of these changes to the LLM API adapter level (microcore) as a more appropriate place for such generic functionality.

@Nayjest
Owner

Nayjest commented Jan 29, 2026

-=[ 3 ]=- Moving extra_headers to the root-level config is worthwhile in any case, to unify this functionality across various HTTP APIs such as Anthropic and Google.

@Nayjest Nayjest added the enhancement New feature or request label Jan 29, 2026

@Nayjest
Owner

Nayjest commented Jan 30, 2026

@manascb1344 @gito

Why not handle all exceptions the same way as was introduced for AttributeError and ValueError?

    try:
        out = await async_llm_func(request.messages, **llm_params)
        log_entry.response = out
        logging.info("LLM response: %s", out)
    except (AttributeError, ValueError) as e:
        # Handle case where upstream returns non-JSON response (e.g., HTML error page)
        # microcore fails to parse and returns a string, causing AttributeError
        # We catch this and return a proper error response
        error_msg = str(e)
        logging.error("Upstream provider error: %s", error_msg)
        log_entry.error = e
        await log_non_blocking(log_entry)
        return JSONResponse(
            {
                "error": {
                    "message": f"Upstream provider returned invalid response: {error_msg}",
                    "type": "upstream_error",
                }
            },
            status_code=502,
        )
    except Exception as e:
        log_entry.error = e
        await log_non_blocking(log_entry)
        raise
    await log_non_blocking(log_entry)

@manascb1344
Copy link
Contributor Author

We can do that!


Repository owner deleted a comment from github-actions bot Jan 30, 2026
@Nayjest
Owner

Nayjest commented Jan 30, 2026

@gito

Why not handle all exceptions the same way as was introduced for AttributeError and ValueError?

    try:
        out = await async_llm_func(request.messages, **llm_params)
        log_entry.response = out
        logging.info("LLM response: %s", out)
    except (AttributeError, ValueError) as e:
        # Handle case where upstream returns non-JSON response (e.g., HTML error page)
        # microcore fails to parse and returns a string, causing AttributeError
        # We catch this and return a proper error response
        error_msg = str(e)
        logging.error("Upstream provider error: %s", error_msg)
        log_entry.error = e
        await log_non_blocking(log_entry)
        return JSONResponse(
            {
                "error": {
                    "message": f"Upstream provider returned invalid response: {error_msg}",
                    "type": "upstream_error",
                }
            },
            status_code=502,
        )
    except Exception as e:
        log_entry.error = e
        await log_non_blocking(log_entry)
        raise
    await log_non_blocking(log_entry)

@github-actions

Catching AttributeError and ValueError specifically is intentional here — it's not laziness, it's precision.

The wrapper in bootstrap.py converts AttributeError (microcore's parsing failure on HTML responses) into a ValueError with a meaningful message. Then core.py catches both and returns a clean 502 with "upstream_error".

Why not catch all exceptions the same way? Because not all errors are upstream's fault. A KeyError might be your config. A TypeError might be your code. A network timeout is infrastructure. A rate limit is usage. Each deserves different handling — different status codes, different retry strategies, different alerts.

The bare except Exception block that re-raises? That's the escape hatch for everything else — it logs, then bubbles up to FastAPI's default handler (500 Internal Server Error). This preserves the stack trace for debugging while still capturing the error in your logging pipeline.

TL;DR: Specific exceptions get specific UX. Generic exceptions get generic treatment. Catching everything as "upstream_error" would be a lie to your clients and murder your ability to debug.

@Nayjest Nayjest mentioned this pull request Feb 1, 2026

@github-actions

github-actions bot commented Feb 5, 2026

I've Reviewed the Code

This PR introduces a request handler/middleware system with built-in rate limiting and HTTP header forwarding, adds comprehensive documentation, improves error handling with OpenAI-compatible exceptions, and includes several bug fixes covering response formatting inconsistencies, type mismatches, async/sync lock mixing, and missing cleanup in test fixtures.

⚠️ 5 issues found across 21 files

#1 Missing required fields in OpenAI-compatible response

lm_proxy/core.py L283-L284

When the response has Choice objects (line 283-284), the returned JSON only includes 'choices' but is missing required fields like 'id', 'object', 'created', and 'model' that are expected in OpenAI API responses. This creates an inconsistent response format compared to the fallback response (lines 285-295) which also lacks these fields but formats differently.
Tags: bug, compatibility
Affected code:

283:     if hasattr(out, "choices") and len(out.choices) > 0 and isinstance(out.choices[0], Choice):
284:         return JSONResponse({"choices": [c.model_dump(mode="json") for c in out.choices]})

Proposed change:

    if hasattr(out, "choices") and len(out.choices) > 0 and isinstance(out.choices[0], Choice):
        return JSONResponse({
            "id": f"chatcmpl-{secrets.token_hex(12)}",
            "object": "chat.completion",
            "created": int(time.time()),
            "model": ctx.llm_params.get("model", ""),
            "choices": [c.model_dump(mode="json") for c in out.choices]
        })

#2 Type mismatch: frozenset assigned to set type annotation

lm_proxy/handlers/forward_http_headers.py L46

The ignore_headers field is annotated as set[str] but the default factory returns a frozenset. This is a type inconsistency that could cause issues with type checkers and potentially unexpected behavior if code attempts to modify ignore_headers (e.g., using .add() or .remove() methods which don't exist on frozenset).
Tags: bug, compatibility
Affected code:

46:     ignore_headers: set[str] = field(default_factory=lambda: frozenset(SENSITIVE_HEADERS))

Proposed change:

    ignore_headers: frozenset[str] = field(default_factory=lambda: frozenset(SENSITIVE_HEADERS))

#3 Potential negative or zero Retry-After value

lm_proxy/handlers/rate_limiter.py L79

The calculation int(timestamps[0] - cutoff) + 1 can produce incorrect values. Since timestamps[0] > cutoff (due to the list comprehension filter), the difference timestamps[0] - cutoff is always positive but could be a very small fraction (e.g., 0.001), which when cast to int becomes 0, resulting in retry_after = 1. However, the real issue is that cutoff = now - window_seconds, and timestamps[0] is a time in the past that's still within the window. The correct calculation for when the client should retry is timestamps[0] + window_seconds - now, not timestamps[0] - cutoff. Current formula is mathematically equivalent but confusing; more critically, if there's any timing edge case or float precision issue, this could behave unexpectedly.
Tags: bug
Affected code:

79:                 retry_after = int(timestamps[0] - cutoff) + 1

Proposed change:

                retry_after = max(1, int(timestamps[0] + self.window_seconds - now) + 1)
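The sliding-window arithmetic from this review comment can be sketched in isolation; the names (window_seconds, timestamps) follow the comment, but the standalone function below is hypothetical, not the PR's rate limiter:

```python
def retry_after(timestamps: list[float], window_seconds: float, now: float) -> int:
    """Seconds until the oldest in-window request expires and a slot frees up."""
    cutoff = now - window_seconds
    in_window = [t for t in timestamps if t > cutoff]
    # The oldest surviving timestamp leaves the window at t + window_seconds,
    # so the client should retry after that moment; clamp to at least 1 second.
    return max(1, int(in_window[0] + window_seconds - now) + 1)
```

As the review notes, timestamps[0] - cutoff and timestamps[0] + window_seconds - now are algebraically identical, so the proposed change is about expressing the intent directly and guarding the float-to-int edge cases with max(1, ...).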

#4 Async function with blocking lock

lm_proxy/handlers/rate_limiter.py L58

The __call__ method is declared as async but uses a synchronous threading.Lock. This will block the event loop while holding the lock, potentially causing performance issues in an async context. For an async method, consider using asyncio.Lock instead, or if thread-safety across multiple threads is required, use loop.run_in_executor or keep it synchronous.
Tags: bug, performance, anti-pattern
Affected code:

58:     async def __call__(self, ctx: RequestContext) -> None:

Proposed change:

    def __call__(self, ctx: RequestContext) -> None:

#5 Missing exception handling in wait_for_server could leave zombie processes

tests/conftest.py L30-L33

If wait_for_server fails (e.g., server doesn't start, timeout occurs, or max retries exhausted), the exception will propagate but the subprocess started in start_proxy will be left running. The start_proxy function should handle this case and terminate the process if the health check fails.
Tags: bug, maintainability
Affected code:

30: def start_proxy(config_path: str, port: int):
31:     proc = subprocess.Popen([sys.executable, "-m", "lm_proxy.app", "--config", config_path])
32:     wait_for_server(f"http://127.0.0.1:{port}/health")
33:     return proc

Proposed change:

def start_proxy(config_path: str, port: int):
    proc = subprocess.Popen([sys.executable, "-m", "lm_proxy.app", "--config", config_path])
    try:
        wait_for_server(f"http://127.0.0.1:{port}/health")
    except Exception:
        proc.send_signal(signal.SIGTERM)
        proc.wait()
        raise
    return proc

@Nayjest Nayjest merged commit f91cec7 into Nayjest:main Feb 5, 2026
7 checks passed

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

  • Fix Response Handling and Error Handling Edge Cases
  • Add Custom Headers Support for Proxy Connections

2 participants