
Bring your own key: clean API #1541

Merged
jasperblues merged 1 commit into main from feature/byok-phase-2
Mar 30, 2026

Conversation

Contributor

@jasperblues jasperblues commented Mar 28, 2026

BYOK Phase 2

Summary

Adds framework primitives to support BYOK (Bring Your Own Key) cleanly from any application.
The motivation is Guide's current implementation, which leaks Spring AI types across the framework boundary and requires hand-rolling provider-specific builder code.

Guide is Embabel's dog-fooding app — if the framework doesn't make this easy, every third-party application developer faces the same problem. The same patterns are required by anyone building a multi-user application on Embabel: internal tooling, external SaaS products, and everything in between. Getting this right in the framework removes the hand-rolled boilerplate from all of them.


Changes

1. Extract AnthropicModelFactory

Before: AnthropicModelsConfig is a monolithic Spring @Configuration. The Anthropic API construction (createAnthropicApi()) and service construction (createAnthropicLlm()) are private methods buried inside the config class — inaccessible for BYOK use without going through Spring.

After: New open class AnthropicModelFactory mirrors the shape of OpenAiCompatibleModelFactory and implements ByokFactory:

open class AnthropicModelFactory(
    private val apiKey: String,
    private val baseUrl: String? = null,
    private val validationModel: String = VALIDATION_MODEL,  // overridable per key tier
    protected val observationRegistry: ObservationRegistry = ObservationRegistry.NOOP,
    private val requestFactory: ObjectProvider<ClientHttpRequestFactory> = ObjectProviders.empty(),
) : ByokFactory {
    fun build(model: String, retryTemplate: RetryTemplate = RetryUtils.DEFAULT_RETRY_TEMPLATE): LlmService<*>
    override fun buildValidated(): LlmService<*>          // ByokFactory — uses validationModel
    fun buildValidated(model: String): LlmService<*>      // direct call — explicit model
}

AnthropicModelsConfig becomes a thin subclass. No behaviour change for the normal (non-BYOK) path.

The validationModel constructor param exists for the rare case where a key only grants access to a specific set of models. The default (CLAUDE_HAIKU_4_5) is the cheapest available.

2. buildValidated() on OpenAiCompatibleModelFactory

Before: OpenAiCompatibleModelFactory has openAiCompatibleLlm() but no validation method.

After: buildValidated() lives on the factory:

fun buildValidated(
    model: String,
    pricingModel: PricingModel,
    provider: String,
    knowledgeCutoffDate: LocalDate?,
): LlmService<*>

Internal behaviour:

  1. Build a probe service using PASS_THROUGH_RETRY_TEMPLATE (fail fast, no retries)
  2. Call probe.createMessageSender(LlmOptions()).call(listOf(UserMessage("Hi")), emptyList())
  3. On exception: throw clean InvalidApiKeyException(message) — no Spring AI types surface
  4. On success: build and return production service with standard retry template
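
The probe-then-build flow above can be sketched framework-free. This is a simplified illustration only: buildValidatedSketch and the plain lambdas standing in for Spring AI services are assumptions of the sketch, not the real API.

```kotlin
// Simplified stand-in for the framework's exception type
class InvalidApiKeyException(message: String) : RuntimeException(message)

fun <T> buildValidatedSketch(
    probe: () -> Unit,      // single-attempt "Hi" call (PASS_THROUGH, fail fast)
    production: () -> T,    // only built once the probe has succeeded
): T {
    try {
        probe()
    } catch (e: Exception) {
        // Surface a clean framework exception; no provider SDK types leak out
        throw InvalidApiKeyException(e.message ?: "key validation failed")
    }
    // Step 4: build the production service with the standard retry template
    return production()
}
```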

3. ByokFactory interface + ByokSpec + detectProvider()

The core of the phase 2 refinement: a self-contained spec pattern that eliminates call-site boilerplate.

ByokFactory — fun interface in com.embabel.agent.spi:

fun interface ByokFactory {
    fun buildValidated(): LlmService<*>
}

AnthropicModelFactory implements it directly. For OpenAI-compatible providers, ByokSpec (nested class on OpenAiCompatibleModelFactory) implements it.

Named companion factories on OpenAiCompatibleModelFactory return ByokSpec:

OpenAiCompatibleModelFactory.openAi(apiKey)     // → GPT-4.1-mini probe
OpenAiCompatibleModelFactory.deepSeek(apiKey)   // → deepseek-chat probe
OpenAiCompatibleModelFactory.mistral(apiKey)    // → ministral-8b probe
OpenAiCompatibleModelFactory.gemini(apiKey)     // → gemini-2.5-flash probe
OpenAiCompatibleModelFactory.byok(             // custom provider
    baseUrl, apiKey, validationModel, validationProvider
)

Each ByokSpec has a fluent .validating(model, provider) override for keys with restricted model access.
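
The fluent override behaves like a copy-on-write spec. A stdlib-only sketch (ByokSpecSketch and its field names are illustrative, not the framework class):

```kotlin
// Illustrative model of a ByokSpec-style value: .validating() returns a new
// spec with the probe model/provider overridden, leaving the original intact
data class ByokSpecSketch(
    val baseUrl: String?,
    val apiKey: String,
    val validationModel: String,
    val validationProvider: String,
) {
    fun validating(model: String, provider: String): ByokSpecSketch =
        copy(validationModel = model, validationProvider = provider)
}
```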

detectProvider() — top-level function in com.embabel.agent.spi:

fun detectProvider(vararg candidates: ByokFactory): LlmService<*>

Concurrently races all candidates via Executors.newVirtualThreadPerTaskExecutor() + invokeAny. First success wins; remaining tasks are cancelled. All fail → InvalidApiKeyException. The resulting LlmService.provider identifies the winner.
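
The race can be pictured with JDK primitives alone. This is a sketch, not the shipped implementation: CandidateFactory stands in for ByokFactory, a String result for LlmService<*>, and IllegalStateException for InvalidApiKeyException. Requires JDK 21 for virtual threads.

```kotlin
import java.util.concurrent.Callable
import java.util.concurrent.ExecutionException
import java.util.concurrent.Executors

fun interface CandidateFactory {
    fun buildValidated(): String
}

fun detectProviderSketch(vararg candidates: CandidateFactory): String {
    require(candidates.isNotEmpty()) { "at least one candidate is required" }
    Executors.newVirtualThreadPerTaskExecutor().use { executor ->
        return try {
            // invokeAny blocks until the first task succeeds and cancels the rest;
            // it throws only once every candidate has failed
            executor.invokeAny(candidates.map { Callable { it.buildValidated() } })
        } catch (e: ExecutionException) {
            throw IllegalStateException("no candidate validated the key", e)
        }
    }
}
```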

Full sign-up flow call site:

val service = detectProvider(
    AnthropicModelFactory(apiKey = userKey),
    OpenAiCompatibleModelFactory.openAi(userKey),
    OpenAiCompatibleModelFactory.deepSeek(userKey),
    OpenAiCompatibleModelFactory.mistral(userKey),
    OpenAiCompatibleModelFactory.gemini(userKey),
)
val detectedProvider = service.provider

Settings flow (single known provider):

val service = detectProvider(AnthropicModelFactory(apiKey = userKey))

Custom provider extension function pattern:

fun OpenAiCompatibleModelFactory.Companion.acme(apiKey: String) =
    OpenAiCompatibleModelFactory.byok("https://api.acme.com/v1", apiKey, "acme-small", "Acme")

Concurrency note — why newVirtualThreadPerTaskExecutor() with no explicit limit?

The candidate count is structurally bounded by the number of BYOK-supported providers in the application — realistically 2–12, never user-controlled. At 50,000 MAU with aggressive onboarding assumptions (5% new users in a peak week, concentrated into business hours):

~2,500 new users ÷ 2,400 business-hour minutes ≈ 1 onboarding req/min peak

Even at 10× spike with 12 providers: ~120 virtual threads active simultaneously, each blocked on outbound HTTP. Virtual threads are designed for millions of concurrent I/O-bound tasks; 120 is negligible. The real constraint at any realistic scale is external provider rate limits — which are per user key and therefore don't compound. Adding a configurable thread cap would cost real API surface area to defend against a phantom concern.

Design note — why not a function-reference overload?

We considered detectProvider(apiKey, ::AnthropicModelFactory, OpenAiCompatibleModelFactory::openAi, ...)
which would shave one argument per line in the zero-config case. Rejected: it breaks whenever any single candidate needs customisation (.validating() or validationModel), forcing the caller to mix styles. The savings are modest; the sharp edge is real 🗡️

Design note — why not a typed LlmProvider enum or inline value class?

A new enum would duplicate the existing ModelMetadata.provider: String field and XxxModels.PROVIDER constants. An @JvmInline value class would break const val on the PROVIDER constants, requiring Java breaking changes. Since ByokFactory instances carry their own provider identity (via the returned LlmService.provider), no separate provider key is needed at all — the map-key approach was the problem.


What does NOT change

  • Normal (non-BYOK) model wiring in AnthropicModelsConfig — behaviour unchanged
  • OpenAiCompatibleModelFactory.openAiCompatibleLlm() — unchanged, existing callers unaffected

How Guide uses this

The problem today

Guide's UserModelFactory builds SpringAiLlmService instances by hand, importing Spring AI types directly (AnthropicChatModel.builder(), OpenAiChatModel.builder(), etc.). Key validation casts across the framework boundary:

// Guide today — Spring AI type leaks out of embabel-agent
(service as SpringAiLlmService).chatModel.call("Hi")

The 4-provider validation fan-out is sequential, and Spring AI logs noisy stack traces from the expected 401 probe calls.

After this PR

Guide's UserModelFactory replaces four hand-rolled builder methods with factory delegations. No Spring AI imports remain. For service creation (no validation):

OPENAI    → OpenAiCompatibleModelFactory(null, apiKey, null, null)
                .openAiCompatibleLlm(model, PricingModel.ALL_YOU_CAN_EAT, OpenAiModels.PROVIDER, null)
ANTHROPIC → AnthropicModelFactory(apiKey = apiKey).build(model)
MISTRAL   → OpenAiCompatibleModelFactory("https://api.mistral.ai/v1", apiKey, null, null)
                .openAiCompatibleLlm(model, PricingModel.ALL_YOU_CAN_EAT, MistralAiModels.PROVIDER, null)
DEEPSEEK  → OpenAiCompatibleModelFactory("https://api.deepseek.com", apiKey, null, null)
                .openAiCompatibleLlm(model, PricingModel.ALL_YOU_CAN_EAT, DeepSeekModels.PROVIDER, null)

Key validation (when user stores a key) becomes:

fun validateKey(provider: LlmProvider, apiKey: String): String? = try {
    when (provider) {
        OPENAI    -> OpenAiCompatibleModelFactory.openAi(apiKey)
        ANTHROPIC -> AnthropicModelFactory(apiKey = apiKey)
        MISTRAL   -> OpenAiCompatibleModelFactory.mistral(apiKey)
        DEEPSEEK  -> OpenAiCompatibleModelFactory.deepSeek(apiKey)
    }.buildValidated()
    null
} catch (e: InvalidApiKeyException) {
    "Invalid API key"
} catch (e: Exception) {
    "Could not validate key: ${e.message}"
}

Spring AI stack traces from probe 401s are eliminated. Fan-out detection (sign-up flow) becomes a single detectProvider(...) call as shown above.

Security

buildValidated() and detectProvider() validate keys and return an LlmService — key lifecycle management is the caller's responsibility. Embabel never stores or logs keys.

Guide's approach as a reference:

  • Keys held in server-side memory only (UserKeyStore) — never persisted to disk or database
  • Client receives an AES-256-GCM encrypted blob (server holds the encryption key) for
    local-storage caching; a stolen blob is useless without the server secret
  • On page reload the client sends the blob back; server decrypts and restores the in-memory key

Callers should also ensure: keys transmitted over HTTPS only, log levels prevent key material appearing in plaintext, cached LlmService instances revoked on logout or key rotation.
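
For illustration, the encrypted-blob step can be sketched with the JDK crypto API alone. This is a hedged sketch of the general AES-256-GCM pattern; the function names and the nonce-prefixed blob layout are assumptions, not Guide's actual code.

```kotlin
import java.security.SecureRandom
import javax.crypto.Cipher
import javax.crypto.SecretKey
import javax.crypto.spec.GCMParameterSpec

// Encrypt the user's key material with the server-held secret; a 96-bit random
// nonce is prepended so decryption is self-contained
fun encryptBlob(serverKey: SecretKey, plaintext: ByteArray): ByteArray {
    val nonce = ByteArray(12).also { SecureRandom().nextBytes(it) }
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.ENCRYPT_MODE, serverKey, GCMParameterSpec(128, nonce))
    return nonce + cipher.doFinal(plaintext)
}

// GCM authenticates the ciphertext: a tampered or foreign blob fails to decrypt
fun decryptBlob(serverKey: SecretKey, blob: ByteArray): ByteArray {
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.DECRYPT_MODE, serverKey, GCMParameterSpec(128, blob.copyOfRange(0, 12)))
    return cipher.doFinal(blob.copyOfRange(12, blob.size))
}
```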


Reviewer notes

  • detectProvider is parameterised by the caller — it encodes no assumptions about which providers an app supports. Suitable for any BYOK application, not just Guide.
  • A single-argument call is a valid degenerate case: validated build with clean error handling, no concurrency. This is the settings flow path.
  • Tie-breaking if a key validates with two providers is not a practical concern (key formats are distinct); invokeAny returns whichever task completes first if it ever occurs.

Testing notes

Tests were written test-first (non-compiling until implementation lands) following existing project conventions:

  • DetectProviderTest — pure unit test using mockk ByokFactory lambdas; no HTTP, no Spring
    context. Follows the mockk patterns in OpenAiCompatibleModelFactoryTest.
  • OpenAiCompatibleModelFactoryByokSpecTest — construction-only unit tests for all named
    factories and byok(); verifies ByokFactory contract and .validating() chaining.
  • OpenAiCompatibleModelFactoryBuildValidatedTest — uses Sun's built-in HttpServer to stub
    401/200 responses locally, following the precedent in NettyClientAutoConfigurationTest.
    No WireMock dependency needed.
  • AnthropicModelFactoryTest — construction-only tests for build() and custom baseUrl,
    same pattern as the existing OpenAiCompatibleModelFactoryTest.
  • AnthropicModelFactoryBuildValidatedTest — uses Sun's built-in HttpServer to stub
    401/200 responses, same pattern as OpenAiCompatibleModelFactoryBuildValidatedTest;
    added during review to cover buildValidated() and buildValidated(model).
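
The JDK-only stubbing approach these tests use can be sketched like this (withStubbed401 is an illustrative helper name, not a symbol from the PR):

```kotlin
import com.sun.net.httpserver.HttpServer
import java.net.InetSocketAddress

// Serve a fixed 401 on an ephemeral port, run the test body, then shut down;
// no WireMock dependency is needed for a canned response
fun withStubbed401(block: (baseUrl: String) -> Unit) {
    val server = HttpServer.create(InetSocketAddress(0), 0)
    server.createContext("/") { exchange ->
        val body = """{"error":"invalid api key"}""".toByteArray()
        exchange.sendResponseHeaders(401, body.size.toLong())
        exchange.responseBody.use { it.write(body) }
    }
    server.start()
    try {
        block("http://localhost:${server.address.port}")
    } finally {
        server.stop(0)
    }
}
```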

@jasperblues jasperblues force-pushed the feature/byok-phase-2 branch 2 times, most recently from 0fcd1e7 to 1cbc458 on March 28, 2026 at 07:27
@igordayen
Contributor

@jasperblues - could you please confirm that the PR addresses separate aspects:

  • relaxing dependencies on Spring AI in code using Embabel (there is an example of how to create an OpenAI LLM Message Sender: meta-agent/meta-agent-service/src/test/java/com/embabel/example/tool/loop/OpenAiLlmMessageSender.java, in the agent-gen repo)
  • Bring Your Own Factory, from the writeup:
    baseUrl, apiKey, validationModel, validationProvider
    Is the name "validation" referring to an "LLM model"? And byok == customModel?

@jasperblues
Contributor Author

Hi @igordayen — happy to clarify.

Are these two separate aspects?

No — they're two facets of a single goal. The Spring AI decoupling is the mechanism that makes clean BYOK possible. The problem was that building an LLM service for a user-supplied key required importing Spring AI types directly (e.g. org.springframework.ai.anthropic.AnthropicChatModel.builder()). This PR extracts com.embabel.agent.config.models.anthropic.AnthropicModelFactory and adds factory methods to com.embabel.agent.openai.OpenAiCompatibleModelFactory so that application code never needs to touch Spring AI internals. The BYOK capability is the result of that extraction. They're bundled intentionally — one is the implementation of the other.

validationModel — is this an LLM model name?

Yes, exactly. validationModel is the name of the LLM model used for the probe call inside buildValidated(). When you call buildValidated(), the framework sends a minimal "Hi" message to verify the key is valid before returning a production service. validationModel is which model that probe uses — e.g. "gpt-4o-mini" or "claude-haiku-4-5". It defaults to the cheapest available model for each provider. You'd only override it if your API key grants access to a restricted subset of models.

byok — does this mean "custom model"?

No. BYOK = Bring Your Own Key — a standard industry term (used by AWS, Azure, Vercel, Fly.io, and many others) for the pattern where a user supplies their own third-party API key rather than relying on platform-managed credentials. We didn't coin it. The byok() companion factory is for OpenAI-compatible providers not covered by the named shortcuts (openAi(), deepSeek(), mistral(), gemini()). For those you supply baseUrl and apiKey yourself — hence "bring your own key".

@igordayen
Contributor

@jasperblues - thanks for the clarifications. A few follow-ups:

  • The dependency on Spring AI with the OpenAiCompatible factory is at the import level, while the Spring AI starters / autoconfigs are still in place. So basically the PR provides an additional layer that does not require Spring AI imports, while the respective libraries still get used. In contrast, OpenAiLlmMessageSender entirely eliminates the dependency on Spring AI.
  • Re: "where a user supplies their own third-party API key rather than relying on platform-managed credentials." - doesn't the HUB under the hood still rely on provider-managed credentials? One of the references, https://www.ibm.com/think/topics/byok, states: "Bring your own key (BYOK) is an encryption key management approach where customers of a cloud service provider (CSP) generate and manage their own encryption keys. In cloud environments, BYOK offers organizations greater control over data security, visibility and compliance requirements." Which aspect of BYOK needs to be addressed by Guide HUB?

@jasperblues
Contributor Author

On Spring AI still being present internally

Correct — Spring AI is still used inside the framework, and that's intentional. The abstraction boundary is at the application layer, not the framework internals.

Two reasons this matters:

User experience. Developers choose Embabel to work with Embabel abstractions. Requiring application code to import org.springframework.ai.anthropic.AnthropicChatModel to wire up a user's LLM key breaks that contract. The before/after for Guide is already in the PR description — the "before" has Spring AI types leaking directly into application code; the "after" does not.

Strategic decoupling. Keeping Spring AI out of the public API surface means the framework can replace or supplement it if Spring AI's direction diverges from ours, or if it lacks features we need. This is standard practice for any framework that wraps a third-party library. Coupling the public API to an implementation dependency would make that migration significantly more costly.

The OpenAiLlmMessageSender you referenced is a raw HTTP test utility — it's not a production LlmService with retry, observability, or provider metadata. A different thing entirely.


On BYOK terminology

BYOK has a well-established second life in the AI/SaaS space, meaning exactly what we mean here: a user of a multi-tenant platform supplies their own LLM provider API key rather than using a platform-managed shared key.

The IBM reference you cited is for the original encryption-key management meaning. Both usages coexist. Ours is the correct one for this context.

@igordayen
Contributor

igordayen commented Mar 29, 2026

The OpenAiLlmMessageSender you referenced is a raw HTTP test utility — it's not a production LlmService with retry, observability, or provider metadata. A different thing entirely.

yes, for reference purposes, and eventually it will evolve into production with QoS features

also: how does the BYOK concept actually get employed in Guide HUB? - thanks

@jasperblues
Contributor Author

jasperblues commented Mar 29, 2026

yes, for reference purposes, and eventually it will evolve into production with QoS features

Yep, you got it now! Containing Spring AI behind the abstraction boundary is what makes a future replacement possible without breaking downstream consumers. If Spring AI types leak into user space, such evolution becomes a breaking change.

also: how does the BYOK concept actually get employed in Guide HUB? - thanks

Covered in the "How Guide uses this" section of the PR description above. Have a read and let me know if anything's not clear.

@igordayen
Contributor

igordayen commented Mar 29, 2026

summarizing understanding of the design points:

  • HUB can be deployed on a developer desktop / Docker - no BYOK is necessary
  • HUB can be deployed in a cloud env as OpenAI-compatible - there will potentially be a model per user with separate connectivity params, such as a token
  • what is the envisioned strategic positioning of HUB alongside the Assistant as a desktop app?

Thanks

Contributor

@igordayen igordayen left a comment

Interesting approach. A few comments to consider. Thanks.

* @throws InvalidApiKeyException if all candidates fail or no candidates are supplied.
*/
fun detectProvider(vararg candidates: ByokFactory): LlmService<*> {
if (candidates.isEmpty()) {
Contributor

Quite recently we audited the framework code to use a consistent threading model; the recommendation was to use Asyncer across the board.

Contributor Author

Asyncer is a Spring-managed bean wired via AsyncConfiguration. detectProvider is a standalone top-level function with no Spring context — by design, so it can be used outside a Spring application. Injecting Asyncer would break that contract. Virtual threads are the right fit here for short-lived I/O-bound tasks. Notably, AsyncConfiguration itself has a comment suggesting newVirtualThreadPerTaskExecutor() as the preferred direction.


companion object {
/** Default model used for key validation probes — cheapest available. */
const val VALIDATION_MODEL = AnthropicModels.CLAUDE_HAIKU_4_5
Contributor

hardcoding

Contributor Author

VALIDATION_MODEL is a named constant referencing AnthropicModels.CLAUDE_HAIKU_4_5 — itself a named constant. It is overridable via the validationModel constructor parameter, documented in the KDoc. The same pattern is used in OpenAiCompatibleModelFactory where each companion factory references a named model constant (e.g. OpenAiModels.GPT_41_MINI in openAi()). Not hardcoding.

/** Default model used for key validation probes — cheapest available. */
const val VALIDATION_MODEL = AnthropicModels.CLAUDE_HAIKU_4_5

private const val CONNECT_TIMEOUT_MS = 5000
Contributor

perhaps externalize to be aligned with other retry properties

Contributor Author

OpenAiCompatibleModelFactory uses the same pattern — private const val CONNECT_TIMEOUT_MS = 5000 and READ_TIMEOUT_MS = 600000 at lines 69–70. This is consistent with existing convention.

private const val READ_TIMEOUT_MS = 600000

private val PASS_THROUGH_RETRY_TEMPLATE: RetryTemplate =
RetryTemplate.builder().maxAttempts(1).build()
Contributor

consider embabel RetryProperties and template

Contributor Author

@jasperblues jasperblues Mar 29, 2026

PASS_THROUGH_RETRY_TEMPLATE is intentionally single-attempt. The probe call must fail fast on an invalid key — retries would introduce unnecessary latency and defeat the purpose of quick validation. This is documented in the buildValidated() KDoc, the PR notes and the updated .adoc files. OpenAiCompatibleModelFactory has the same PASS_THROUGH_RETRY_TEMPLATE at line 71 for the same reason.

builder.restClientBuilder(
RestClient.builder()
.requestFactory(requestFactory.getIfAvailable {
SimpleClientHttpRequestFactory().apply {
Contributor

Technically the key can be applied here as well.

Contributor Author

Could you clarify what you mean? The requestFactory is for HTTP transport configuration (timeouts, connection pooling). The API key is set via AnthropicApi.builder().apiKey().

Contributor

private final SimpleClientHttpRequestFactory defaultFactory = new SimpleClientHttpRequestFactory();

@Override
public ClientHttpRequest createRequest(URI uri, HttpMethod httpMethod) throws IOException {
    ClientHttpRequest request = defaultFactory.createRequest(uri, httpMethod);
    // Add your custom property (e.g., a header) here
    request.getHeaders().add("X-Custom-Property", "MyCustomValue");
    return request;
}

Or use an interceptor on the factory.

Contributor Author

Thanks for the example. While we're using Spring AI internally, AnthropicApi.builder().apiKey() is the idiomatic, supported path — it's how Spring AI expects credentials to be wired, and it keeps the transport layer concerned only with transport.

Attaching the key via a custom ClientHttpRequestFactory or interceptor would work at the HTTP level, but adds complexity with no benefit while Spring AI is in play. If we replace Spring AI internally at some point, that would be exactly the right time to revisit how credentials are applied — and this pattern would be a good starting point. Worth a separate issue to track. Out of scope for this PR.

.observationRegistry(observationRegistry)
.build()

return SpringAiLlmService(
Contributor

strategically consider LLMMessageSender directly with Anthropic APIs

Contributor Author

Addressed above and in prior comments. Spring AI is contained behind the abstraction boundary — application code has no Spring AI imports. Replacing the underlying implementation is a future option, not a blocker for this PR.

*
* Note: uses the OpenAI wire protocol, not the native Spring AI DeepSeek client.
*/
fun deepSeek(apiKey: String): ByokSpec =
Contributor

There is maintenance involved in maintaining the validation models.

Contributor Author

@jasperblues jasperblues Mar 29, 2026

Valid long-term observation. The same applies to every model constant across the codebase — AnthropicModels, OpenAiModels, DeepSeekModels etc. all require updates when providers change their model offerings. It is no different here, unless you would like to point to a specific pattern being used in that regard?

Contributor

@igordayen igordayen left a comment

One more comment on using RestClient / WebClient.

For a multi-tenant application with heavy traffic it is also good to consider:

  • usage of an HTTP connection pool
  • usage of the streaming API - better for scalability

Thanks

.observationRegistry(observationRegistry)
)
builder.webClientBuilder(
WebClient.builder().observationRegistry(observationRegistry)
Contributor

Usage of both RestClient and WebClient - is this ok?

Contributor Author

Both are required by AnthropicApi.builder(): restClientBuilder for blocking calls, webClientBuilder for streaming. This is a Spring AI API requirement, not a design choice on our part.

Contributor

@igordayen igordayen left a comment

A few non-blocking considerations for the future.

@jasperblues
Contributor Author

Good points — both are already addressed by this PR, and the Guide migration shows it concretely.

Streaming
WebClient is present specifically for streaming (as noted on the AnthropicModelFactory.kt:108 thread). That's the reason both clients exist.

Connection pooling
The requestFactory: ObjectProvider<ClientHttpRequestFactory> constructor parameter exists for exactly this. Once this PR lands, Guide's UserModelFactory migrates from building Spring AI types directly to:

@Bean
fun pooledRequestFactory(): ClientHttpRequestFactory =
    HttpComponentsClientHttpRequestFactory().apply {
        setConnectTimeout(5_000)
        // Apache HttpComponents manages the pool — size tunable per deployment
    }

@Service
class UserModelFactory(
    private val requestFactory: ObjectProvider<ClientHttpRequestFactory>,
    ...
) {
    private fun createAnthropicService(model: String, apiKey: String): LlmService<*> =
        AnthropicModelFactory(apiKey = apiKey, requestFactory = requestFactory).build(model)

    private fun createOpenAiService(model: String, apiKey: String): LlmService<*> =
        OpenAiCompatibleModelFactory(baseUrl = null, apiKey = apiKey, ..., requestFactory = requestFactory)
            .openAiCompatibleLlm(model, ...)
}

One @Bean, picked up by every provider. No changes to AnthropicModelFactory needed — the hook is already there.

On the validation probe specifically
buildValidated() is a key-registration probe, not a hot path. Even at an ambitious 10M MAU with 1% of users registering a key per day, that is ~100,000 validations/day ≈ 1.2/sec. Connection pooling is irrelevant at that rate. The production LlmService returned after validation is where traffic flows — and it inherits the same requestFactory.

Why the old approach in Guide needed to change

UserModelFactory currently imports 10 Spring AI types directly (lines 6–18: AnthropicChatModel, AnthropicApi, OpenAiChatModel, OpenAiApi, DeepSeekChatModel, DeepSeekApi, MistralAiChatModel, MistralAiApi, MistralAiChatOptions, ToolCallingManager). This couples application code to Spring AI internals. PR #1541 is the prerequisite that makes the above migration possible.

@jasperblues jasperblues force-pushed the feature/byok-phase-2 branch from 1cbc458 to f8201bb on March 29, 2026 at 22:57

@jasperblues jasperblues merged commit 845255d into main Mar 30, 2026
17 checks passed