Conversation
Force-pushed 0fcd1e7 to 1cbc458
@jasperblues - could you please confirm that the PR addresses separate aspects:
Hi @igordayen — happy to clarify. Are these two separate aspects? No — they're two facets of a single goal. The Spring AI decoupling is the mechanism that makes clean BYOK possible. The problem was that building an LLM service for a user-supplied key required importing Spring AI types directly (e.g.
Yes, exactly.
No. BYOK = Bring Your Own Key — a standard industry term (used by AWS, Azure, Vercel, Fly.io, and many others) for the pattern where a user supplies their own third-party API key rather than relying on platform-managed credentials. We didn't coin it.
@jasperblues - thanks for the clarifications. A few follow-ups:
On Spring AI still being present internally

Correct — Spring AI is still used inside the framework, and that's intentional. The abstraction boundary is at the application layer, not the framework internals. Two reasons this matters:

1. User experience. Developers choose Embabel to work with Embabel abstractions. Requiring application code to import Spring AI types would undermine that.
2. Strategic decoupling. Keeping Spring AI out of the public API surface means the framework can replace or supplement it if Spring AI's direction diverges from ours, or if it lacks features we need. This is standard practice for any framework that wraps a third-party library. Coupling the public API to an implementation dependency would make that migration significantly more costly.

On BYOK terminology

BYOK has a well-established second life in the AI/SaaS space meaning exactly what we mean here — a user of a multi-tenant platform supplies their own LLM provider API key rather than using a platform-managed shared key. A few references:
The IBM reference you cited is for the original encryption-key management meaning. Both usages coexist. Ours is the correct one for this context.
Yes, for reference purposes, and eventually it will evolve into production with QoS features. Also: how does the BYOK concept actually get employed in Guide HUB? Thanks.
Yep, you got it now! Containing Spring AI behind the abstraction boundary is what makes a future replacement possible without breaking downstream consumers. If Spring AI types leak into user space, such evolution becomes a breaking change.
Covered in the "How Guide uses this" section of the PR description above. Have a read and let me know if anything's not clear.
Summarizing my understanding of the design points:

Thanks
igordayen left a comment:

Interesting approach. A few comments to consider. Thanks.
```kotlin
 * @throws InvalidApiKeyException if all candidates fail or no candidates are supplied.
 */
fun detectProvider(vararg candidates: ByokFactory): LlmService<*> {
    if (candidates.isEmpty()) {
```
Quite recently we audited the framework code to use a consistent threading model. Recommended: use Asyncer across the board.
Asyncer is a Spring-managed bean wired via AsyncConfiguration. detectProvider is a standalone top-level function with no Spring context — by design, so it can be used outside a Spring application. Injecting Asyncer would break that contract. Virtual threads are the right fit here for short-lived I/O-bound tasks. Notably, AsyncConfiguration itself has a comment suggesting newVirtualThreadPerTaskExecutor() as the preferred direction.
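The racing shape described here works without any Spring context at all, which is the point of keeping `detectProvider` a top-level function. A minimal, hedged illustration (the helper name `raceFirstSuccess` is hypothetical, not framework API; requires JDK 21 for virtual threads):

```kotlin
import java.util.concurrent.Callable
import java.util.concurrent.Executors

// Hypothetical helper: race short-lived I/O-bound tasks on virtual threads.
// invokeAny returns the first successful result and cancels the losers.
fun <T> raceFirstSuccess(tasks: List<Callable<T>>): T =
    Executors.newVirtualThreadPerTaskExecutor().use { executor ->
        executor.invokeAny(tasks)
    }

fun main() {
    val winner = raceFirstSuccess(listOf(
        Callable<String> { throw IllegalStateException("invalid key") }, // failed probe
        Callable { "anthropic" },                                        // successful probe
    ))
    println(winner) // prints "anthropic"
}
```

No executor bean, no injection: everything needed is in `java.util.concurrent`.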
```kotlin
companion object {
    /** Default model used for key validation probes — cheapest available. */
    const val VALIDATION_MODEL = AnthropicModels.CLAUDE_HAIKU_4_5
```
VALIDATION_MODEL is a named constant referencing AnthropicModels.CLAUDE_HAIKU_4_5 — itself a named constant. It is overridable via the validationModel constructor parameter, documented in the KDoc. The same pattern is used in OpenAiCompatibleModelFactory where each companion factory references a named model constant (e.g. OpenAiModels.GPT_41_MINI in openAi()). Not hardcoding.
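The pattern being defended, in miniature (the class and model id below are stand-ins, not the actual framework types):

```kotlin
// Stand-in sketch: a named companion constant serves as the default for an
// overridable constructor parameter, mirroring VALIDATION_MODEL as described.
class ValidatingFactory(
    val validationModel: String = VALIDATION_MODEL,
) {
    companion object {
        /** Default model used for key-validation probes. */
        const val VALIDATION_MODEL = "claude-haiku" // illustrative id, not the real constant
    }
}

fun main() {
    println(ValidatingFactory().validationModel)                           // prints "claude-haiku"
    println(ValidatingFactory(validationModel = "custom").validationModel) // prints "custom"
}
```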
```kotlin
private const val CONNECT_TIMEOUT_MS = 5000
```
Perhaps externalize these to align with the other retry properties.
OpenAiCompatibleModelFactory uses the same pattern — private const val CONNECT_TIMEOUT_MS = 5000 and READ_TIMEOUT_MS = 600000 at lines 69–70. This is consistent with existing convention.
```kotlin
private const val READ_TIMEOUT_MS = 600000
```
```kotlin
private val PASS_THROUGH_RETRY_TEMPLATE: RetryTemplate =
    RetryTemplate.builder().maxAttempts(1).build()
```
Consider Embabel's RetryProperties and template.
PASS_THROUGH_RETRY_TEMPLATE is intentionally single-attempt. The probe call must fail fast on an invalid key — retries would introduce unnecessary latency and defeat the purpose of quick validation. This is documented in the buildValidated() KDoc, the PR notes and the updated .adoc files. OpenAiCompatibleModelFactory has the same PASS_THROUGH_RETRY_TEMPLATE at line 71 for the same reason.
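The fail-fast semantics can be illustrated generically (this is not spring-retry's `RetryTemplate`, just a sketch of why `maxAttempts = 1` means a single attempt with no retry):

```kotlin
// Generic sketch: with maxAttempts = 1 the block runs exactly once and any
// failure propagates immediately, which is the desired probe behaviour.
fun <T> withRetry(maxAttempts: Int, block: () -> T): T {
    require(maxAttempts >= 1)
    var last: Exception? = null
    repeat(maxAttempts) {
        try {
            return block()
        } catch (e: Exception) {
            last = e
        }
    }
    throw last!!
}

fun main() {
    var attempts = 0
    val result = runCatching {
        withRetry(maxAttempts = 1) { attempts++; error("401: invalid API key") }
    }
    println("attempts=$attempts failed=${result.isFailure}") // prints attempts=1 failed=true
}
```

With a retrying template, `attempts` would climb before the caller learned the key is bad; with one attempt, the invalid key surfaces immediately.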
```kotlin
builder.restClientBuilder(
    RestClient.builder()
        .requestFactory(requestFactory.getIfAvailable {
            SimpleClientHttpRequestFactory().apply {
```
Technically the key can be applied here as well.
Could you clarify what you mean? The requestFactory is for HTTP transport configuration (timeouts, connection pooling). The API key is set via AnthropicApi.builder().apiKey().
```java
private final SimpleClientHttpRequestFactory defaultFactory = new SimpleClientHttpRequestFactory();

@Override
public ClientHttpRequest createRequest(URI uri, HttpMethod httpMethod) throws IOException {
    ClientHttpRequest request = defaultFactory.createRequest(uri, httpMethod);
    // Add your custom property (e.g., a header) here
    request.getHeaders().add("X-Custom-Property", "MyCustomValue");
    return request;
}
```

Or use an interceptor on the factory.
Thanks for the example. While we're using Spring AI internally, AnthropicApi.builder().apiKey() is the idiomatic, supported path — it's how Spring AI expects credentials to be wired, and it keeps the transport layer concerned only with transport.
Attaching the key via a custom ClientHttpRequestFactory or interceptor would work at the HTTP level, but adds complexity with no benefit while Spring AI is in play. If we replace Spring AI internally at some point, that would be exactly the right time to revisit how credentials are applied — and this pattern would be a good starting point. Worth a separate issue to track. Out of scope for this PR.
```kotlin
    .observationRegistry(observationRegistry)
    .build()
```
```kotlin
return SpringAiLlmService(
```
Strategically, consider using LLMMessageSender directly with the Anthropic APIs.
Addressed above and in prior comments. Spring AI is contained behind the abstraction boundary — application code has no Spring AI imports. Replacing the underlying implementation is a future option, not a blocker for this PR.
```kotlin
 *
 * Note: uses the OpenAI wire protocol, not the native Spring AI DeepSeek client.
 */
fun deepSeek(apiKey: String): ByokSpec =
```
There is maintenance involved in keeping the validation models current.
Valid long-term observation. The same applies to every model constant across the codebase — AnthropicModels, OpenAiModels, DeepSeekModels etc. all require updates when providers change their model offerings. No different here, unless you would like to point at a specific pattern being used in that regard?
```kotlin
        .observationRegistry(observationRegistry)
)
builder.webClientBuilder(
    WebClient.builder().observationRegistry(observationRegistry)
```
Usage of both RestClient and WebClient: is this OK?
Both are required by AnthropicApi.builder() — restClientBuilder for blocking calls, webClientBuilder for streaming. This is a Spring AI API requirement, not a design choice on our part.
igordayen left a comment:

A few non-blocking considerations for the future.
Good points — both are already addressed by this PR, and the Guide migration shows it concretely.

Streaming

Connection pooling

```kotlin
@Bean
fun pooledRequestFactory(): ClientHttpRequestFactory =
    HttpComponentsClientHttpRequestFactory().apply {
        setConnectTimeout(5_000)
        // Apache HttpComponents manages the pool — size tunable per deployment
    }
```

```kotlin
@Service
class UserModelFactory(
    private val requestFactory: ObjectProvider<ClientHttpRequestFactory>,
    ...
) {
    private fun createAnthropicService(model: String, apiKey: String): LlmService<*> =
        AnthropicModelFactory(apiKey = apiKey, requestFactory = requestFactory).build(model)

    private fun createOpenAiService(model: String, apiKey: String): LlmService<*> =
        OpenAiCompatibleModelFactory(baseUrl = null, apiKey = apiKey, ..., requestFactory = requestFactory)
            .openAiCompatibleLlm(model, ...)
}
```

On the validation probe specifically

Why the old approach in Guide needed to change
Force-pushed 1cbc458 to f8201bb



BYOK Phase 2
Summary
Adds framework primitives to support BYOK (Bring Your Own Key) cleanly from any application.
The motivation is Guide's current implementation, which leaks Spring AI types across the framework boundary and requires hand-rolling provider-specific builder code.
Guide is Embabel's dog-fooding app — if the framework doesn't make this easy, every third-party application developer faces the same problem. The same patterns are required by anyone building a multi-user application on Embabel: internal tooling, external SaaS products, and everything in between. Getting this right in the framework removes the hand-rolled boilerplate from all of them.
Changes
1. Extract `AnthropicModelFactory`

Before: `AnthropicModelsConfig` is a monolithic Spring `@Configuration`. The Anthropic API construction (`createAnthropicApi()`) and service construction (`createAnthropicLlm()`) are private methods buried inside the config class — inaccessible for BYOK use without going through Spring.

After: New open class `AnthropicModelFactory` mirrors the shape of `OpenAiCompatibleModelFactory` and implements `ByokFactory`. `AnthropicModelsConfig` becomes a thin subclass. No behaviour change for the normal (non-BYOK) path.

The `validationModel` constructor param exists for the rare case where a key only grants access to a specific set of models. The default (`CLAUDE_HAIKU_4_5`) is the cheapest available.

2. `buildValidated()` on `OpenAiCompatibleModelFactory`

Before: `OpenAiCompatibleModelFactory` has `openAiCompatibleLlm()` but no validation method.

After: `buildValidated()` lives on the factory. Internal behaviour:

- `PASS_THROUGH_RETRY_TEMPLATE` (fail fast, no retries)
- probe: `createMessageSender(LlmOptions()).call(listOf(UserMessage("Hi")), emptyList())`
- on failure: `InvalidApiKeyException(message)` — no Spring AI types surface

3. `ByokFactory` interface + `ByokSpec` + `detectProvider()`

The core of the phase 2 refinement: a self-contained spec pattern that eliminates call-site boilerplate.
- `ByokFactory` — fun interface in `com.embabel.agent.spi`. `AnthropicModelFactory` implements it directly. For OpenAI-compatible providers, `ByokSpec` (nested class on `OpenAiCompatibleModelFactory`) implements it.
- Named companion factories on `OpenAiCompatibleModelFactory` return `ByokSpec`.
- Each `ByokSpec` has a fluent `.validating(model, provider)` override for keys with restricted model access.
- `detectProvider()` — top-level function in `com.embabel.agent.spi`: concurrently races all candidates via `Executors.newVirtualThreadPerTaskExecutor()` + `invokeAny`. First success wins; remaining tasks are cancelled. All fail → `InvalidApiKeyException`. The resulting `LlmService.provider` identifies the winner.

Full sign-up flow call site:
Settings flow (single known provider):
Custom provider extension function pattern:
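The call-site snippets above didn't survive extraction into this view. As a hedged, self-contained sketch of the contract they exercise (`LlmService`, `ByokFactory`, and `InvalidApiKeyException` below are simplified stand-ins, not the real framework types):

```kotlin
import java.util.concurrent.Callable
import java.util.concurrent.Executors

// Simplified stand-ins for the framework types named in this PR.
class InvalidApiKeyException(message: String) : RuntimeException(message)
data class LlmService(val provider: String)

fun interface ByokFactory {
    fun buildValidated(): LlmService
}

// Sketch of the racing contract: all candidates are probed concurrently on
// virtual threads; the first successful probe wins and the rest are cancelled.
fun detectProvider(vararg candidates: ByokFactory): LlmService {
    if (candidates.isEmpty()) throw InvalidApiKeyException("no candidates supplied")
    return Executors.newVirtualThreadPerTaskExecutor().use { executor ->
        try {
            executor.invokeAny(candidates.map { Callable { it.buildValidated() } })
        } catch (e: Exception) {
            throw InvalidApiKeyException("all candidates failed")
        }
    }
}

fun main() {
    val service = detectProvider(
        ByokFactory { throw IllegalStateException("401 from provider A") }, // bad key
        ByokFactory { LlmService(provider = "anthropic") },                 // valid key
    )
    println(service.provider) // prints "anthropic"
}
```

The real `detectProvider` returns `LlmService<*>` and the real `ByokFactory` contract may differ; this only illustrates the `invokeAny` first-success semantics.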
Concurrency note — why `newVirtualThreadPerTaskExecutor()` with no explicit limit?

The candidate count is structurally bounded by the number of BYOK-supported providers in the application — realistically 2–12, never user-controlled. At 50,000 MAU with aggressive onboarding assumptions (5% new users in a peak week, concentrated into business hours):
Even at 10× spike with 12 providers: ~120 virtual threads active simultaneously, each blocked on outbound HTTP. Virtual threads are designed for millions of concurrent I/O-bound tasks; 120 is negligible. The real constraint at any realistic scale is external provider rate limits — which are per user key and therefore don't compound. Adding a configurable thread cap would cost real API surface area to defend against a phantom concern.
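The intermediate figures behind this estimate were elided above; a back-of-envelope reconstruction under the stated assumptions (5% of 50,000 MAU signing up in a peak week, spread over 40 business hours) looks like this:

```kotlin
// Back-of-envelope check of the concurrency bound (assumptions from the text).
fun main() {
    val mau = 50_000
    val signupsPerWeek = (mau * 0.05).toInt()           // 5% new users in a peak week = 2,500
    val businessHours = 5 * 8                            // concentrated into 40 business hours
    val signupsPerHour = signupsPerWeek / businessHours  // ~62/hour, roughly one per minute
    val providers = 12                                   // upper bound of BYOK candidates
    val spikeFactor = 10
    // At a 10x spike, on the order of ten sign-ups in flight at once,
    // each fanning out to at most 12 provider probes:
    val worstCaseThreads = spikeFactor * providers       // ~120 virtual threads
    println("$signupsPerHour sign-ups/hour, worst case ~$worstCaseThreads threads")
}
```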
Design note — why not a function-reference overload?

We considered `detectProvider(apiKey, ::AnthropicModelFactory, OpenAiCompatibleModelFactory::openAi, ...)`, which would shave one argument per line in the zero-config case. Rejected: it breaks whenever any single candidate needs customisation (`.validating()` or `validationModel`), forcing the caller to mix styles. The savings are modest; the sharp edge is real 🗡️

Design note — why not a typed `LlmProvider` enum or inline value class?

A new enum would duplicate the existing `ModelMetadata.provider: String` field and the `XxxModels.PROVIDER` constants. An `@JvmInline value class` would break `const val` on the `PROVIDER` constants, requiring Java breaking changes. Since `ByokFactory` instances carry their own provider identity (via the returned `LlmService.provider`), no separate provider key is needed at all — the map-key approach was the problem.

What does NOT change
- `AnthropicModelsConfig` — behaviour unchanged
- `OpenAiCompatibleModelFactory.openAiCompatibleLlm()` — unchanged, existing callers unaffected

How Guide uses this
The problem today
Guide's `UserModelFactory` builds `SpringAiLlmService` instances by hand, importing Spring AI types directly (`AnthropicChatModel.builder()`, `OpenAiChatModel.builder()`, etc.). Key validation casts across the framework boundary:

The 4-provider validation fan-out is sequential, and Spring AI logs noisy stack traces from the expected 401 probe calls.
After this PR
Guide's `UserModelFactory` replaces four hand-rolled builder methods with factory delegations. No Spring AI imports remain. For service creation (no validation):

Key validation (when a user stores a key) becomes:

Spring AI stack traces from probe 401s are eliminated. Fan-out detection (sign-up flow) becomes a single `detectProvider(...)` call as shown above.

Security
`buildValidated()` and `detectProvider()` validate keys and return an `LlmService` — key lifecycle management is the caller's responsibility. Embabel never stores or logs keys.

Guide's approach as a reference:

- keys held in a `UserKeyStore` — never persisted to disk or database
- local-storage caching; a stolen blob is useless without the server secret
Callers should also ensure: keys transmitted over HTTPS only, log levels prevent key material appearing in plaintext, cached `LlmService` instances revoked on logout or key rotation.

Reviewer notes
- `detectProvider` is parameterised by the caller — it encodes no assumptions about which providers an app supports. Suitable for any BYOK application, not just Guide.
- `invokeAny` returns whichever task completes first, if that case ever occurs.

Testing notes
Tests were written test-first (non-compiling until implementation lands) following existing project conventions:
- `DetectProviderTest` — pure unit test using mockk `ByokFactory` lambdas; no HTTP, no Spring context. Follows the mockk patterns in `OpenAiCompatibleModelFactoryTest`.
- `OpenAiCompatibleModelFactoryByokSpecTest` — construction-only unit tests for all named factories and `byok()`; verifies the `ByokFactory` contract and `.validating()` chaining.
- `OpenAiCompatibleModelFactoryBuildValidatedTest` — uses Sun's built-in `HttpServer` to stub 401/200 responses locally, following the precedent in `NettyClientAutoConfigurationTest`. No WireMock dependency needed.
- `AnthropicModelFactoryTest` — construction-only tests for `build()` and custom baseUrl, same pattern as the existing `OpenAiCompatibleModelFactoryTest`.
- `AnthropicModelFactoryBuildValidatedTest` — uses Sun's built-in `HttpServer` to stub 401/200 responses, same pattern as `OpenAiCompatibleModelFactoryBuildValidatedTest`; added during review to cover `buildValidated()` and `buildValidated(model)`.
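For readers unfamiliar with the JDK's built-in server these tests use, a minimal sketch of the 401-stub pattern (the path below is illustrative, not the actual test code):

```kotlin
import com.sun.net.httpserver.HttpServer
import java.net.InetSocketAddress
import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse

// Minimal 401 stub using the JDK's built-in HttpServer: no WireMock needed.
fun main() {
    val server = HttpServer.create(InetSocketAddress(0), 0) // port 0 = ephemeral
    server.createContext("/v1/messages") { exchange ->
        exchange.sendResponseHeaders(401, -1) // -1 = no response body
        exchange.close()
    }
    server.start()
    try {
        val response = HttpClient.newHttpClient().send(
            HttpRequest.newBuilder(URI("http://localhost:${server.address.port}/v1/messages")).build(),
            HttpResponse.BodyHandlers.discarding(),
        )
        println(response.statusCode()) // prints 401
    } finally {
        server.stop(0)
    }
}
```

Pointing a factory's baseUrl at `http://localhost:<port>` lets `buildValidated()` be exercised against canned 401/200 responses with zero extra dependencies.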