Conversation
Introduce a generic TokenCounter&lt;T&gt; fun interface for estimating token counts across different content types. Phase 1 provides the SPI shape and a character-heuristic default for design feedback.

- TokenCounter&lt;T&gt; fun interface in com.embabel.common.ai.model
- CharacterHeuristicTokenCounter with configurable charsPerToken ratio
- PromptContribution.estimatedTokens field
- PromptContributor.estimateTokens() and promptContribution(counter)
- Kotlin, Java interop, and generic type parameter tests
- All new types marked @ApiStatus.Experimental

Signed-off-by: James Dunnam <7660553+jimador@users.noreply.github.com>
Rename the SPI method to better reflect that token counting is an estimation. Use ceiling division in CharacterHeuristicTokenCounter so short inputs are never underestimated. Route promptContribution(counter) through estimateTokens so overrides are respected consistently.

Signed-off-by: James Dunnam <7660553+jimador@users.noreply.github.com>
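As a sketch of what the ceiling-division change implies (the interface and class shapes here are reconstructed from the commit messages, not copied from the actual source, and may differ in detail):

```kotlin
// Sketch only: reconstructed from the commit message above; the real SPI
// lives in com.embabel.common.ai.model.
fun interface TokenCounter<T> {
    fun estimateTokens(content: T): Int
}

// Character heuristic: roughly charsPerToken characters per token (default 4).
// Ceiling division means short non-empty inputs are never estimated as 0 tokens.
class CharacterHeuristicTokenCounter(
    private val charsPerToken: Int = 4,
) : TokenCounter<String> {
    init {
        require(charsPerToken > 0) { "charsPerToken must be positive" }
    }

    override fun estimateTokens(content: String): Int =
        (content.length + charsPerToken - 1) / charsPerToken // ceiling division
}

fun main() {
    val counter = CharacterHeuristicTokenCounter()
    println(counter.estimateTokens(""))            // 0
    println(counter.estimateTokens("hi"))          // 1: floor division would say 0
    println(counter.estimateTokens("a".repeat(9))) // 3, i.e. ceil(9 / 4)
}
```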
A couple of thoughts:
@simeshev good catch on the double. I originally thought a capability-style approach on contributors might still make sense for cases that can report their own size, but the real value from token counting seems to show up at the assembly layer — what I was thinking of as “Phase 2”.

I intentionally didn’t start there — I wanted to first get a gauge on whether a token counting API itself is something worth bringing into the framework before going deep on those pieces.
and here is the budget starting point:
…ributor

Per PR feedback, token estimation on individual contributors creates an awkward API where callers pass infrastructure (TokenCounter) to content producers. Token estimation belongs at the assembly layer.

Removes:
- estimateTokens(counter: TokenCounter&lt;String&gt;): Int
- promptContribution(counter: TokenCounter&lt;String&gt;): PromptContribution
- Associated TokenEstimation test class

Keeps TokenCounter SPI, CharacterHeuristicTokenCounter, and PromptContribution.estimatedTokens field for Phase 2 integration.

Signed-off-by: James Dunnam <7660553+jimador@users.noreply.github.com>
Does this change stand on its own? If we make it part of the public API, how would Embabel users use it?
@jimador The PR is in Draft state; I assume it's not ready for review / merge yet. Is this correct?
## Summary

This introduces a `TokenCounter<T>` SPI for making token counts visible when needed (e.g., prompt assembly, cost estimation, context window management, and budget enforcement). (Token estimation SPI for prompt assembly #1497)

Today, `PromptContributor` can inject content, but before this change there is no way to know what that content costs in tokens. That becomes a problem as soon as you have:

- `WindowingConversationFormatter` making truncation decisions by message count instead of token cost
- `PricingModel.costOf(inputTokens, outputTokens)` but no way to get `inputTokens` from contributions

This PR adds the measurement primitives and plumbing to surface token estimates on prompt contributions. It does not attempt budgeting, truncation, prioritization, or windowing yet.
All new types are marked `@ApiStatus.Experimental`.

## What's in this PR

- `TokenCounter<T>` (fun interface)
- `TokenCounter.heuristic()` -> default implementation
- `CharacterHeuristicTokenCounter` (configurable `charsPerToken`, default = 4, ceiling division)
- `PromptContribution.estimatedTokens` (optional metadata)
- `PromptContributor.estimateTokens(counter)`
- `PromptContributor.promptContribution(counter)`

## Why `TokenCounter<T>` is generic

Token counting is not a "string problem." It's a modality problem.
Different inputs -> different counting rules -> different providers -> different math.
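Concretely, the SPI shape being proposed might look like the following (reconstructed from this description; the real interface lives in `com.embabel.common.ai.model` and exact method and factory signatures may differ):

```kotlin
// Hedged sketch of the proposed SPI; signatures reconstructed from the PR
// description, not copied from the actual source.
fun interface TokenCounter<T> {
    /** An estimate, not an exact tokenizer count. */
    fun estimateTokens(content: T): Int

    companion object {
        /** Character-heuristic default: ~charsPerToken chars per token, ceiling division. */
        fun heuristic(charsPerToken: Int = 4): TokenCounter<String> =
            TokenCounter<String> { text -> (text.length + charsPerToken - 1) / charsPerToken }
    }
}

fun main() {
    // SAM conversion works from Kotlin lambdas and (via interop) Java lambdas alike.
    val counter: TokenCounter<String> = TokenCounter.heuristic()
    println(counter.estimateTokens("hello world")) // 3 with the default 4-chars-per-token ratio
}
```

Because the type parameter is on the interface rather than fixed to `String`, each modality supplies its own counter without widening the contract.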
### Examples

- `TokenCounter<String>`
- `TokenCounter<Message>` (future) - when `Message` exists (avoids module coupling)
- `TokenCounter<AgentImage>` - `85 + 170 * ceil(w/512) * ceil(h/512)` for high-detail; `(w * h) / 750`
- Audio
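The image formulas above could become one counter behind the same contract. A hedged sketch (the `AgentImage` shape and the choice of when each formula applies are assumptions for illustration, not framework API):

```kotlin
import kotlin.math.ceil

// Hypothetical AgentImage shape and assumed SPI, for illustration only.
fun interface TokenCounter<T> {
    fun estimateTokens(content: T): Int
}

data class AgentImage(val width: Int, val height: Int, val highDetail: Boolean)

// High-detail: OpenAI-style tile math (85 base + 170 per 512x512 tile);
// otherwise a flat pixels-per-token ratio, as listed above.
val agentImageCounter = TokenCounter<AgentImage> { img ->
    if (img.highDetail) {
        85 + 170 * ceil(img.width / 512.0).toInt() * ceil(img.height / 512.0).toInt()
    } else {
        (img.width * img.height) / 750
    }
}

fun main() {
    println(agentImageCounter.estimateTokens(AgentImage(1024, 768, highDetail = true)))  // 85 + 170*2*2 = 765
    println(agentImageCounter.estimateTokens(AgentImage(1024, 768, highDetail = false))) // 786432 / 750 = 1048
}
```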
### Composition model

Keep it simple. Build up from smaller counters:
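For example, a `Message` counter could be built from a `String` counter plus a fixed per-message overhead (the `Message` shape, the overhead constant, and the helper names here are hypothetical):

```kotlin
// Sketch of composing counters; the SPI shape is assumed from the PR description.
fun interface TokenCounter<T> {
    fun estimateTokens(content: T): Int
}

data class Message(val role: String, val text: String)

fun charHeuristic(charsPerToken: Int = 4): TokenCounter<String> =
    TokenCounter<String> { s -> (s.length + charsPerToken - 1) / charsPerToken }

/** Build a Message counter from a String counter plus fixed per-message overhead. */
fun messageCounter(
    text: TokenCounter<String>,
    perMessageOverhead: Int = 4, // role/formatting tokens; assumed constant
): TokenCounter<Message> =
    TokenCounter<Message> { m -> perMessageOverhead + text.estimateTokens(m.text) }

fun main() {
    val counter = messageCounter(charHeuristic())
    val convo = listOf(Message("user", "hello"), Message("assistant", "hi there"))
    // A conversation estimate is just a fold over per-message estimates.
    println(convo.sumOf { counter.estimateTokens(it) }) // (4 + 2) + (4 + 2) = 12
}
```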
## Prior Art (for context)

A couple of existing approaches land in roughly the same place, and they help frame the boundary here:

- **LangChain4j**
- **Spring AI** (`MediaContent`; its JTokkit-based `TokenCountEstimator` tokenizes strings and does not meaningfully account for non-text modalities)

The common pattern: token counting doesn’t generalize cleanly across modalities.
## What's next (options for final PR or follow-ups)
### Integration with existing budget infrastructure

The framework already has budget concepts that token estimation could feed into:

- `Budget` in `ProcessOptions` - tracks cost, actions, and total tokens at the process level. Currently populated after LLM calls. Token estimation could inform this before a call is made.
- `EarlyTerminationPolicy.maxTokens` - terminates processes when token usage exceeds a limit. Pre-flight estimation could make this more proactive.
- `Thinking.tokenBudget` - allocates tokens for model thinking (Anthropic extended thinking). Different concern, but shows the pattern of token-aware resource allocation already exists in the framework.

### Other possibilities

- `TokenCounter<Message>` in `embabel-agent-api`
- `PricingModel` integration (pre-flight cost checks)
- `ContextualPromptElement` adoption

## Design Notes
This SPI doesn't try to solve correctness globally. It gives us a place to plug in better answers over time.
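One way "Phase 2" assembly-layer use could plug in is a pre-flight budget check before a call is made. A hedged sketch only — `fitToBudget`, the priority-ordering convention, and the wiring to any `Budget`/`ProcessOptions` values are hypothetical, not framework API:

```kotlin
// Assumed SPI shape, reconstructed from the PR description.
fun interface TokenCounter<T> {
    fun estimateTokens(content: T): Int
}

// Default character heuristic (~4 chars per token, ceiling division).
val heuristic = TokenCounter<String> { s -> (s.length + 3) / 4 }

/**
 * Hypothetical assembly-layer helper: keep contributions (assumed to be
 * priority-ordered, highest first) until the estimated input budget is spent.
 */
fun fitToBudget(
    contributions: List<String>,
    maxInputTokens: Int,
    counter: TokenCounter<String> = heuristic,
): List<String> {
    var used = 0
    return contributions.takeWhile { c ->
        val cost = counter.estimateTokens(c)
        (used + cost <= maxInputTokens).also { fits -> if (fits) used += cost }
    }
}

fun main() {
    val parts = listOf("system prompt...", "tool docs...", "x".repeat(1500))
    // With a budget of 20 estimated tokens, the oversized last part is dropped.
    println(fitToBudget(parts, maxInputTokens = 20).size) // 2
}
```

The same shape could back a pre-flight `PricingModel.costOf(...)` check: estimate `inputTokens` before the call instead of observing them after.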
## Open Questions

- Is `TokenCounter<T>` the right abstraction, or should we constrain to text for now?
- Is `embabel-agent-api` the right module boundary?
- Should estimation live on `PromptContributor`, or introduce a separate budgeting layer?
- Should `ContextualPromptElement` adopt this?

## Test Plan

- `TokenCounterTest` - contract, factory, SAM conversion, generic type parameter
- `CharacterHeuristicTokenCounterTest` - heuristic behavior, configurable ratio, ceiling division
- `TokenCounterJavaUsageTest` - Java interop (factory, lambda, null rejection)
- `PromptContributorTest.TokenEstimation` - `estimateTokens`, override consistency, `promptContribution` with counter