
feature: TokenCounter SPI #1536

Draft
jimador wants to merge 3 commits into embabel:main from jimador:feature/token-counter-spi

Conversation


@jimador jimador commented Mar 24, 2026

Summary

This introduces a TokenCounter<T> SPI for making token counts visible when needed (e.g., prompt assembly, cost estimation, context window management, and budget enforcement). (Token estimation SPI for prompt assembly #1497)

Today, PromptContributor can inject content, but there is no way to know what that content costs in tokens. That becomes a problem as soon as you have:

  • Multiple contributors competing for a finite context window (RAG results, conversation history, system prompts)
  • Variable-length content where size isn't predictable at build time
  • WindowingConversationFormatter making truncation decisions by message count instead of token cost
  • A need to estimate cost before making an LLM call (we have PricingModel.costOf(inputTokens, outputTokens) but no way to get inputTokens from contributions)

This PR adds the measurement primitives and plumbing to surface token estimates on prompt contributions. It does not attempt budgeting, truncation, prioritization, or windowing yet.

All new types are marked @ApiStatus.Experimental.


What's in this PR

  • TokenCounter<T> (fun interface)
  • TokenCounter.heuristic() -> default implementation
  • CharacterHeuristicTokenCounter (configurable charsPerToken, default = 4, ceiling division)
  • PromptContribution.estimatedTokens (optional metadata)
  • PromptContributor.estimateTokens(counter)
  • PromptContributor.promptContribution(counter)
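For concreteness, here is a minimal sketch of the heuristic's shape and its ceiling-division behavior. This is an illustration, not the PR's actual source; the real types live in com.embabel.common.ai.model and may differ in detail.

```kotlin
fun interface TokenCounter<T> {
    fun estimateTokens(content: T): Int
}

class CharacterHeuristicTokenCounter(
    private val charsPerToken: Int = 4, // default ratio from the PR
) : TokenCounter<String> {
    // Ceiling division: round up so short inputs are never underestimated.
    override fun estimateTokens(content: String): Int =
        (content.length + charsPerToken - 1) / charsPerToken
}

fun main() {
    val counter = CharacterHeuristicTokenCounter()
    println(counter.estimateTokens(""))            // 0
    println(counter.estimateTokens("hi"))          // 1, not 0: ceiling, not floor
    println(counter.estimateTokens("a".repeat(9))) // 3 (9 / 4 rounded up)
}
```

Ceiling division is why the second commit exists: with floor division, any input shorter than charsPerToken would be counted as free.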

Why TokenCounter<T> is generic

Token counting is not a "string problem." It's a modality problem.

Different inputs -> different counting rules -> different providers -> different math.

Examples

  • TokenCounter<String>

    • Plain text
    • Heuristic or BPE (tiktoken, etc.)
    • What this PR implements
  • TokenCounter<Message> (future)

    • Includes role framing, separators, reply priming (OpenAI adds 3 tokens per message + 3 reply-priming tokens, and this has changed between model versions)
    • Lives where Message exists (avoids module coupling)
  • TokenCounter<AgentImage>

    • Based on resolution / tiling, not content analysis
    • OpenAI: 85 + 170 * ceil(w/512) * ceil(h/512) for high-detail
    • Anthropic: (w * h) / 750
    • Different providers, different formulas - hence the need for an SPI rather than one hardcoded rule
  • Audio
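To make the modality point concrete, here is a hedged sketch of provider-specific image counters using the formulas listed above. AgentImage is a stand-in type here (its fields are assumptions), and the integer handling is illustrative.

```kotlin
import kotlin.math.ceil

fun interface TokenCounter<T> {
    fun estimateTokens(content: T): Int
}

// Stand-in for the framework's image type; width/height fields are assumed.
data class AgentImage(val width: Int, val height: Int)

// OpenAI high-detail: 85 base tokens + 170 per 512x512 tile.
val openAiHighDetail = TokenCounter<AgentImage> { img ->
    85 + 170 * ceil(img.width / 512.0).toInt() * ceil(img.height / 512.0).toInt()
}

// Anthropic: (w * h) / 750, truncated to an integer here.
val anthropicImage = TokenCounter<AgentImage> { img ->
    (img.width * img.height) / 750
}

fun main() {
    val img = AgentImage(1024, 1024)
    println(openAiHighDetail.estimateTokens(img)) // 85 + 170 * 2 * 2 = 765
    println(anthropicImage.estimateTokens(img))   // 1048576 / 750 = 1398
}
```

The same 1024x1024 image costs 765 tokens on one provider and 1398 on another, which is exactly why the counter has to be pluggable per provider.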


Composition model

Keep it simple. Build up from smaller counters:

val textCounter = TokenCounter.heuristic()

val messageCounter: TokenCounter<Message> = TokenCounter { msg ->
    textCounter.estimateTokens(msg.content) + ROLE_FRAMING_OVERHEAD
}
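Extending the composition one level up, a conversation counter can be built from the message counter. In this runnable sketch, Message, ROLE_FRAMING_OVERHEAD, and REPLY_PRIMING are illustrative stand-ins (the overhead values follow the OpenAI per-message behavior noted above, which has changed between model versions).

```kotlin
fun interface TokenCounter<T> {
    fun estimateTokens(content: T): Int
}

data class Message(val role: String, val content: String) // stand-in type

const val ROLE_FRAMING_OVERHEAD = 3 // per-message framing tokens (OpenAI-style)
const val REPLY_PRIMING = 3         // one-time reply-priming tokens (OpenAI-style)

val textCounter: TokenCounter<String> =
    TokenCounter { s -> (s.length + 3) / 4 } // 4-chars-per-token heuristic

val messageCounter: TokenCounter<Message> = TokenCounter { msg ->
    textCounter.estimateTokens(msg.content) + ROLE_FRAMING_OVERHEAD
}

val conversationCounter: TokenCounter<List<Message>> = TokenCounter { msgs ->
    msgs.sumOf { messageCounter.estimateTokens(it) } + REPLY_PRIMING
}

fun main() {
    val convo = listOf(
        Message("system", "You are terse."), // 14 chars -> 4 + 3 = 7
        Message("user", "hello"),            // 5 chars  -> 2 + 3 = 5
    )
    println(conversationCounter.estimateTokens(convo)) // 7 + 5 + 3 = 15
}
```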

Prior Art (for context)

A couple of existing approaches land in roughly the same place, and they help frame the boundary here:

  • LangChain4j

    • Experimented with multiple counting APIs (text, messages, tools)
    • Ended up removing the ones that didn’t generalize
  • Spring AI

    • Exposes a generic content type (MediaContent)
    • Current implementations are still effectively text-only (JTokkitTokenCountEstimator tokenizes strings and does not meaningfully account for non-text modalities)

The common pattern: token counting doesn’t generalize cleanly across modalities.


What's next (options for final PR or follow-ups)

Integration with existing budget infrastructure

The framework already has budget concepts that token estimation could feed into:

  • Budget in ProcessOptions - tracks cost, actions, and total tokens at the process level. Currently populated after LLM calls. Token estimation could inform this before a call is made.
  • EarlyTerminationPolicy.maxTokens - terminates processes when token usage exceeds a limit. Pre-flight estimation could make this more proactive.
  • Thinking.tokenBudget - allocates tokens for model thinking (Anthropic extended thinking). Different concern, but shows the pattern of token-aware resource allocation already exists in the framework.
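As a sketch of what "proactive" could look like, a pre-flight gate over the existing token limit might be as simple as the following. The function name and wiring are hypothetical; tokenLimit corresponds to Budget.tokens, and none of this is in the current PR.

```kotlin
// Hypothetical pre-flight check: refuse a call whose estimated input tokens
// would push the process past its token budget. Today this check only happens
// after the LLM call, via EarlyTerminationPolicy.maxTokens.
fun wouldExceedTokenBudget(
    tokensUsedSoFar: Int,
    estimatedInputTokens: Int,
    tokenLimit: Int,
): Boolean = tokensUsedSoFar + estimatedInputTokens > tokenLimit

fun main() {
    println(wouldExceedTokenBudget(990_000, 8_000, 1_000_000)) // false: call proceeds
    println(wouldExceedTokenBudget(995_000, 8_000, 1_000_000)) // true: skip the call
}
```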

Other possibilities

  • Token-budget-aware prompt assembly
  • Contributor prioritization ("essential vs optional")
  • Truncation instead of blind dropping
  • Token-aware conversation windowing
  • TokenCounter<Message> in embabel-agent-api
  • Provider-backed counters (tiktoken, API calls, etc.)
  • PricingModel integration (pre-flight cost checks)
  • ContextualPromptElement adoption

Design Notes

  • Estimation is inherently approximate - even Anthropic's server-side token counting API calls its results an "estimate"
  • Different providers will disagree
  • Some providers expose authoritative APIs (we can plug those in later)

This SPI doesn't try to solve correctness globally. It gives us a place to plug in better answers over time.


Open Questions

  • Is TokenCounter<T> the right abstraction, or should we constrain to text for now?
  • Is embabel-agent-ai the right module boundary?
  • Where should token budgets ultimately live?
    • model metadata?
    • process options?
    • app config?
  • Do we extend PromptContributor, or introduce a separate budgeting layer?
  • Should ContextualPromptElement adopt this?

Test Plan

  • TokenCounterTest - contract, factory, SAM conversion, generic type parameter
  • CharacterHeuristicTokenCounterTest - heuristic behavior, configurable ratio, ceiling division
  • TokenCounterJavaUsageTest - Java interop (factory, lambda, null rejection)
  • PromptContributorTest.TokenEstimation - estimateTokens, override consistency, promptContribution with counter

jimador added 2 commits March 24, 2026 12:49
Introduce a generic TokenCounter<T> fun interface for estimating token
counts across different content types. Phase 1 provides the SPI shape
and a character-heuristic default for design feedback.

- TokenCounter<T> fun interface in com.embabel.common.ai.model
- CharacterHeuristicTokenCounter with configurable charsPerToken ratio
- PromptContribution.estimatedTokens field
- PromptContributor.estimateTokens() and promptContribution(counter)
- Kotlin, Java interop, and generic type parameter tests
- All new types marked @ApiStatus.Experimental

Signed-off-by: James Dunnam <7660553+jimador@users.noreply.github.com>
Rename the SPI method to better reflect that token counting is an
estimation. Use ceiling division in CharacterHeuristicTokenCounter
so short inputs are never underestimated. Route
promptContribution(counter) through estimateTokens so overrides
are respected consistently.

Signed-off-by: James Dunnam <7660553+jimador@users.noreply.github.com>
@simeshev
Collaborator

A couple of thoughts:

  • Call efficiency: The default implementation of estimateTokens and promptContribution(counter) in the PromptContributor calls the contribution() method. If a contributor performs expensive operations (like a DB query or heavy computation) inside contribution() without caching, this can double the execution cost during assembly.
  • Scope: The PR description doesn't call out that this is strictly the measurement "Phase 1".

@jimador
Author

jimador commented Mar 25, 2026

@simeshev good catch on the double contribution() call. I agree the PromptContributor integration is awkward — having callers pass a TokenCounter into estimateTokens feels off. A contributor producing content shouldn’t need a caller to hand it infrastructure just to describe its own output.

I originally thought PromptContributor was the right place for this since it’s how components inject content into prompts, but looking back, it doesn’t feel like the right integration point.

A capability-style approach on contributors might still make sense for cases that can report their own size, but the real value from token counting seems to show up at the assembly layer — what I was thinking of as “Phase 2”:

  • Token-budget-aware prompt assembly
  • Contributor prioritization / “essential vs optional”
  • Truncation instead of blind dropping
  • Token-aware conversation windowing
  • Provider-backed counters (tiktoken, API calls, etc.)
  • Pricing integration (pre-flight cost checks)

I intentionally didn’t start there — I wanted to first get a gauge on whether a token counting API itself is something worth bringing into the framework before going deep on those pieces.

@simeshev
Collaborator

  1. I'm wondering if doing a rough / provisional / discardable Phase 2 could serve as a pressure test for Phase 1.

  2. By the way, we already have cost infra in place. Here is the starting point:

/**
 * Annotates a method that computes the dynamic cost or value of an action at planning time.
 * Similar to @Condition, this method can take domain object parameters from the blackboard.
 * **Unlike @Condition, all domain object parameters must be nullable.**
 * If a parameter is not available on the blackboard, null will be passed.
 *
 * The method can also take a `Blackboard` parameter for direct access to all available objects.
 *
 * The method must return a Double between 0.0 and 1.0.
 *
 * Example:
 * ```java
 * @Cost(name = "processingCost")
 * public double computeProcessingCost(@Nullable LargeDataSet largeData) {
 *     return largeData != null ? 0.9 : 0.1;
 * }
 *
 * @Action(costMethod = "processingCost")
 * public DataOutput processData(DataInput input) { ... }
 * ```
 *
 * @param name Name of the cost method. Referenced by @Action.costMethod or @Action.valueMethod.
 * If not provided, the name will be the method name.
 */
@Target(AnnotationTarget.FUNCTION)
@Retention(AnnotationRetention.RUNTIME)
@MustBeDocumented
annotation class Cost(
    val name: String = "",
)

and here is the budget starting point:

/**
 * Budget for an agent process.
 * @param cost the cost of running the process, in USD.
 * @param actions the maximum number of actions the agent can perform before termination.
 * @param tokens the maximum number of tokens the agent can use before termination. This can be useful in the case of
 * local models where the cost is not directly measurable, but we don't want excessive work.
 */
data class Budget @JvmOverloads constructor(
    val cost: Double = DEFAULT_COST_LIMIT,
    val actions: Int = DEFAULT_ACTION_LIMIT,
    val tokens: Int = DEFAULT_TOKEN_LIMIT,
) {

    fun earlyTerminationPolicy(): EarlyTerminationPolicy {
        return EarlyTerminationPolicy.firstOf(
            EarlyTerminationPolicy.maxActions(maxActions = actions),
            EarlyTerminationPolicy.maxTokens(maxTokens = tokens),
            EarlyTerminationPolicy.hardBudgetLimit(budget = cost),
        )
    }

    fun withCost(cost: Double): Budget =
        this.copy(cost = cost)

    fun withActions(actions: Int): Budget =
        this.copy(actions = actions)

    fun withTokens(tokens: Int): Budget =
        this.copy(tokens = tokens)

    companion object {

        const val DEFAULT_COST_LIMIT = 2.0

        /**
         * Default maximum number of actions an agent process can perform before termination.
         */
        const val DEFAULT_ACTION_LIMIT = 50

        const val DEFAULT_TOKEN_LIMIT = 1000000

        @JvmField
        val DEFAULT = Budget()

    }

}

…ributor

Per PR feedback, token estimation on individual contributors creates an
awkward API where callers pass infrastructure (TokenCounter) to content
producers. Token estimation belongs at the assembly layer.

Removes:
- estimateTokens(counter: TokenCounter<String>): Int
- promptContribution(counter: TokenCounter<String>): PromptContribution
- Associated TokenEstimation test class

Keeps TokenCounter SPI, CharacterHeuristicTokenCounter, and
PromptContribution.estimatedTokens field for Phase 2 integration.

Signed-off-by: James Dunnam <7660553+jimador@users.noreply.github.com>
@simeshev
Collaborator

Does this change stand on its own? If we make it a part of the public API - how would Embabel users use it?

@alexheifetz
Contributor

@jimador PR is in Draft state, I assume it's not ready for review / merge yet, is this correct?

