
feature: TokenCounter SPI #1536

Draft
jimador wants to merge 3 commits into embabel:main from jimador:feature/token-counter-spi

Conversation


@jimador jimador commented Mar 24, 2026

Summary

This introduces a TokenCounter<T> SPI for making token counts visible when needed (e.g., prompt assembly, cost estimation, context window management, and budget enforcement). (Token estimation SPI for prompt assembly #1497)

Today, PromptContributor can inject content, but there is no way to know what that content costs in tokens. That becomes a problem as soon as you have:

  • Multiple contributors competing for a finite context window (RAG results, conversation history, system prompts)
  • Variable-length content where size isn't predictable at build time
  • WindowingConversationFormatter making truncation decisions by message count instead of token cost
  • A need to estimate cost before making an LLM call (we have PricingModel.costOf(inputTokens, outputTokens) but no way to get inputTokens from contributions)

This PR adds the measurement primitives and plumbing to surface token estimates on prompt contributions. It does not attempt budgeting, truncation, prioritization, or windowing yet.

All new types are marked @ApiStatus.Experimental.


What's in this PR

  • TokenCounter<T> (fun interface)
  • TokenCounter.heuristic() -> default implementation
  • CharacterHeuristicTokenCounter (configurable charsPerToken, default = 4, ceiling division)
  • PromptContribution.estimatedTokens (optional metadata)
  • PromptContributor.estimateTokens(counter)
  • PromptContributor.promptContribution(counter)
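For concreteness, here is a minimal sketch of the heuristic's shape and its ceiling-division behavior. This is an illustration, not the PR's actual source; the real types live in com.embabel.common.ai.model and may differ in detail.

```kotlin
fun interface TokenCounter<T> {
    fun estimateTokens(content: T): Int
}

class CharacterHeuristicTokenCounter(
    private val charsPerToken: Int = 4, // default ratio from the PR
) : TokenCounter<String> {
    // Ceiling division: round up so short inputs are never underestimated.
    override fun estimateTokens(content: String): Int =
        (content.length + charsPerToken - 1) / charsPerToken
}

fun main() {
    val counter = CharacterHeuristicTokenCounter()
    println(counter.estimateTokens(""))            // 0
    println(counter.estimateTokens("hi"))          // 1, not 0: ceiling, not floor
    println(counter.estimateTokens("a".repeat(9))) // 3 (9 / 4 rounded up)
}
```

Ceiling division is why the second commit exists: with floor division, any input shorter than charsPerToken would be counted as free.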

Why TokenCounter<T> is generic

Token counting is not a "string problem." It's a modality problem.

Different inputs -> different counting rules -> different providers -> different math.

Examples

  • TokenCounter<String>

    • Plain text
    • Heuristic or BPE (tiktoken, etc.)
    • What this PR implements
  • TokenCounter<Message> (future)

    • Includes role framing, separators, reply priming (OpenAI adds 3 tokens per message + 3 reply-priming tokens, and this has changed between model versions)
    • Lives where Message exists (avoids module coupling)
  • TokenCounter<AgentImage>

    • Based on resolution / tiling, not content analysis
    • OpenAI: 85 + 170 * ceil(w/512) * ceil(h/512) for high-detail
    • Anthropic: (w * h) / 750
    • Different providers, different formulas - hence the need for an SPI rather than one hardcoded rule
  • Audio
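To make the modality point concrete, here is a hedged sketch of provider-specific image counters using the formulas listed above. AgentImage is a stand-in type here (its fields are assumptions), and the integer handling is illustrative.

```kotlin
import kotlin.math.ceil

fun interface TokenCounter<T> {
    fun estimateTokens(content: T): Int
}

// Stand-in for the framework's image type; width/height fields are assumed.
data class AgentImage(val width: Int, val height: Int)

// OpenAI high-detail: 85 base tokens + 170 per 512x512 tile.
val openAiHighDetail = TokenCounter<AgentImage> { img ->
    85 + 170 * ceil(img.width / 512.0).toInt() * ceil(img.height / 512.0).toInt()
}

// Anthropic: (w * h) / 750, truncated to an integer here.
val anthropicImage = TokenCounter<AgentImage> { img ->
    (img.width * img.height) / 750
}

fun main() {
    val img = AgentImage(1024, 1024)
    println(openAiHighDetail.estimateTokens(img)) // 85 + 170 * 2 * 2 = 765
    println(anthropicImage.estimateTokens(img))   // 1048576 / 750 = 1398
}
```

The same 1024x1024 image costs 765 tokens on one provider and 1398 on another, which is exactly why the counter has to be pluggable per provider.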


Composition model

Keep it simple. Build up from smaller counters:

val textCounter = TokenCounter.heuristic()

val messageCounter: TokenCounter<Message> = TokenCounter { msg ->
    textCounter.estimateTokens(msg.content) + ROLE_FRAMING_OVERHEAD
}
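Extending the composition one level up, a conversation counter can be built from the message counter. In this runnable sketch, Message, ROLE_FRAMING_OVERHEAD, and REPLY_PRIMING are illustrative stand-ins (the overhead values follow the OpenAI per-message behavior noted above, which has changed between model versions).

```kotlin
fun interface TokenCounter<T> {
    fun estimateTokens(content: T): Int
}

data class Message(val role: String, val content: String) // stand-in type

const val ROLE_FRAMING_OVERHEAD = 3 // per-message framing tokens (OpenAI-style)
const val REPLY_PRIMING = 3         // one-time reply-priming tokens (OpenAI-style)

val textCounter: TokenCounter<String> =
    TokenCounter { s -> (s.length + 3) / 4 } // 4-chars-per-token heuristic

val messageCounter: TokenCounter<Message> = TokenCounter { msg ->
    textCounter.estimateTokens(msg.content) + ROLE_FRAMING_OVERHEAD
}

val conversationCounter: TokenCounter<List<Message>> = TokenCounter { msgs ->
    msgs.sumOf { messageCounter.estimateTokens(it) } + REPLY_PRIMING
}

fun main() {
    val convo = listOf(
        Message("system", "You are terse."), // 14 chars -> 4 + 3 = 7
        Message("user", "hello"),            // 5 chars  -> 2 + 3 = 5
    )
    println(conversationCounter.estimateTokens(convo)) // 7 + 5 + 3 = 15
}
```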

Prior Art (for context)

A couple of existing approaches land in roughly the same place, and they help frame the boundary here:

  • LangChain4j

    • Experimented with multiple counting APIs (text, messages, tools)
    • Ended up removing the ones that didn’t generalize
  • Spring AI

    • Exposes a generic content type (MediaContent)
    • Current implementations are still effectively text-only (JTokkitTokenCountEstimator tokenizes strings and does not meaningfully account for non-text modalities)

The common pattern: token counting doesn’t generalize cleanly across modalities.


What's next (options for final PR or follow-ups)

Integration with existing budget infrastructure

The framework already has budget concepts that token estimation could feed into:

  • Budget in ProcessOptions - tracks cost, actions, and total tokens at the process level. Currently populated after LLM calls. Token estimation could inform this before a call is made.
  • EarlyTerminationPolicy.maxTokens - terminates processes when token usage exceeds a limit. Pre-flight estimation could make this more proactive.
  • Thinking.tokenBudget - allocates tokens for model thinking (Anthropic extended thinking). Different concern, but shows the pattern of token-aware resource allocation already exists in the framework.
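As a sketch of what "proactive" could look like, a pre-flight gate over the existing token limit might be as simple as the following. The function name and wiring are hypothetical; tokenLimit corresponds to Budget.tokens, and none of this is in the current PR.

```kotlin
// Hypothetical pre-flight check: refuse a call whose estimated input tokens
// would push the process past its token budget. Today this check only happens
// after the LLM call, via EarlyTerminationPolicy.maxTokens.
fun wouldExceedTokenBudget(
    tokensUsedSoFar: Int,
    estimatedInputTokens: Int,
    tokenLimit: Int,
): Boolean = tokensUsedSoFar + estimatedInputTokens > tokenLimit

fun main() {
    println(wouldExceedTokenBudget(990_000, 8_000, 1_000_000)) // false: call proceeds
    println(wouldExceedTokenBudget(995_000, 8_000, 1_000_000)) // true: skip the call
}
```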

Other possibilities

  • Token-budget-aware prompt assembly
  • Contributor prioritization ("essential vs optional")
  • Truncation instead of blind dropping
  • Token-aware conversation windowing
  • TokenCounter<Message> in embabel-agent-api
  • Provider-backed counters (tiktoken, API calls, etc.)
  • PricingModel integration (pre-flight cost checks)
  • ContextualPromptElement adoption

Design Notes

  • Estimation is inherently approximate - even Anthropic's server-side token counting API calls its results an "estimate"
  • Different providers will disagree
  • Some providers expose authoritative APIs (we can plug those in later)

This SPI doesn't try to solve correctness globally. It gives us a place to plug in better answers over time.


Open Questions

  • Is TokenCounter<T> the right abstraction, or should we constrain to text for now?
  • Is embabel-agent-ai the right module boundary?
  • Where should token budgets ultimately live?
    • model metadata?
    • process options?
    • app config?
  • Do we extend PromptContributor, or introduce a separate budgeting layer?
  • Should ContextualPromptElement adopt this?

Test Plan

  • TokenCounterTest - contract, factory, SAM conversion, generic type parameter
  • CharacterHeuristicTokenCounterTest - heuristic behavior, configurable ratio, ceiling division
  • TokenCounterJavaUsageTest - Java interop (factory, lambda, null rejection)
  • PromptContributorTest.TokenEstimation - estimateTokens, override consistency, promptContribution with counter

jimador added 2 commits March 24, 2026 12:49
Introduce a generic TokenCounter<T> fun interface for estimating token
counts across different content types. Phase 1 provides the SPI shape
and a character-heuristic default for design feedback.

- TokenCounter<T> fun interface in com.embabel.common.ai.model
- CharacterHeuristicTokenCounter with configurable charsPerToken ratio
- PromptContribution.estimatedTokens field
- PromptContributor.estimateTokens() and promptContribution(counter)
- Kotlin, Java interop, and generic type parameter tests
- All new types marked @ApiStatus.Experimental

Signed-off-by: James Dunnam <7660553+jimador@users.noreply.github.com>
Rename the SPI method to better reflect that token counting is an
estimation. Use ceiling division in CharacterHeuristicTokenCounter
so short inputs are never underestimated. Route
promptContribution(counter) through estimateTokens so overrides
are respected consistently.

Signed-off-by: James Dunnam <7660553+jimador@users.noreply.github.com>
@simeshev
Collaborator

A couple of thoughts:

  • Call efficiency: The default implementation of estimateTokens and promptContribution(counter) in the PromptContributor calls the contribution() method. If a contributor performs expensive operations (like a DB query or heavy computation) inside contribution() without caching, this can double the execution cost during assembly.
  • Scope: The PR description doesn't call out that this is strictly the measurement "Phase 1".

@jimador
Author

jimador commented Mar 25, 2026

@simeshev good catch on the double contribution() call. I agree the PromptContributor integration is awkward — having callers pass a TokenCounter into estimateTokens feels off. A contributor producing content shouldn’t need a caller to hand it infrastructure just to describe its own output.

I originally thought PromptContributor was the right place for this since it’s how components inject content into prompts, but looking back, it doesn’t feel like the right integration point.

A capability-style approach on contributors might still make sense for cases that can report their own size, but the real value from token counting seems to show up at the assembly layer — what I was thinking of as “Phase 2”:

  • Token-budget-aware prompt assembly
  • Contributor prioritization / “essential vs optional”
  • Truncation instead of blind dropping
  • Token-aware conversation windowing
  • Provider-backed counters (tiktoken, API calls, etc.)
  • Pricing integration (pre-flight cost checks)

I intentionally didn’t start there — I wanted to first get a gauge on whether a token counting API itself is something worth bringing into the framework before going deep on those pieces.

@simeshev
Collaborator

  1. I'm wondering if doing a rough / provisional / discardable Phase 2 could serve as a pressure test for Phase 1.

  2. By the way, we already have cost infra in place. Here is the starting point:

/**
 * Annotates a method that computes the dynamic cost or value of an action at planning time.
 * Similar to @Condition, this method can take domain object parameters from the blackboard.
 * **Unlike @Condition, all domain object parameters must be nullable.**
 * If a parameter is not available on the blackboard, null will be passed.
 *
 * The method can also take a `Blackboard` parameter for direct access to all available objects.
 *
 * The method must return a Double between 0.0 and 1.0.
 *
 * Example:
 * ```java
 * @Cost(name = "processingCost")
 * public double computeProcessingCost(@Nullable LargeDataSet largeData) {
 *     return largeData != null ? 0.9 : 0.1;
 * }
 *
 * @Action(costMethod = "processingCost")
 * public DataOutput processData(DataInput input) { ... }
 * ```
 *
 * @param name Name of the cost method. Referenced by @Action.costMethod or @Action.valueMethod.
 * If not provided, the name will be the method name.
 */
@Target(AnnotationTarget.FUNCTION)
@Retention(AnnotationRetention.RUNTIME)
@MustBeDocumented
annotation class Cost(
    val name: String = "",
)

and here is the budget starting point:

/**
 * Budget for an agent process.
 * @param cost the cost of running the process, in USD.
 * @param actions the maximum number of actions the agent can perform before termination.
 * @param tokens the maximum number of tokens the agent can use before termination. This can be useful in the case of
 * local models where the cost is not directly measurable, but we don't want excessive work.
 */
data class Budget @JvmOverloads constructor(
    val cost: Double = DEFAULT_COST_LIMIT,
    val actions: Int = DEFAULT_ACTION_LIMIT,
    val tokens: Int = DEFAULT_TOKEN_LIMIT,
) {

    fun earlyTerminationPolicy(): EarlyTerminationPolicy {
        return EarlyTerminationPolicy.firstOf(
            EarlyTerminationPolicy.maxActions(maxActions = actions),
            EarlyTerminationPolicy.maxTokens(maxTokens = tokens),
            EarlyTerminationPolicy.hardBudgetLimit(budget = cost),
        )
    }

    fun withCost(cost: Double): Budget =
        this.copy(cost = cost)

    fun withActions(actions: Int): Budget =
        this.copy(actions = actions)

    fun withTokens(tokens: Int): Budget =
        this.copy(tokens = tokens)

    companion object {

        const val DEFAULT_COST_LIMIT = 2.0

        /**
         * Default maximum number of actions an agent process can perform before termination.
         */
        const val DEFAULT_ACTION_LIMIT = 50

        const val DEFAULT_TOKEN_LIMIT = 1000000

        @JvmField
        val DEFAULT = Budget()

    }

}

…ributor

Per PR feedback, token estimation on individual contributors creates an
awkward API where callers pass infrastructure (TokenCounter) to content
producers. Token estimation belongs at the assembly layer.

Removes:
- estimateTokens(counter: TokenCounter<String>): Int
- promptContribution(counter: TokenCounter<String>): PromptContribution
- Associated TokenEstimation test class

Keeps TokenCounter SPI, CharacterHeuristicTokenCounter, and
PromptContribution.estimatedTokens field for Phase 2 integration.

Signed-off-by: James Dunnam <7660553+jimador@users.noreply.github.com>
@simeshev
Collaborator

Does this change stand on its own? If we make it a part of the public API - how would Embabel users use it?

@alexheifetz
Contributor

@jimador PR is in Draft state, I assume it's not ready for review / merge yet, is this correct?

