
RFC: Add pkg/tokens — in-process token counter helper (tiktoken-go wrapper) #9537

@walcz-de


Problem

LocalAI has no central Go-side helper for counting tokens. Today:

  • core/backend/tokenize.go:13 (ModelTokenize) runs via gRPC backend
    round-trip — too heavy for middleware that must decide before
    dispatch.
  • tiktoken-go v0.1.7 is already a transitive dependency (go.mod:413)
    but unused.

Proposal

Small utility package at pkg/tokens (or internal/tokens if
preferred) with three stateless functions:

// Count returns the token count for messages under model's encoding.
func Count(messages []schema.Message, model string) (int, error)

// CountText returns tokens for a string under model's encoding.
func CountText(text, model string) (int, error)

// EncodingFor returns the tiktoken encoding name for model,
// with cl100k_base fallback for unknowns.
func EncodingFor(model string) string

No state, no config, no gRPC. Just wraps tiktoken-go.
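To make the `EncodingFor` behavior concrete, here is a minimal sketch of the prefix-based alias table with the `cl100k_base` fallback. The specific aliases shown (and the `gpt-4o` → `o200k_base` entry) are illustrative only; how far to populate the table is an open question below.

```go
package main

import (
	"fmt"
	"strings"
)

// encodingAliases maps model-name prefixes to tiktoken encoding names.
// Order matters: more specific prefixes must come first (gpt-4o before gpt-4).
// These entries are illustrative; the real table is an open question in this RFC.
var encodingAliases = []struct {
	prefix   string
	encoding string
}{
	{"gpt-4o", "o200k_base"},
	{"gpt-4", "cl100k_base"},
	{"gpt-3.5", "cl100k_base"},
}

// EncodingFor returns the tiktoken encoding name for model,
// falling back to cl100k_base for unknown models.
func EncodingFor(model string) string {
	m := strings.ToLower(model)
	for _, a := range encodingAliases {
		if strings.HasPrefix(m, a.prefix) {
			return a.encoding
		}
	}
	return "cl100k_base" // safe default for unknowns
}

func main() {
	fmt.Println(EncodingFor("gpt-4o-mini")) // o200k_base
	fmt.Println(EncodingFor("Qwen2.5-7B"))  // cl100k_base (fallback)
}
```

`Count` and `CountText` would then call `tiktoken.GetEncoding(EncodingFor(model))` and return the length of the encoded token slice.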

Motivation

Several upcoming middleware features need fast in-process token
counting. Moving tiktoken-go from an indirect to a direct dependency
and adding a narrow helper lets those features build on a stable
primitive.

Non-goals

  • Not exact for every LLM. tiktoken covers the OpenAI model family;
    for Qwen/Llama it is roughly a 5% approximation, and the GoDoc
    will state this explicitly. For triggering compression thresholds,
    an approximation is fine.
  • No chat-template special-token accounting (belongs in backend).
  • No streaming-incremental counter.

Backward compatibility

Purely additive: one new package, plus promoting one indirect
dependency to direct.

Implementation outline

pkg/tokens/
  encoding.go    # model → encoding alias table with cl100k_base fallback
  count.go       # Count, CountText, EncodingFor
  count_test.go  # Ginkgo v2

Open questions for mudler

  1. Package placement: pkg/tokens/ (public, importable by
    third parties if they vendor LocalAI) vs internal/tokens/ (private)?

  2. Model-to-encoding alias table — how aggressive with pre-populated
    aliases? Conservative (gpt-4*, gpt-3.5* only + fallback) or also
    populate common Qwen/Llama variants? The latter risks drift; the
    former is lean. Preference?

  3. Public API shape — is Count(messages, model) (int, error) the
    right surface, or would you rather see e.g. a Counter interface
    with NewCounter(model) + Count(msgs) for cacheable state?
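For question 3, the alternative `Counter` shape could look like the sketch below. The `Message` type and the word-counting body are placeholders just to show the wiring; a real implementation would use `schema.Message` and hold a cached `*tiktoken.Tiktoken` resolved once in `NewCounter`.

```go
package main

import (
	"fmt"
	"strings"
)

// Message mirrors the relevant fields of schema.Message for this sketch.
type Message struct {
	Role    string
	Content string
}

// Counter is the stateful API shape from question 3: the encoding is
// resolved once in NewCounter and reused on every Count call.
type Counter interface {
	Count(msgs []Message) (int, error)
}

// naiveCounter is a placeholder that counts whitespace-separated words
// instead of real tokens, purely to illustrate the interface. A real
// implementation would cache the tiktoken encoding for model here.
type naiveCounter struct {
	model string
}

func NewCounter(model string) (Counter, error) {
	return &naiveCounter{model: model}, nil
}

func (c *naiveCounter) Count(msgs []Message) (int, error) {
	n := 0
	for _, m := range msgs {
		n += len(strings.Fields(m.Content))
	}
	return n, nil
}

func main() {
	c, _ := NewCounter("gpt-4")
	n, _ := c.Count([]Message{{Role: "user", Content: "hello there world"}})
	fmt.Println(n) // 3
}
```

The trade-off: the stateless functions are simpler to call, while the interface avoids re-resolving the encoding on every request in hot middleware paths.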

Prior art

walcz.de uses a Python equivalent in its prompt-optimizer proxy. The
Go version would be the missing primitive needed for RFC #9534 to
build on without shipping a throwaway inline utility.

Next step

If the scope is agreed, the PR would include:

  • pkg/tokens/*.go + Ginkgo tests (~250 LOC incl. tests)
  • go.mod promote tiktoken-go to direct

Estimated effort: 0.5-1 day.

Assisted-by: Claude:claude-opus-4-7
