Problem
LocalAI has no central Go-side helper for counting tokens. Today:
- core/backend/tokenize.go:13 (ModelTokenize) runs via a gRPC backend round-trip — too heavy for middleware that must decide before dispatch.
- tiktoken-go v0.1.7 is already a transitive dependency (go.mod:413) but unused.
Proposal
Small utility package at pkg/tokens (or internal/tokens if
preferred) with three stateless functions:
// Count returns the token count for messages under model's encoding.
func Count(messages []schema.Message, model string) (int, error)
// CountText returns tokens for a string under model's encoding.
func CountText(text, model string) (int, error)
// EncodingFor returns the tiktoken encoding name for model,
// with cl100k_base fallback for unknowns.
func EncodingFor(model string) string
No state, no config, no gRPC. Just wraps tiktoken-go.
Motivation
Several upcoming middleware features need fast in-process token
counting. Moving tiktoken-go from indirect to direct dependency and
adding a narrow helper lets these build on a stable primitive.
Non-goals
- Not exact for every LLM. tiktoken covers the OpenAI family; for
Qwen/Llama it is a ~5% approximation, and the GoDoc is explicit about
that. For compression-threshold triggering, an approximation is fine.
- No chat-template special-token accounting (belongs in backend).
- No streaming-incremental counter.
Backward compatibility
Pure addition. New package + promote one indirect dep to direct.
Implementation outline
pkg/tokens/
encoding.go # model → encoding alias table with cl100k_base fallback
count.go # Count, CountText, EncodingFor
count_test.go # Ginkgo v2
Open questions for mudler
- Package placement — pkg/tokens/ (public, importable by third
parties if they vendor LocalAI) vs internal/tokens/ (private)?
- Model-to-encoding alias table — how aggressive should the
pre-populated aliases be? Conservative (gpt-4*, gpt-3.5* only +
fallback) or also populate common Qwen/Llama variants? The latter
risks drift; the former is lean. Preference?
- Public API shape — is Count(messages, model) (int, error) the
right surface, or would you rather see e.g. a Counter interface
with NewCounter(model) + Count(msgs) for cacheable state?
Prior art
walcz.de uses a Python equivalent in its prompt-optimizer proxy. The
Go version would be the missing primitive needed for RFC #9534 to
build on without shipping a throwaway inline utility.
Next step
If the scope lands, PR:
- pkg/tokens/*.go + Ginkgo tests (~250 LOC incl. tests)
- go.mod: promote tiktoken-go to direct
Approx 0.5-1 day.
Assisted-by: Claude:claude-opus-4-7