
RFC: Add pkg/tokens — in-process token counter helper (tiktoken-go wrapper) #9537

@walcz-de


Problem

LocalAI has no central Go-side helper for counting tokens. Today:

  • core/backend/tokenize.go:13 (ModelTokenize) runs via gRPC backend
    round-trip — too heavy for middleware that must decide before
    dispatch.
  • tiktoken-go v0.1.7 is already a transitive dependency (go.mod:413)
    but unused.

Proposal

Small utility package at pkg/tokens (or internal/tokens if
preferred) with three stateless functions:

// Count returns the token count for messages under model's encoding.
func Count(messages []schema.Message, model string) (int, error)

// CountText returns tokens for a string under model's encoding.
func CountText(text, model string) (int, error)

// EncodingFor returns the tiktoken encoding name for model,
// with cl100k_base fallback for unknowns.
func EncodingFor(model string) string

No state, no config, no gRPC. Just wraps tiktoken-go.
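To make the `EncodingFor` behavior concrete, here is a minimal sketch of the prefix-based alias table with the `cl100k_base` fallback. The specific aliases shown (and the `gpt-4o` → `o200k_base` entry) are illustrative only; how far to populate the table is an open question below.

```go
package main

import (
	"fmt"
	"strings"
)

// encodingAliases maps model-name prefixes to tiktoken encoding names.
// Order matters: more specific prefixes must come first (gpt-4o before gpt-4).
// These entries are illustrative; the real table is an open question in this RFC.
var encodingAliases = []struct {
	prefix   string
	encoding string
}{
	{"gpt-4o", "o200k_base"},
	{"gpt-4", "cl100k_base"},
	{"gpt-3.5", "cl100k_base"},
}

// EncodingFor returns the tiktoken encoding name for model,
// falling back to cl100k_base for unknown models.
func EncodingFor(model string) string {
	m := strings.ToLower(model)
	for _, a := range encodingAliases {
		if strings.HasPrefix(m, a.prefix) {
			return a.encoding
		}
	}
	return "cl100k_base" // safe default for unknowns
}

func main() {
	fmt.Println(EncodingFor("gpt-4o-mini")) // o200k_base
	fmt.Println(EncodingFor("Qwen2.5-7B"))  // cl100k_base (fallback)
}
```

`Count` and `CountText` would then call `tiktoken.GetEncoding(EncodingFor(model))` and return the length of the encoded token slice.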

Motivation

Several upcoming middleware features need fast in-process token
counting. Moving tiktoken-go from an indirect to a direct dependency
and adding a narrow helper lets those features build on a stable
primitive.

Non-goals

  • Not exact for every LLM. tiktoken covers the OpenAI model family;
    for Qwen/Llama it is roughly a 5% approximation, and the GoDoc
    will state this explicitly. For triggering compression thresholds,
    an approximation is fine.
  • No chat-template special-token accounting (belongs in backend).
  • No streaming-incremental counter.

Backward compatibility

Purely additive: one new package, plus promoting one indirect
dependency to direct.

Implementation outline

pkg/tokens/
  encoding.go    # model → encoding alias table with cl100k_base fallback
  count.go       # Count, CountText, EncodingFor
  count_test.go  # Ginkgo v2

Open questions for mudler

  1. Package placement: pkg/tokens/ (public, importable by
    third parties if they vendor LocalAI) vs internal/tokens/ (private)?

  2. Model-to-encoding alias table — how aggressive with pre-populated
    aliases? Conservative (gpt-4*, gpt-3.5* only + fallback) or also
    populate common Qwen/Llama variants? The latter risks drift; the
    former is lean. Preference?

  3. Public API shape — is Count(messages, model) (int, error) the
    right surface, or would you rather see e.g. a Counter interface
    with NewCounter(model) + Count(msgs) for cacheable state?
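For question 3, the alternative `Counter` shape could look like the sketch below. The `Message` type and the word-counting body are placeholders just to show the wiring; a real implementation would use `schema.Message` and hold a cached `*tiktoken.Tiktoken` resolved once in `NewCounter`.

```go
package main

import (
	"fmt"
	"strings"
)

// Message mirrors the relevant fields of schema.Message for this sketch.
type Message struct {
	Role    string
	Content string
}

// Counter is the stateful API shape from question 3: the encoding is
// resolved once in NewCounter and reused on every Count call.
type Counter interface {
	Count(msgs []Message) (int, error)
}

// naiveCounter is a placeholder that counts whitespace-separated words
// instead of real tokens, purely to illustrate the interface. A real
// implementation would cache the tiktoken encoding for model here.
type naiveCounter struct {
	model string
}

func NewCounter(model string) (Counter, error) {
	return &naiveCounter{model: model}, nil
}

func (c *naiveCounter) Count(msgs []Message) (int, error) {
	n := 0
	for _, m := range msgs {
		n += len(strings.Fields(m.Content))
	}
	return n, nil
}

func main() {
	c, _ := NewCounter("gpt-4")
	n, _ := c.Count([]Message{{Role: "user", Content: "hello there world"}})
	fmt.Println(n) // 3
}
```

The trade-off: the stateless functions are simpler to call, while the interface avoids re-resolving the encoding on every request in hot middleware paths.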

Prior art

walcz.de uses a Python equivalent in its prompt-optimizer proxy. The
Go version would be the missing primitive needed for RFC #9534 to
build on without shipping a throwaway inline utility.

Next step

If the scope is agreed, the PR would include:

  • pkg/tokens/*.go + Ginkgo tests (~250 LOC incl. tests)
  • go.mod promote tiktoken-go to direct

Estimated effort: 0.5-1 day.

Assisted-by: Claude:claude-opus-4-7
