
Smart Trim & Context

SpdrByte edited this page Mar 4, 2026 · 2 revisions

Smart Trim & Context Management

Context management is the most critical part of maintaining a stable AI session. Standard systems simply drop the oldest parts of a conversation when the history gets too long, often causing the AI to "forget" your original goals.

Gemma CLI uses Smart Trim, an embedding-based algorithm that intelligently decides what to keep and what to drop.


🧠 How Smart Trim Works

When your conversation history approaches the context budget (default: 11,000 tokens):

  1. Semantic Scoring: The system uses gemini-embedding-001 to embed your current query and every turn in the history.
  2. Relevance Ranking: It scores each turn based on how relevant it is to what you are talking about right now.
  3. Intelligent Retention:
    • Locked: The System Prompt is always kept.
    • Recent: The last 4 turns are always kept to maintain immediate flow.
    • Relevant: The most semantically relevant older turns are kept.
    • Dropped: Less relevant middle turns are pruned to save tokens.
  4. System Notice: A special "SYSTEM NOTICE" is injected so the model knows a trim occurred and which turns were preserved.
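The retention pass above can be sketched as follows. This is an illustrative reconstruction, not Gemma CLI's actual implementation: `embed` stands in for a call to gemini-embedding-001, and token counts are roughly estimated at four characters per token.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def smart_trim(system_prompt, turns, query_vec, embed, budget, recent=4):
    """Keep the system prompt, the last `recent` turns, and the most
    query-relevant older turns that still fit under the token budget."""
    head, tail = turns[:-recent], turns[-recent:]
    # Rough token estimate: ~4 characters per token (an assumption).
    used = len(system_prompt) // 4 + sum(len(t) // 4 for t in tail)
    # Rank older turns by semantic relevance to the current query.
    ranked = sorted(head, key=lambda t: cosine(embed(t), query_vec), reverse=True)
    kept = []
    for turn in ranked:
        cost = len(turn) // 4
        if used + cost <= budget:
            kept.append(turn)
            used += cost
    # Restore chronological order for the retained older turns.
    kept = [t for t in head if t in kept]
    notice = "SYSTEM NOTICE: older turns were trimmed; %d of %d retained." % (
        len(kept), len(head))
    return [system_prompt, notice] + kept + tail
```

Note how the last `recent` turns are exempt from scoring entirely, so immediate conversational flow survives even an aggressive trim.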

⚙️ Configuration

You can manage Smart Trim settings via the /settings menu.

1. Toggle Status

Enable or disable the semantic trimming engine. If disabled, the CLI falls back to Blind Trimming, which simply drops the oldest turns first.
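The Blind Trimming fallback is simple enough to sketch in a few lines; this is an illustrative version, not the CLI's own code, and it uses the same rough four-characters-per-token estimate as above.

```python
def blind_trim(system_prompt, turns, budget):
    """Drop the oldest turns (after the system prompt) until the
    estimated token count fits within the budget."""
    est = lambda text: max(1, len(text) // 4)  # rough token estimate
    kept = list(turns)
    used = est(system_prompt) + sum(est(t) for t in kept)
    while kept and used > budget:
        used -= est(kept.pop(0))  # drop the oldest turn first
    return [system_prompt] + kept
```

Because this ignores relevance entirely, an early turn containing your original goal is the first thing to go, which is exactly the failure mode Smart Trim is designed to avoid.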

2. Set Strength

Adjust how aggressive the trimmer is.

| Strength | Description | Retention |
| --- | --- | --- |
| 1-2 (Conservative) | Keeps most history. Minimal token savings. | 8 turns |
| 5 (Balanced) | Recommended. Good balance of history and cost. | 4 turns |
| 9-10 (Maximum) | Keeps only the most vital context. High token savings. | 1 turn |
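One way to read the table is as anchor points on a strength-to-retention curve. The sketch below interpolates between the documented values; the CLI's exact formula for in-between strengths is not documented, so the interpolation is an assumption.

```python
def retained_turns(strength):
    """Map Smart Trim Strength (1-10) to the number of recent turns
    always retained, interpolating between the documented anchors."""
    anchors = {1: 8, 2: 8, 5: 4, 9: 1, 10: 1}  # values from the table
    if strength in anchors:
        return anchors[strength]
    # Linear interpolation between the nearest documented strengths
    # (hypothetical -- undocumented strengths may behave differently).
    lo = max(k for k in anchors if k < strength)
    hi = min(k for k in anchors if k > strength)
    frac = (strength - lo) / (hi - lo)
    return round(anchors[lo] + frac * (anchors[hi] - anchors[lo]))
```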

📊 Monitoring Context

You can monitor context pressure in real time via the Status Bar at the bottom of the CLI:

ctx [████░░░░░░] 40% (4400)

  • ██: Represents current token usage relative to the budget.
  • %: Percentage of the active context window filled.
  • Number: Total estimated tokens in history.
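The status line above can be reproduced with a small formatter. This is a sketch assuming the default 11,000-token budget and a 10-cell bar, matching the example shown.

```python
def ctx_bar(used_tokens, budget=11_000, width=10):
    """Render a ctx status line: filled cells, percent, raw token count."""
    pct = min(100, round(100 * used_tokens / budget))
    filled = min(width, round(width * used_tokens / budget))
    bar = "█" * filled + "░" * (width - filled)
    return f"ctx [{bar}] {pct}% ({used_tokens})"
```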

💡 Troubleshooting: Context "Hallucinations"

If the model seems to forget a file you shared earlier:

  1. Check the status bar to see if a trim just occurred.
  2. If it did, you may need to /recall that specific information or lower your Smart Trim Strength in /settings.

Next Steps: Learn about Long-Term Memory to see how facts can persist even after a context wipe.
