SPEC-25: Context-as-Variable Enforcement

Prevent context rot via externalized context variables

Status: Implemented in rlm-core runtime (context externalization contract + explicit size-policy/auto-chunk APIs) Created: 2026-01-20 Epic: loop-zcx (DSPy-Inspired RLM Improvements) Task: loop-bw2

Overview

Enforce the context-as-variable pattern where the root LLM receives only the query while full context is stored as Python variables in the REPL. This prevents "context rot" where LLM performance degrades with lengthy context in prompts.

Implementation Snapshot (2026-02-20)

Section	Status	Runtime Evidence
SPEC-25.01 Context externalization	Implemented	`ExternalizedContext::{from_session,from_session_with_config}` and variable typing in `rlm-core/src/context/externalize.rs`
SPEC-25.02 Root prompt generation contract	Implemented	`ExternalizedContext::root_prompt_with_config` includes helper guidance + explicit `SUBMIT({...})` semantics; coverage in `test_root_prompt_generation` and `test_root_prompt_omits_full_context_content`
SPEC-25.03 Variable access helpers	Implemented	Active helpers in `rlm-core/python/rlm_repl/helpers.py` (`peek`, `search`, `summarize`, `find_relevant`) with test coverage in `rlm-core/python/tests/test_repl.py`
SPEC-25.04 Size tracking and chunking warnings	Implemented	`SizeConfig`, `SizeWarning`, `ExternalizedContext::{check_size_limits,auto_chunk}`, and `ContextVariable::new_with_config` in `rlm-core/src/context/externalize.rs`

Background

From Codecrack3 RLM-DSPy:

Direct API calls fail at 0% accuracy on 132k-token tasks
RLM with externalized context achieves 80% accuracy
Token consumption: ~2-3k tokens vs 95k+ for direct
Context exploration via programmatic access (slicing, regex, recursive calls)

Requirements

SPEC-25.01: Context Externalization

Structure for externalized context.

/// Externalized context for RLM execution
#[derive(Debug, Clone)]
pub struct ExternalizedContext {
    /// Query text (sent to LLM in prompt)
    pub query: String,
    /// Variables available in REPL (NOT sent in prompt)
    pub variables: HashMap<String, ContextVariable>,
    /// Total size of externalized data
    pub total_size_bytes: usize,
}

/// A single context variable
#[derive(Debug, Clone)]
pub struct ContextVariable {
    /// Variable name (Python identifier)
    pub name: String,
    /// Type of context
    pub var_type: ContextVarType,
    /// Size in bytes
    pub size_bytes: usize,
    /// Brief summary for LLM (what this variable contains)
    pub summary: String,
    /// Number of items (for collections)
    pub item_count: Option<usize>,
}

/// Types of context variables
#[derive(Debug, Clone)]
pub enum ContextVarType {
    /// Conversation history: List[Message]
    Conversation,
    /// File contents: Dict[str, str]
    Files,
    /// Tool outputs: List[ToolOutput]
    ToolOutputs,
    /// Working memory: Dict[str, Any]
    WorkingMemory,
    /// Custom context
    Custom(String),
}

impl ExternalizedContext {
    /// Create from SessionContext
    pub fn from_session(ctx: &SessionContext, query: &str) -> Self;

    /// Get variable summaries for prompt
    pub fn variable_summaries(&self) -> String;

    /// Check if context exceeds size limits
    pub fn check_size_limits(&self, config: &SizeConfig) -> Vec<SizeWarning>;
}

Acceptance Criteria:

All context types externalized
Summaries generated for each variable
Size tracking accurate

SPEC-25.02: Root Prompt Generation

Generate prompts without full context.

impl Orchestrator {
    /// Generate root prompt with externalized context
    fn generate_root_prompt(
        &self,
        query: &str,
        external: &ExternalizedContext,
    ) -> String {
        format!(r#"
You have access to the following context variables in the REPL:

{variable_summaries}

To explore the context, use Python code in the REPL. Available helpers:
- peek(var, start, end) - Get slice of collection
- search(var, pattern) - Search for pattern (regex supported)
- summarize(var) - Get LLM summary of variable
- len(var) - Get size of collection

Your task: {query}

Write Python code to explore the context and find the answer.
When done, call SUBMIT({{...}}) with your outputs.
"#,
            variable_summaries = external.variable_summaries(),
            query = query
        )
    }
}

Prompt Rules:

Root prompt MUST NOT include full context
Root prompt MUST include variable summaries
Root prompt MUST list available helpers
Root prompt MUST instruct REPL exploration

Acceptance Criteria:

Prompt contains summaries, not full content
Helper functions documented in prompt
SUBMIT instruction included

SPEC-25.03: Variable Access Helpers

REPL helper functions for context access.

# Available in REPL sandbox

def peek(var, start: int = 0, end: int = 10):
    """
    Get slice of a collection.

    Args:
        var: Collection to slice (list, dict values, string)
        start: Start index (default 0)
        end: End index (default 10)

    Returns:
        Sliced content with metadata
    """
    if isinstance(var, list):
        return var[start:end]
    elif isinstance(var, dict):
        keys = list(var.keys())[start:end]
        return {k: var[k] for k in keys}
    elif isinstance(var, str):
        return var[start:end]
    else:
        raise TypeError(f"Cannot peek into {type(var)}")


def search(var, pattern: str, regex: bool = False, max_results: int = 10):
    """
    Search for pattern in context.

    Args:
        var: Context to search (dict of files, list of messages, etc.)
        pattern: Search pattern (string or regex)
        regex: Whether pattern is regex (default False)
        max_results: Maximum results to return (default 10)

    Returns:
        List of matches with location info
    """
    import re
    if regex:
        pat = re.compile(pattern)
        match_fn = lambda s: pat.search(s) is not None
    else:
        match_fn = lambda s: pattern in s

    results = []
    if isinstance(var, dict):
        for key, value in var.items():
            if match_fn(str(value)):
                results.append({"key": key, "preview": str(value)[:200]})
    elif isinstance(var, list):
        for i, item in enumerate(var):
            if match_fn(str(item)):
                results.append({"index": i, "preview": str(item)[:200]})

    return results[:max_results]


def summarize(var, max_tokens: int = 500) -> str:
    """
    Get LLM summary of variable (deferred operation).

    Args:
        var: Variable to summarize
        max_tokens: Maximum tokens in summary

    Returns:
        Summary string (via deferred LLM call)
    """
    # Returns DeferredOperation, resolved by orchestrator
    return _deferred_llm_call(
        f"Summarize the following in {max_tokens} tokens or less:\n{var}"
    )


def find_relevant(var, query: str, top_k: int = 5):
    """
    Find most relevant items for a query.

    Args:
        var: Collection to search
        query: Query string
        top_k: Number of results

    Returns:
        Top k relevant items (via embedding similarity)
    """
    # Returns DeferredOperation, resolved by orchestrator
    return _deferred_embedding_search(var, query, top_k)

Acceptance Criteria:

peek() works for all collection types
search() supports regex and literal
summarize() returns deferred operation
find_relevant() uses embeddings

SPEC-25.04: Context Size Limits

Size tracking and enforcement.

/// Configuration for context size limits
#[derive(Debug, Clone)]
pub struct SizeConfig {
    /// Warning threshold per variable (bytes)
    pub warn_threshold: usize,      // Default: 100KB
    /// Error threshold per variable (bytes)
    pub chunk_threshold: usize,     // Default: 1MB
    /// Maximum total externalized size (bytes)
    pub max_total_size: usize,      // Default: 10MB
}

impl Default for SizeConfig {
    fn default() -> Self {
        Self {
            warn_threshold: 100 * 1024,        // 100KB
            chunk_threshold: 1024 * 1024,      // 1MB
            max_total_size: 10 * 1024 * 1024,  // 10MB
        }
    }
}

/// Warning for size limit issues
#[derive(Debug, Clone)]
pub enum SizeWarning {
    /// Variable exceeds warning threshold
    LargeVariable {
        name: String,
        size: usize,
        threshold: usize,
    },
    /// Variable requires chunking
    RequiresChunking {
        name: String,
        size: usize,
        suggested_chunks: usize,
    },
    /// Total size exceeds maximum
    TotalSizeExceeded {
        total: usize,
        max: usize,
    },
}

impl ExternalizedContext {
    /// Check size limits and return warnings
    pub fn check_size_limits(&self, config: &SizeConfig) -> Vec<SizeWarning> {
        let mut warnings = Vec::new();

        for (name, var) in &self.variables {
            if var.size_bytes > config.chunk_threshold {
                warnings.push(SizeWarning::RequiresChunking {
                    name: name.clone(),
                    size: var.size_bytes,
                    suggested_chunks: (var.size_bytes / config.warn_threshold) + 1,
                });
            } else if var.size_bytes > config.warn_threshold {
                warnings.push(SizeWarning::LargeVariable {
                    name: name.clone(),
                    size: var.size_bytes,
                    threshold: config.warn_threshold,
                });
            }
        }

        if self.total_size_bytes > config.max_total_size {
            warnings.push(SizeWarning::TotalSizeExceeded {
                total: self.total_size_bytes,
                max: config.max_total_size,
            });
        }

        warnings
    }

    /// Auto-chunk large variables
    pub fn auto_chunk(&mut self, config: &SizeConfig);
}

Acceptance Criteria:

Warnings generated for large variables
Chunking suggested when needed
Total size tracked

Performance Impact

Scenario	Direct API	Externalized	Improvement
132k tokens	~95k prompt	~2-3k prompt	97% reduction
60k structured	0% accuracy	80% accuracy	N/A (enables)
150k+ tokens	Fails	Works	N/A (enables)

Test Plan

Test	Description	Spec
`test_externalized_context_from_session`	Externalize `SessionContext` into REPL variable metadata	SPEC-25.01
`test_root_prompt_generation`	Prompt includes summaries/helper guidance and submit contract	SPEC-25.02
`test_root_prompt_omits_full_context_content`	Prompt omits raw file body content	SPEC-25.02
`test_peek_list`	`peek()` behavior on list input	SPEC-25.03
`test_search_string`	`search()` literal matching	SPEC-25.03
`test_search_regex`	`search()` regex matching	SPEC-25.03
`test_summarize_returns_deferred`	`summarize()` returns deferred operation	SPEC-25.03
`test_find_relevant_returns_embed_operation`	`find_relevant()` emits embedding/deferred operation	SPEC-25.03
`test_size_tracker`	Warning generation and threshold tracking	SPEC-25.04
`test_context_variable_requires_chunking`	Chunking threshold flag behavior	SPEC-25.04
`test_check_size_limits_with_explicit_config`	Deterministic size warning generation with explicit threshold config	SPEC-25.04
`test_auto_chunk_marks_variable_and_clears_chunking_requirement`	Deterministic `auto_chunk` behavior and chunk metadata updates	SPEC-25.04

References

Codecrack3 RLM-DSPy
RLM Paper - Prompts as manipulable objects

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SPEC-25: Context-as-Variable Enforcement

Overview

Implementation Snapshot (2026-02-20)

Background

Requirements

SPEC-25.01: Context Externalization

SPEC-25.02: Root Prompt Generation

SPEC-25.03: Variable Access Helpers

SPEC-25.04: Context Size Limits

Performance Impact

Test Plan

References

FilesExpand file tree

SPEC-25-context-externalization.md

Latest commit

History

SPEC-25-context-externalization.md

File metadata and controls

SPEC-25: Context-as-Variable Enforcement

Overview

Implementation Snapshot (2026-02-20)

Background

Requirements

SPEC-25.01: Context Externalization

SPEC-25.02: Root Prompt Generation

SPEC-25.03: Variable Access Helpers

SPEC-25.04: Context Size Limits

Performance Impact

Test Plan

References