feat(client): add hard compaction option preserving system segment #2522
HUQIANTAO wants to merge 1 commit into
Code Review
This pull request introduces a hard compaction mode to crates/tui/src/compaction.rs that replaces the middle conversation history with a single summary message while preserving the system prompt and recent messages. The review feedback identifies three issues: first, plan_hard_compaction can split tool calls and their corresponding results, which violates API contracts and causes errors; second, generating a summary_prompt in compact_hard_safe defeats prefix cache stability and causes redundancy; and third, the workspace parameter is unused and should be prefixed with an underscore to avoid compiler warnings.
    pub fn plan_hard_compaction(
        messages: &[Message],
        keep_recent: usize,
    ) -> Option<HardCompactionPlan> {
        let len = messages.len();
        if len < keep_recent + MIN_SUMMARIZE_MESSAGES {
            return None;
        }

        let summarize_end = len.saturating_sub(keep_recent);
        let summarize_indices: Vec<usize> = (0..summarize_end).collect();
        let recent_indices: Vec<usize> = (summarize_end..len).collect();

        Some(HardCompactionPlan {
            summarize_indices,
            recent_indices,
        })
    }
The current implementation of plan_hard_compaction splits the message history at an arbitrary index (len - keep_recent). This can easily split a tool call and its corresponding tool result (e.g., the tool call is summarized, but the tool result is kept in the recent tail, or vice versa).
DeepSeek and OpenAI APIs strictly require that any tool role message must be preceded by an assistant message containing the matching tool_calls ID. If a tool result is orphaned in the active messages array, the API will reject the request with a 400 error.
To prevent this, we should adjust the summarize_end boundary backwards if any message at or after summarize_end is a ToolResult whose corresponding ToolUse is before summarize_end.
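The boundary adjustment described in this comment could look roughly like the sketch below. It is not the PR's code: `Block`, `Message`, and `safe_summarize_end` are simplified, hypothetical stand-ins, and the real `Message` type in crates/tui may be shaped differently. The idea is to keep moving `summarize_end` backwards until every tool result in the kept tail has its matching tool call in the tail as well.

```rust
/// Hypothetical, simplified content block (not the crate's real type).
#[derive(Debug, Clone, PartialEq)]
pub enum Block {
    Text(String),
    ToolUse { id: String },
    ToolResult { tool_use_id: String },
}

/// Hypothetical, simplified message (not the crate's real type).
#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub role: String,
    pub content: Vec<Block>,
}

/// Move `summarize_end` backwards until no tool result in the kept tail
/// refers to a tool call that would be summarized away.
pub fn safe_summarize_end(messages: &[Message], mut summarize_end: usize) -> usize {
    summarize_end = summarize_end.min(messages.len());
    loop {
        // Tool-call IDs that results in the current tail depend on.
        let needed: Vec<&str> = messages[summarize_end..]
            .iter()
            .flat_map(|m| m.content.iter())
            .filter_map(|b| match b {
                Block::ToolResult { tool_use_id } => Some(tool_use_id.as_str()),
                _ => None,
            })
            .collect();

        // Earliest message in the summarized range that issues one of those calls.
        let earliest = messages[..summarize_end].iter().position(|m| {
            m.content
                .iter()
                .any(|b| matches!(b, Block::ToolUse { id } if needed.contains(&id.as_str())))
        });

        match earliest {
            // Pull the call (and everything after it) into the tail, then re-check,
            // because the enlarged tail may contain further tool results.
            Some(idx) => summarize_end = idx,
            None => return summarize_end,
        }
    }
}
```

Because the boundary only ever moves backwards, the tail can grow beyond `keep_recent`; a caller would presumably need to re-check that enough messages remain in the summarized range before proceeding.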
    let summary_prompt = if config.cache_summary {
        Some(SystemPrompt::Text(format!(
            "Previous conversation summary:\n{summary_text}"
        )))
    } else {
        None
    };

    return Ok(HardCompactionResult {
        messages: new_messages,
        summary_prompt,
        removed_messages: plan.summarize_indices,
    });
In compact_hard_safe, if config.cache_summary is enabled, a summary_prompt is generated and returned in HardCompactionResult.
If the caller merges this summary_prompt into the system prompt (using merge_system_prompts), the system prompt will be modified. This completely defeats the primary design goal of hard compaction, which is to keep the system prompt byte-identical so that the prefix cache remains hot.
Furthermore, since the summary is already injected directly into the messages array as summary_msg (role: user), adding it to the system prompt is redundant and will cause the model to see the summary twice.
We should set summary_prompt to None for hard compaction to guarantee prefix cache stability.
    return Ok(HardCompactionResult {
        messages: new_messages,
        summary_prompt: None,
        removed_messages: plan.summarize_indices,
    });

    pub async fn compact_hard_safe(
        client: &DeepSeekClient,
        messages: &[Message],
        config: &CompactionConfig,
        workspace: Option<&Path>,
    ) -> Result<HardCompactionResult> {
The workspace parameter is passed to compact_hard_safe but is never used within the function body. This will trigger an unused_variables compiler warning in strict builds.
If this parameter is kept for signature compatibility with compact_messages_safe, please prefix it with an underscore (_workspace).
      pub async fn compact_hard_safe(
          client: &DeepSeekClient,
          messages: &[Message],
          config: &CompactionConfig,
    -     workspace: Option<&Path>,
    +     _workspace: Option<&Path>,
      ) -> Result<HardCompactionResult> {
75a02bf to be5745d
Add hard compaction mode that replaces the middle conversation history with a single summary message while preserving the system prompt segment and the last N recent messages (default 8). This is opt-in via the hard_enabled configuration flag.

Hard compaction differs from the existing soft-seam approach:
- Soft seams append <archived_context> blocks without removing messages
- Hard compaction replaces the middle messages with a summary

Both approaches preserve prefix cache stability because the system prompt segment (stored separately in session.system_prompt) is never touched. The summary message replaces the middle history, and the recent tail is preserved verbatim.

Why hard compaction matters: When the context window is under pressure, the existing soft-seam approach keeps growing the message array (appending summary blocks without removing old messages). Hard compaction is more aggressive: it replaces the middle history entirely, freeing more budget. The trade-off is that old messages are lost (they're summarized), but the system prompt segment remains byte-identical, so the prefix cache stays hot after compaction.

Changes:
- Add HardCompactionPlan and HardCompactionResult types
- Add plan_hard_compaction() function that plans which messages to summarize and which to keep
- Add compact_hard_safe() async function that executes the hard compaction using the existing LLM summarization infrastructure
- Add hard_enabled and hard_keep_recent fields to CompactionConfig (defaults: false and 8)
- Add 3 unit tests for the new functionality
be5745d to 7321528
Summary
Add a hard compaction mode that replaces the middle conversation history with a single summary message while preserving the system prompt segment and the last N recent messages (default 8). This is opt-in via the hard_enabled configuration flag.
Background
The existing compaction system uses a soft-seam approach: it appends <archived_context> summary blocks to the message array without removing old messages. This preserves all verbatim history but keeps growing the message array. When the context window is under pressure, the soft-seam approach may not free enough budget.
Hard compaction is more aggressive: it replaces the middle history entirely with a summary message. The trade-off is that old messages are lost (they're summarized), but the system prompt segment remains byte-identical, so the prefix cache stays hot after compaction.
Design
Message structure after hard compaction
The system prompt (stored separately in session.system_prompt) is never touched. The summary message replaces the middle history, and the recent tail is preserved verbatim.
Configuration
Two new fields in CompactionConfig:
- hard_enabled: bool (default: false): opt-in flag
- hard_keep_recent: usize (default: 8): number of recent messages to preserve
API
- plan_hard_compaction(messages, keep_recent) -> Option<HardCompactionPlan>: Plan which messages to summarize and which to keep.
- compact_hard_safe(client, messages, config, workspace) -> Result<HardCompactionResult>: Execute the hard compaction using the existing LLM summarization infrastructure.
Summary format
The summary is wrapped in <hard_compaction_summary> tags so the model can distinguish it from verbatim history.
Why this matters
Hard compaction is the highest-leverage tool for freeing context budget when the window is under pressure. Unlike soft seams (which keep growing the message array), hard compaction replaces the middle history entirely. The system prompt segment (which is the largest stable block) is never touched, so the prefix cache remains hot after compaction.
This is especially valuable for long sessions where the context window is exhausted: instead of starting a new cycle (which resets the prefix cache), hard compaction can free enough budget to continue the current cycle with the prefix cache intact.
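As a rough illustration of the layout described under Design, the sketch below builds the post-compaction message list from a summary string and the recent tail. Everything here is an assumption made for illustration: `Message` is a reduced stand-in, `wrap_summary` and `apply_hard_compaction` are hypothetical names, and the exact text placed inside the <hard_compaction_summary> tags is not shown in this PR description. The `user` role for the summary message follows the review comment above.

```rust
/// Hypothetical, reduced message type (the crate's real type is richer).
#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub role: String,
    pub text: String,
}

/// Wrap the summary in tags so it can be told apart from verbatim history.
/// The inner layout (summary text on its own line) is an assumption.
pub fn wrap_summary(summary_text: &str) -> String {
    format!("<hard_compaction_summary>\n{summary_text}\n</hard_compaction_summary>")
}

/// Build the compacted list: one summary message followed by the last
/// `keep_recent` messages, copied verbatim. The system prompt is stored
/// elsewhere and is deliberately not an input here, so it cannot change.
pub fn apply_hard_compaction(
    messages: &[Message],
    keep_recent: usize,
    summary_text: &str,
) -> Vec<Message> {
    let split = messages.len().saturating_sub(keep_recent);
    let mut out = Vec::with_capacity(messages.len() - split + 1);
    out.push(Message {
        role: "user".to_string(),
        text: wrap_summary(summary_text),
    });
    out.extend_from_slice(&messages[split..]);
    out
}
```

With 10 input messages and `keep_recent = 8`, this yields 9 messages: the summary plus the last 8 originals.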
Testing
3 new unit tests:
- plan_hard_compaction_returns_none_when_too_few_messages: Verify graceful handling of insufficient messages.
- plan_hard_compaction_preserves_recent_tail: Verify the recent tail is preserved correctly.
- hard_compaction_config_defaults: Verify the default configuration values.
All 59 compaction tests pass.
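The behaviour these tests target can be shown with reduced stand-ins. This is only a sketch: `HardSettings` and `plan_by_len` are hypothetical names, the planner takes a length instead of a message slice, and the value of `MIN_SUMMARIZE_MESSAGES` is a placeholder because the PR does not show it. The defaults (false and 8) and the split logic follow the description and the reviewed hunk.

```rust
/// Placeholder value; the real constant is not shown in this PR.
pub const MIN_SUMMARIZE_MESSAGES: usize = 4;

/// Hypothetical subset of CompactionConfig holding only the two new fields.
#[derive(Debug, Clone, PartialEq)]
pub struct HardSettings {
    pub hard_enabled: bool,
    pub hard_keep_recent: usize,
}

impl Default for HardSettings {
    fn default() -> Self {
        // Defaults as stated in the PR description: opt-in, keep 8 messages.
        Self { hard_enabled: false, hard_keep_recent: 8 }
    }
}

#[derive(Debug, Clone, PartialEq)]
pub struct HardCompactionPlan {
    pub summarize_indices: Vec<usize>,
    pub recent_indices: Vec<usize>,
}

/// Length-only version of the planner from the reviewed hunk.
pub fn plan_by_len(len: usize, keep_recent: usize) -> Option<HardCompactionPlan> {
    // Too little history to be worth summarizing.
    if len < keep_recent + MIN_SUMMARIZE_MESSAGES {
        return None;
    }
    let summarize_end = len - keep_recent;
    Some(HardCompactionPlan {
        summarize_indices: (0..summarize_end).collect(),
        recent_indices: (summarize_end..len).collect(),
    })
}
```

Under these assumptions, a 5-message history with `keep_recent = 8` produces no plan, while a 20-message history summarizes indices 0 through 11 and keeps 12 through 19.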
Files changed
crates/tui/src/compaction.rs: Add hard compaction types, functions, and tests (223 lines added)