feat(client): add hard compaction option preserving system segment #2522

Open
HUQIANTAO wants to merge 1 commit into Hmbown:main from HUQIANTAO:feat/hard-compact-system-preserve

Conversation

@HUQIANTAO
Contributor

Summary

Add a hard compaction mode that replaces the middle conversation history with a single summary message while preserving the system prompt segment and the last N recent messages (default 8). This is opt-in via the hard_enabled configuration flag.

Background

The existing compaction system uses a soft-seam approach: it appends <archived_context> summary blocks to the message array without removing old messages. This preserves all verbatim history but keeps growing the message array. When the context window is under pressure, the soft-seam approach may not free enough budget.

Hard compaction is more aggressive: it replaces the middle history entirely with a summary message. The trade-off is that old messages are lost (they're summarized), but the system prompt segment remains byte-identical, so the prefix cache stays hot after compaction.

Design

Message structure after hard compaction

[summary_message, recent_message_1, recent_message_2, ..., recent_message_8]

The system prompt (stored separately in session.system_prompt) is never touched. The summary message replaces the middle history, and the recent tail is preserved verbatim.
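
The layout above can be sketched as a small assembly step. This is an illustration only: messages are modeled as plain strings rather than the crate's Message type, and `assemble_compacted` is a hypothetical helper name that does not appear in the PR.

```rust
/// Build the post-compaction message list: one summary message followed by
/// the last `keep_recent` messages, unchanged.
/// Messages are plain `String`s here (an assumption for illustration);
/// the real code operates on the crate's `Message` type.
pub fn assemble_compacted(
    messages: &[String],
    summary: String,
    keep_recent: usize,
) -> Vec<String> {
    // Index where the verbatim tail begins; 0 if the history is shorter
    // than `keep_recent`.
    let tail_start = messages.len().saturating_sub(keep_recent);

    let mut out = Vec::with_capacity(1 + messages.len() - tail_start);
    out.push(summary);
    out.extend_from_slice(&messages[tail_start..]);
    out
}
```

The system prompt is not an input to this step at all, which is what keeps it byte-identical across compaction.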

Configuration

Two new fields in CompactionConfig:

  • hard_enabled: bool (default: false) — opt-in flag
  • hard_keep_recent: usize (default: 8) — number of recent messages to preserve
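
As a minimal sketch, the two fields and their defaults could be expressed as follows. The struct is a trimmed stand-in containing only the new fields; the real CompactionConfig has other fields (cache_summary, for example) that are omitted here.

```rust
/// Trimmed stand-in for `CompactionConfig`, showing only the two new fields.
/// The real struct in crates/tui/src/compaction.rs has more fields.
#[derive(Debug, Clone, PartialEq)]
pub struct CompactionConfig {
    /// Opt-in flag: hard compaction runs only when this is true.
    pub hard_enabled: bool,
    /// Number of most recent messages preserved verbatim.
    pub hard_keep_recent: usize,
}

impl Default for CompactionConfig {
    fn default() -> Self {
        Self {
            hard_enabled: false,
            hard_keep_recent: 8,
        }
    }
}
```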

API

  • plan_hard_compaction(messages, keep_recent) -> Option<HardCompactionPlan>: Plan which messages to summarize and which to keep.
  • compact_hard_safe(client, messages, config, workspace) -> Result<HardCompactionResult>: Execute the hard compaction using the existing LLM summarization infrastructure.

Summary format

The summary is wrapped in <hard_compaction_summary> tags so the model can distinguish it from verbatim history:

<hard_compaction_summary>
[LLM-generated summary of the middle history]
</hard_compaction_summary>
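
A wrapper producing this format might look like the sketch below. The helper name `wrap_hard_summary` is hypothetical, and trimming the summary text before wrapping is an assumption, not something the PR states.

```rust
/// Wrap an LLM-generated summary in `<hard_compaction_summary>` tags.
/// `wrap_hard_summary` is a hypothetical name; trimming surrounding
/// whitespace is an assumption made for this sketch.
pub fn wrap_hard_summary(summary_text: &str) -> String {
    format!(
        "<hard_compaction_summary>\n{}\n</hard_compaction_summary>",
        summary_text.trim()
    )
}
```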

Why this matters

Hard compaction is the highest-leverage tool for freeing context budget when the window is under pressure. Unlike soft seams (which keep growing the message array), hard compaction replaces the middle history entirely. The system prompt segment (which is the largest stable block) is never touched, so the prefix cache remains hot after compaction.

This is especially valuable for long sessions where the context window is exhausted: instead of starting a new cycle (which resets the prefix cache), hard compaction can free enough budget to continue the current cycle with the prefix cache intact.

Testing

3 new unit tests:

  • plan_hard_compaction_returns_none_when_too_few_messages: Verify graceful handling of insufficient messages.
  • plan_hard_compaction_preserves_recent_tail: Verify the recent tail is preserved correctly.
  • hard_compaction_config_defaults: Verify the default configuration values.
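
The behavior the first two tests check can be sketched against a simplified planner that takes only the message count. The splitting logic and the plan's field types follow the PR's plan_hard_compaction; the function name `plan_by_len` and the value of MIN_SUMMARIZE_MESSAGES are assumptions for illustration.

```rust
/// Assumed value for illustration; the real constant lives in compaction.rs.
const MIN_SUMMARIZE_MESSAGES: usize = 4;

#[derive(Debug, PartialEq)]
pub struct HardCompactionPlan {
    pub summarize_indices: Vec<usize>,
    pub recent_indices: Vec<usize>,
}

/// Same split as the PR's planner, but driven by the message count alone
/// (`plan_by_len` is a hypothetical name).
pub fn plan_by_len(len: usize, keep_recent: usize) -> Option<HardCompactionPlan> {
    // Too little history to be worth summarizing.
    if len < keep_recent + MIN_SUMMARIZE_MESSAGES {
        return None;
    }
    // The check above guarantees len >= keep_recent, so this cannot underflow.
    let summarize_end = len - keep_recent;
    Some(HardCompactionPlan {
        summarize_indices: (0..summarize_end).collect(),
        recent_indices: (summarize_end..len).collect(),
    })
}
```

With 20 messages and keep_recent = 8, indices 0 through 11 are summarized and 12 through 19 are kept.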

All 59 compaction tests pass.

Files changed

  • crates/tui/src/compaction.rs: Add hard compaction types, functions, and tests (223 lines added)

Contributor

@greptile-apps greptile-apps Bot left a comment

HUQIANTAO has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a hard compaction mode to crates/tui/src/compaction.rs that replaces the middle conversation history with a single summary message while preserving the system prompt and recent messages. The review feedback identifies three issues: first, plan_hard_compaction can split tool calls and their corresponding results, which violates API contracts and causes errors; second, generating a summary_prompt in compact_hard_safe defeats prefix cache stability and causes redundancy; and third, the workspace parameter is unused and should be prefixed with an underscore to avoid compiler warnings.

Comment on lines +745 to +762
    pub fn plan_hard_compaction(
        messages: &[Message],
        keep_recent: usize,
    ) -> Option<HardCompactionPlan> {
        let len = messages.len();
        if len < keep_recent + MIN_SUMMARIZE_MESSAGES {
            return None;
        }

        let summarize_end = len.saturating_sub(keep_recent);
        let summarize_indices: Vec<usize> = (0..summarize_end).collect();
        let recent_indices: Vec<usize> = (summarize_end..len).collect();

        Some(HardCompactionPlan {
            summarize_indices,
            recent_indices,
        })
    }

high

The current implementation of plan_hard_compaction splits the message history at an arbitrary index (len - keep_recent). This can easily split a tool call and its corresponding tool result (e.g., the tool call is summarized, but the tool result is kept in the recent tail, or vice versa).

The DeepSeek and OpenAI APIs require every tool-role message to be preceded by an assistant message whose tool_calls contains the matching ID. If a tool result is orphaned in the active messages array, the API rejects the request with a 400 error.

To prevent this, we should adjust the summarize_end boundary backwards if any message at or after summarize_end is a ToolResult whose corresponding ToolUse is before summarize_end.
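
One way to implement this adjustment is sketched below. The `Msg` enum is a simplified stand-in for the crate's message type, and `safe_summarize_end` is a hypothetical helper name; both are assumptions, not code from the PR.

```rust
/// Simplified stand-in for the crate's message type (an assumption).
#[derive(Debug, Clone, PartialEq)]
pub enum Msg {
    Text,
    /// Assistant message that issues a tool call with this ID.
    ToolUse(String),
    /// Tool result answering the call with this ID.
    ToolResult(String),
}

/// Move the summarize boundary backwards until no tool result in the kept
/// tail refers to a tool call that would be summarized away.
pub fn safe_summarize_end(messages: &[Msg], keep_recent: usize) -> usize {
    let mut end = messages.len().saturating_sub(keep_recent);
    loop {
        // Earliest summarized tool call whose result sits in the kept tail.
        let earliest = messages[end..]
            .iter()
            .filter_map(|m| match m {
                Msg::ToolResult(id) => messages[..end]
                    .iter()
                    .position(|p| matches!(p, Msg::ToolUse(u) if u == id)),
                _ => None,
            })
            .min();
        match earliest {
            // Pull the boundary back so the call joins the tail, then
            // re-check, since the larger tail may contain more results.
            Some(idx) => end = idx,
            None => return end,
        }
    }
}
```

The kept tail can end up longer than keep_recent, and the caller would still need to re-check the minimum-summarize threshold against the adjusted boundary.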

Comment on lines +1176 to +1188
    let summary_prompt = if config.cache_summary {
        Some(SystemPrompt::Text(format!(
            "Previous conversation summary:\n{summary_text}"
        )))
    } else {
        None
    };

    return Ok(HardCompactionResult {
        messages: new_messages,
        summary_prompt,
        removed_messages: plan.summarize_indices,
    });

high

In compact_hard_safe, if config.cache_summary is enabled, a summary_prompt is generated and returned in HardCompactionResult.

If the caller merges this summary_prompt into the system prompt (using merge_system_prompts), the system prompt will be modified. This completely defeats the primary design goal of hard compaction, which is to keep the system prompt byte-identical so that the prefix cache remains hot.

Furthermore, since the summary is already injected directly into the messages array as summary_msg (role: user), adding it to the system prompt is redundant and will cause the model to see the summary twice.

We should set summary_prompt to None for hard compaction to guarantee prefix cache stability.

                return Ok(HardCompactionResult {
                    messages: new_messages,
                    summary_prompt: None,
                    removed_messages: plan.summarize_indices,
                });

Comment on lines +1131 to +1136
    pub async fn compact_hard_safe(
        client: &DeepSeekClient,
        messages: &[Message],
        config: &CompactionConfig,
        workspace: Option<&Path>,
    ) -> Result<HardCompactionResult> {

medium

The workspace parameter is passed to compact_hard_safe but is never used within the function body. This triggers an unused_variables compiler warning, which becomes an error in builds that deny warnings.

If this parameter is kept for signature compatibility with compact_messages_safe, please prefix it with an underscore (_workspace).

Suggested change
      pub async fn compact_hard_safe(
          client: &DeepSeekClient,
          messages: &[Message],
          config: &CompactionConfig,
    -     workspace: Option<&Path>,
    +     _workspace: Option<&Path>,
      ) -> Result<HardCompactionResult> {


@HUQIANTAO force-pushed the feat/hard-compact-system-preserve branch from 75a02bf to be5745d on June 1, 2026 16:40

Add hard compaction mode that replaces the middle conversation history
with a single summary message while preserving the system prompt segment
and the last N recent messages (default 8). This is opt-in via the
hard_enabled configuration flag.

Hard compaction differs from the existing soft-seam approach:
- Soft seams append <archived_context> blocks without removing messages
- Hard compaction replaces the middle messages with a summary

Both approaches preserve prefix cache stability because the system
prompt segment (stored separately in session.system_prompt) is never
touched. The summary message replaces the middle history, and the
recent tail is preserved verbatim.

Why hard compaction matters:
When the context window is under pressure, the existing soft-seam
approach keeps growing the message array (appending summary blocks
without removing old messages). Hard compaction is more aggressive:
it replaces the middle history entirely, freeing more budget. The
trade-off is that old messages are lost (they're summarized), but
the system prompt segment remains byte-identical, so the prefix
cache stays hot after compaction.

Changes:
- Add HardCompactionPlan and HardCompactionResult types
- Add plan_hard_compaction() function that plans which messages to
  summarize and which to keep
- Add compact_hard_safe() async function that executes the hard
  compaction using the existing LLM summarization infrastructure
- Add hard_enabled and hard_keep_recent fields to CompactionConfig
  (defaults: false and 8)
- Add 3 unit tests for the new functionality

@HUQIANTAO force-pushed the feat/hard-compact-system-preserve branch from be5745d to 7321528 on June 3, 2026 12:23
