
@anivar commented Jul 20, 2025

Problem

Issue #771: Production servers crash with std::length_error: vector during KV cache context shifting.

Root Cause

Unsigned integer underflow (wraparound) in update_slots() at server.cpp:1714:

slot.cache_tokens.resize(slot.cache_tokens.size() - n_discard);

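cache_tokens.size() returns a size_t, so the signed n_discard is converted to size_t before the subtraction. When n_discard > cache_tokens.size(), the result wraps around to a value close to SIZE_MAX instead of going negative; resize() then requests more elements than max_size() allows and throws std::length_error. (When n_discard == cache_tokens.size(), the result is simply 0.)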

Fix

Added bounds checking before resize:

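if (n_discard >= 0 && (size_t)n_discard < slot.cache_tokens.size()) {
    // 0 <= n_discard < size(): the subtraction cannot wrap
    slot.cache_tokens.resize(slot.cache_tokens.size() - n_discard);
} else {
    // n_discard is negative or >= size(): drop the whole cache
    slot.cache_tokens.clear();
}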

Testing

  • Builds successfully
  • Unit test verifies that the fix handles edge cases
  • No crashes with various n_discard values

Fixes production crashes in high-memory-pressure scenarios.

Resolves issue mozilla-ai#771, where the server crashes with std::length_error
when KV cache context shifting resizes the cache_tokens vector using an
underflowed size.

The bug occurs in update_slots() when n_discard > cache_tokens.size(): the
subtraction in cache_tokens.resize(size - n_discard) wraps around to a huge
unsigned value, so resize() requests an impossibly large size and throws
std::length_error.

Changes:
- Add bounds checking before cache_tokens.resize() in server.cpp:1714
- Clear cache_tokens when n_discard would cause underflow
- Clear cache_tokens for negative n_discard values too, so they never reach the unsigned subtraction

This fix prevents the production server crashes reported under Chinese text
translation workloads and high-memory-pressure scenarios.