[Bug]: Intermittent Warning during flush_cache under SGLang L3 HiCache Stress Test #1654

@00fish0

Bug Report

Description

When stress-testing the Mooncake Store L3 HiCache, calling the /flush_cache endpoint occasionally triggers a "detected slice leak" warning (logged at transport.h:243). The warning seems to appear under high concurrency, or when the system is still busy processing earlier cache-filling requests.

Logs:

I20260314 09:46:03.380788  2959 transfer_engine_impl.cpp:668] [Metrics] Transfer Engine Stats (over last 5s): Throughput: 1098.86 MB/s
W20260314 09:46:15.382206  9193 transport.h:243] detected slice leak: allocated 0 freed 671646
W20260314 09:46:15.382339  9193 transport.h:243] detected slice leak: allocated 671646 freed 0
[2026-03-14 09:46:15 TP4] Cache flushed successfully!
[2026-03-14 09:46:15 TP7] Cache flushed successfully!
[2026-03-14 09:46:15 TP5] Cache flushed successfully!
[2026-03-14 09:46:15 TP6] Cache flushed successfully!
[2026-03-14 09:46:15 TP3] Cache flushed successfully!
[2026-03-14 09:46:15 TP2] Cache flushed successfully!
[2026-03-14 09:46:15 TP1] Cache flushed successfully!
[2026-03-14 09:46:15 TP0] Cache flushed successfully!
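
In the two warnings above, the counts mirror each other (0/671646 and 671646/0). A quick way to check whether the reported counts balance across a whole log is to sum them. The helper below is only a sketch: the function name `summarize_slice_leaks` is hypothetical, and it assumes the warning text keeps the `allocated N freed M` layout shown in the log excerpt.

```shell
#!/bin/bash
# Hypothetical helper: sum the allocated/freed counts over all
# "detected slice leak" warnings in the given log files (or stdin).
summarize_slice_leaks() {
    awk '
        /detected slice leak:/ {
            # The number follows the keyword as the next whitespace-separated field.
            for (i = 1; i <= NF; i++) {
                if ($i == "allocated") alloc += $(i + 1)
                if ($i == "freed")     freed += $(i + 1)
            }
            n++
        }
        END {
            printf "warnings=%d allocated=%d freed=%d net=%d\n", n, alloc, freed, alloc - freed
        }
    ' "$@"
}
```

Fed the log excerpt above, this prints `warnings=2 allocated=671646 freed=671646 net=0`. A net of zero may point to a counting artifact rather than slices that were never released, but that is a guess and not a confirmed diagnosis.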

Corresponding Code:

[Image: code screenshot]

Environment

  • Model: Kimi-K2.5
  • Dataset: ShareGPT_V3
  • Component: Mooncake Store L3 HiCache

Reproduction Steps

The issue can be reproduced with the stress test script below. The script fills the cache using a fixed seed, flushes it, and then resends the same requests to verify cache behavior.

The warning does not appear on every run; it occurs sporadically when the script is executed repeatedly under high load.

#!/bin/bash

# Mooncake Store L3 HiCache Stress Test Script
# Test Procedure:
# 1. Send a high volume of requests to fill the cache.
# 2. Flush the cache.
# 3. Resend the same requests (using the same seed to ensure identical payloads).
# 4. Observe master.log to verify if Mooncake Store performs the GET operations correctly.

# Exit on errors, including failures on the left side of the `| tee` pipelines
set -eo pipefail

# Configuration Parameters
HOST="10.20.32.66"
PORT="8000"
FLUSH_URL="http://${HOST}:${PORT}/flush_cache"
MODEL_PATH="/mnt/data/models/Kimi-K2.5"
DATASET_PATH="/mnt/data/zyn/datasets/ShareGPT_V3_unfiltered_cleaned_split.json"
BENCHMARK_SCRIPT="/sgl-workspace/sglang/benchmark/hicache/bench_multiturn.py"
LOG_DIR="./logs"
TIMESTAMP=$(date +"%Y%m%d_%H%M%S")

# Fix the random seed to ensure identical request sequences across runs
SEED=233

# Create log directory
mkdir -p "$LOG_DIR"

echo "========================================="
echo "Mooncake Store L3 HiCache Stress Test"
echo "Time: $(date)"
echo "Target: ${HOST}:${PORT}"
echo "========================================="

# Bypass proxies (both lowercase and uppercase variants)
unset http_proxy https_proxy HTTP_PROXY HTTPS_PROXY

# Backup original master.log (Optional)
# if [ -f "master.log" ]; then
#     cp master.log "${LOG_DIR}/master_before_${TIMESTAMP}.log"
#     echo "✓ Original master.log backed up to ${LOG_DIR}/master_before_${TIMESTAMP}.log"
# fi
#
# Clear master.log to isolate this test session
# > master.log
# echo "✓ master.log has been cleared"

# Test Parameters
NUM_CLIENTS=8
NUM_ROUNDS=5
MAX_PARALLEL=8
REQUEST_LENGTH=16384
REQUEST_RATE=20

echo ""
echo "========================================="
echo "Phase 1: Send initial high-volume requests (Cache Filling)"
echo "========================================="
echo "Parameters: --seed ${SEED} --num-clients ${NUM_CLIENTS} --num-rounds ${NUM_ROUNDS} --max-parallel ${MAX_PARALLEL} --request-rate ${REQUEST_RATE}"
echo "Description: Using fixed seed ${SEED} for reproducible request sequences"

python3 "$BENCHMARK_SCRIPT" \
    --model-path "$MODEL_PATH" \
    --dataset-path "$DATASET_PATH" \
    --host "$HOST" \
    --port "$PORT" \
    --output-length 16 \
    --request-length "$REQUEST_LENGTH" \
    --num-clients "$NUM_CLIENTS" \
    --num-rounds "$NUM_ROUNDS" \
    --max-parallel "$MAX_PARALLEL" \
    --request-rate "$REQUEST_RATE" \
    --ready-queue-policy random \
    --seed "$SEED" \
    --disable-auto-run \
    2>&1 | tee "${LOG_DIR}/phase1_initial_${TIMESTAMP}.log"

echo ""
echo "✓ Phase 1 Complete"
echo "  Initial requests sent; cache should now be populated."

# Wait briefly to ensure all requests are fully processed
sleep 5

echo ""
echo "========================================="
echo "Phase 2: Flush Cache"
echo "========================================="
echo "Calling: $FLUSH_URL"

# --fail makes curl exit non-zero on an HTTP error, so a failed flush stops the script
curl --noproxy "*" --fail -X POST "$FLUSH_URL"
echo ""

echo "✓ Phase 2 Complete"
echo "  Cache has been flushed."

# Wait for the flush operation to settle
sleep 2

echo ""
echo "========================================="
echo "Phase 3: Resend identical requests (Testing Cache Invalidation)"
echo "========================================="
echo "Parameters: --seed ${SEED} --num-clients ${NUM_CLIENTS} --num-rounds ${NUM_ROUNDS} --max-parallel ${MAX_PARALLEL} --request-rate ${REQUEST_RATE}"
echo "Description: Reusing seed ${SEED}; request sequence is identical to Phase 1"

python3 "$BENCHMARK_SCRIPT" \
    --model-path "$MODEL_PATH" \
    --dataset-path "$DATASET_PATH" \
    --host "$HOST" \
    --port "$PORT" \
    --output-length 16 \
    --request-length "$REQUEST_LENGTH" \
    --num-clients "$NUM_CLIENTS" \
    --num-rounds "$NUM_ROUNDS" \
    --max-parallel "$MAX_PARALLEL" \
    --request-rate "$REQUEST_RATE" \
    --ready-queue-policy random \
    --seed "$SEED" \
    --disable-auto-run \
    2>&1 | tee "${LOG_DIR}/phase2_after_flush_${TIMESTAMP}.log"

echo ""
echo "✓ Phase 3 Complete"
echo "  Post-flush requests sent."

# Wait for processing to complete
sleep 5

echo ""
echo "========================================="
echo "Phase 4: Results Analysis"
echo "========================================="

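# Save master.log after the test (expects master.log in the current directory)
if [ -f master.log ]; then
    cp master.log "${LOG_DIR}/master_after_${TIMESTAMP}.log"
    echo "✓ Saved master.log to ${LOG_DIR}/master_after_${TIMESTAMP}.log"
else
    echo "✗ master.log not found in $(pwd); skipping copy"
fi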


echo ""
echo "========================================="
echo "Test Finished!"
echo "Logs saved in: ${LOG_DIR}/"
echo "  - phase1_initial_${TIMESTAMP}.log (Initial run)"
echo "  - phase2_after_flush_${TIMESTAMP}.log (Run after flush)"
echo "  - master_before_${TIMESTAMP}.log (Pre-test log, if enabled)"
echo "  - master_after_${TIMESTAMP}.log (Post-test log)"
echo ""
echo "Please inspect ${LOG_DIR}/master_after_${TIMESTAMP}.log to verify correct Mooncake Store behavior."
echo "========================================="
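
Because the warning only shows up in some runs, it can help to rerun the script in a loop until the warning appears in the server log. The helper below is a minimal sketch, not part of the original report: the function name `repeat_until_match` is hypothetical, and the caller has to supply the log file that actually receives the "detected slice leak" line.

```shell
#!/bin/bash
# Hypothetical helper: rerun a command until a pattern shows up in a log file.
# Usage: repeat_until_match MAX_RUNS LOG_FILE PATTERN CMD [ARGS...]
repeat_until_match() {
    local max_runs=$1 log_file=$2 pattern=$3
    shift 3
    local run
    for ((run = 1; run <= max_runs; run++)); do
        # Keep looping even if one run fails; report its exit status on stderr.
        "$@" || echo "run ${run}: command exited with status $?" >&2
        if grep -q -- "$pattern" "$log_file" 2>/dev/null; then
            echo "pattern found after ${run} run(s)"
            return 0
        fi
    done
    echo "pattern not found in ${max_runs} run(s)"
    return 1
}
```

For example, `repeat_until_match 20 /path/to/server.log "detected slice leak" ./stress_test.sh`, where both the log path and the script name are placeholders for the local setup. If the log already contains the warning from an earlier session, the loop stops after the first run, so starting from a fresh log is advisable.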
