Status: Open
Labels: bug (Something isn't working)
Bug Report
Description
During stress testing of the Mooncake Store L3 HiCache, calling the /flush_cache endpoint occasionally triggers a "detected slice leak" warning (transport.h:243). It appears to happen under high concurrency, or when the system is still busy processing earlier cache-filling requests.
Logs:
I20260314 09:46:03.380788 2959 transfer_engine_impl.cpp:668] [Metrics] Transfer Engine Stats (over last 5s): Throughput: 1098.86 MB/s
W20260314 09:46:15.382206 9193 transport.h:243] detected slice leak: allocated 0 freed 671646
W20260314 09:46:15.382339 9193 transport.h:243] detected slice leak: allocated 671646 freed 0
[2026-03-14 09:46:15 TP4] Cache flushed successfully!
[2026-03-14 09:46:15 TP7] Cache flushed successfully!
[2026-03-14 09:46:15 TP5] Cache flushed successfully!
[2026-03-14 09:46:15 TP6] Cache flushed successfully!
[2026-03-14 09:46:15 TP3] Cache flushed successfully!
[2026-03-14 09:46:15 TP2] Cache flushed successfully!
[2026-03-14 09:46:15 TP1] Cache flushed successfully!
[2026-03-14 09:46:15 TP0] Cache flushed successfully!
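The two warnings above report mirror-image counts (allocated 0 / freed 671646, then allocated 671646 / freed 0). A minimal sketch for checking whether that holds across a whole log is shown below. It only assumes the message format seen in the excerpt above; the function name `summarize_slice_leaks` is hypothetical, and a net of zero does not by itself establish a root cause.

```bash
#!/bin/bash
# Sum the "allocated" and "freed" counts over every slice-leak warning in a log.
# Assumes the message format shown in the excerpt:
#   detected slice leak: allocated <N> freed <M>
summarize_slice_leaks() {
    # $1: path to the log file to scan
    awk '
        /detected slice leak: allocated [0-9]+ freed [0-9]+/ {
            for (i = 1; i <= NF; i++) {
                if ($i == "allocated") alloc += $(i + 1)
                if ($i == "freed")     freed += $(i + 1)
            }
            count++
        }
        END {
            printf "warnings=%d allocated=%d freed=%d net=%d\n",
                   count, alloc, freed, alloc - freed
        }
    ' "$1"
}
```

Run against a file containing the log lines above, `summarize_slice_leaks` prints `warnings=2 allocated=671646 freed=671646 net=0`.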
Corresponding Code:
Environment
- Model: Kimi-K2.5
- Dataset: ShareGPT_V3
- Component: Mooncake Store L3 HiCache
Reproduction Steps
The issue can be reproduced by running the following stress test script. The script fills the cache with a fixed seed, triggers a flush, and then re-sends the same requests to verify cache behavior.
The issue does not reproduce on every run; it occurs sporadically when the script is executed repeatedly under high load.
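Because the warning is sporadic, one way to catch it is to re-run the stress test until a new occurrence shows up in the server log. The wrapper below is a sketch under assumptions: the names `count_matches` and `run_until_match` are hypothetical, and it assumes the server's output (where the warning appears) is being written to a file you can read.

```bash
#!/bin/bash
# Count lines in a file that match a pattern; prints 0 if the file is missing.
count_matches() {
    local n
    n=$(grep -c -- "$1" "$2" 2>/dev/null) || true
    echo "${n:-0}"
}

# Re-run a command until NEW lines matching a pattern appear in a log file.
# Usage: run_until_match MAX_ATTEMPTS LOG_FILE PATTERN COMMAND [ARGS...]
# Returns 0 as soon as an attempt adds a match, 1 if attempts run out.
run_until_match() {
    local max_attempts="$1" log_file="$2" pattern="$3"
    shift 3
    local attempt before after
    for ((attempt = 1; attempt <= max_attempts; attempt++)); do
        before=$(count_matches "$pattern" "$log_file")
        echo "Attempt ${attempt}/${max_attempts}"
        # Keep looping even if the command itself fails.
        "$@" || echo "Command exited with status $? (continuing)"
        after=$(count_matches "$pattern" "$log_file")
        if (( after > before )); then
            echo "Pattern found after ${attempt} attempt(s)"
            return 0
        fi
    done
    echo "Pattern not found in ${max_attempts} attempt(s)"
    return 1
}
```

A possible invocation, with `server.log` and `./stress_test.sh` as placeholder names for the server log and the script in this report: `run_until_match 10 server.log "detected slice leak" ./stress_test.sh`. Comparing match counts before and after each attempt avoids a false positive from warnings left over from earlier runs.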
#!/bin/bash
# Mooncake Store L3 HiCache Stress Test Script
# Test Procedure:
# 1. Send a high volume of requests to fill the cache.
# 2. Flush the cache.
# 3. Resend the same requests (using the same seed to ensure identical payloads).
# 4. Observe master.log to verify if Mooncake Store performs the GET operations correctly.
# Abort on errors, including a failed benchmark run that is piped through tee
set -eo pipefail
# Configuration Parameters
HOST="10.20.32.66"
PORT="8000"
FLUSH_URL="http://${HOST}:${PORT}/flush_cache"
MODEL_PATH="/mnt/data/models/Kimi-K2.5"
DATASET_PATH="/mnt/data/zyn/datasets/ShareGPT_V3_unfiltered_cleaned_split.json"
BENCHMARK_SCRIPT="/sgl-workspace/sglang/benchmark/hicache/bench_multiturn.py"
LOG_DIR="./logs"
TIMESTAMP=$(date +"%Y%m%d_%H%M%S")
# Fix the random seed to ensure identical request sequences across runs
SEED=233
# Create log directory
mkdir -p "$LOG_DIR"
echo "========================================="
echo "Mooncake Store L3 HiCache Stress Test"
echo "Time: $(date)"
echo "Target: ${HOST}:${PORT}"
echo "========================================="
# Bypass proxies
unset http_proxy
unset https_proxy
# Backup original master.log (Optional)
# if [ -f "master.log" ]; then
# cp master.log "${LOG_DIR}/master_before_${TIMESTAMP}.log"
# echo "✓ Original master.log backed up to ${LOG_DIR}/master_before_${TIMESTAMP}.log"
# fi
#
# Clear master.log to isolate this test session
# > master.log
# echo "✓ master.log has been cleared"
# Test Parameters
NUM_CLIENTS=8
NUM_ROUNDS=5
MAX_PARALLEL=8
REQUEST_LENGTH=16384
REQUEST_RATE=20
echo ""
echo "========================================="
echo "Phase 1: Send initial high-volume requests (Cache Filling)"
echo "========================================="
echo "Parameters: --seed ${SEED} --num-clients ${NUM_CLIENTS} --num-rounds ${NUM_ROUNDS} --max-parallel ${MAX_PARALLEL} --request-rate ${REQUEST_RATE}"
echo "Description: Using fixed seed ${SEED} for reproducible request sequences"
python3 "$BENCHMARK_SCRIPT" \
--model-path "$MODEL_PATH" \
--dataset-path "$DATASET_PATH" \
--host "$HOST" \
--port "$PORT" \
--output-length 16 \
--request-length "$REQUEST_LENGTH" \
--num-clients "$NUM_CLIENTS" \
--num-rounds "$NUM_ROUNDS" \
--max-parallel "$MAX_PARALLEL" \
--request-rate "$REQUEST_RATE" \
--ready-queue-policy random \
--seed "$SEED" \
--disable-auto-run \
2>&1 | tee "${LOG_DIR}/phase1_initial_${TIMESTAMP}.log"
echo ""
echo "✓ Phase 1 Complete"
echo " Initial requests sent; cache should now be populated."
# Wait briefly to ensure all requests are fully processed
sleep 5
echo ""
echo "========================================="
echo "Phase 2: Flush Cache"
echo "========================================="
echo "Calling: $FLUSH_URL"
curl --noproxy "*" -X POST "$FLUSH_URL"
echo ""
echo "✓ Phase 2 Complete"
echo " Cache has been flushed."
# Wait for the flush operation to settle
sleep 2
echo ""
echo "========================================="
echo "Phase 3: Resend identical requests (Testing Cache Invalidation)"
echo "========================================="
echo "Parameters: --seed ${SEED} --num-clients ${NUM_CLIENTS} --num-rounds ${NUM_ROUNDS} --max-parallel ${MAX_PARALLEL} --request-rate ${REQUEST_RATE}"
echo "Description: Reusing seed ${SEED}; request sequence is identical to Phase 1"
python3 "$BENCHMARK_SCRIPT" \
--model-path "$MODEL_PATH" \
--dataset-path "$DATASET_PATH" \
--host "$HOST" \
--port "$PORT" \
--output-length 16 \
--request-length "$REQUEST_LENGTH" \
--num-clients "$NUM_CLIENTS" \
--num-rounds "$NUM_ROUNDS" \
--max-parallel "$MAX_PARALLEL" \
--request-rate "$REQUEST_RATE" \
--ready-queue-policy random \
--seed "$SEED" \
--disable-auto-run \
2>&1 | tee "${LOG_DIR}/phase2_after_flush_${TIMESTAMP}.log"
echo ""
echo "✓ Phase 3 Complete"
echo " Post-flush requests sent."
# Wait for processing to complete
sleep 5
echo ""
echo "========================================="
echo "Phase 4: Results Analysis"
echo "========================================="
# Save master.log after the test
cp master.log "${LOG_DIR}/master_after_${TIMESTAMP}.log"
echo "✓ Saved master.log to ${LOG_DIR}/master_after_${TIMESTAMP}.log"
echo ""
echo "========================================="
echo "Test Finished!"
echo "Logs saved in: ${LOG_DIR}/"
echo " - phase1_initial_${TIMESTAMP}.log (Initial run)"
echo " - phase2_after_flush_${TIMESTAMP}.log (Run after flush)"
echo " - master_before_${TIMESTAMP}.log (Pre-test log, if enabled)"
echo " - master_after_${TIMESTAMP}.log (Post-test log)"
echo ""
echo "Please inspect ${LOG_DIR}/master_after_${TIMESTAMP}.log to verify correct Mooncake Store behavior."
echo "========================================="
Before submitting...
- Ensure you searched for relevant issues and read the [documentation]