@msrathore-db msrathore-db commented Nov 17, 2025

🥞 Stacked PR

Use this link to review incremental changes.


Summary

Implements the core straggler detection algorithm, which analyzes download metrics and flags abnormally slow downloads by comparing them against the median throughput of completed downloads.

Detection Algorithm:

  1. Wait until the minimum completion quantile is reached (default: 60% of downloads complete)
  2. Calculate the median throughput from completed downloads
  3. Identify active downloads running more than the multiplier (default: 1.5×) slower than the median
  4. Apply a padding grace period before flagging (default: 5 seconds)
  5. Track detected stragglers to prevent duplicate detection
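
The five steps above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `DownloadSample` is a hypothetical stand-in for `FileDownloadMetrics`, and the sketch assumes one plausible reading of steps 3 and 4, namely that an active download is flagged once its elapsed time exceeds `multiplier × (size / median throughput) + padding`. The exact comparison used by `StragglerDownloadDetector` may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for FileDownloadMetrics; the real type's members are not shown here.
public sealed record DownloadSample(
    long Offset, long SizeBytes, double ElapsedSeconds, bool Completed, bool Cancelled = false);

public static class StragglerSketch
{
    // Median of a non-empty sequence; averages the two middle values for even counts.
    public static double Median(IEnumerable<double> values)
    {
        double[] sorted = values.OrderBy(v => v).ToArray();
        if (sorted.Length == 0)
            throw new ArgumentException("At least one value is required.", nameof(values));
        int mid = sorted.Length / 2;
        return sorted.Length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    // Returns the offsets of newly detected stragglers (assumed flagging rule, see lead-in).
    public static List<long> Identify(
        IReadOnlyList<DownloadSample> samples,
        IDictionary<long, bool> alreadyCounted,
        double multiplier = 1.5, double quantile = 0.6, double paddingSeconds = 5.0)
    {
        var stragglers = new List<long>();

        // Cancelled downloads are ignored entirely.
        List<DownloadSample> live = samples.Where(s => !s.Cancelled).ToList();
        List<double> throughputs = live
            .Where(s => s.Completed && s.ElapsedSeconds > 0)
            .Select(s => s.SizeBytes / s.ElapsedSeconds)
            .ToList();

        // Step 1: do nothing until enough downloads have completed.
        if (live.Count == 0 || throughputs.Count == 0 ||
            throughputs.Count < Math.Ceiling(live.Count * quantile))
            return stragglers;

        // Step 2: median throughput (bytes per second) of completed downloads.
        double median = Median(throughputs);
        if (median <= 0) return stragglers;

        foreach (DownloadSample download in live.Where(d => !d.Completed))
        {
            // Steps 3 and 4: allowed time is the multiplier-scaled expected time plus padding.
            double expectedSeconds = multiplier * (download.SizeBytes / median) + paddingSeconds;

            // Step 5: report each offset at most once.
            if (download.ElapsedSeconds > expectedSeconds &&
                !alreadyCounted.ContainsKey(download.Offset))
            {
                alreadyCounted[download.Offset] = true;
                stragglers.Add(download.Offset);
            }
        }
        return stragglers;
    }
}
```

Passing the same `alreadyCounted` dictionary on every call is what keeps a slow download from being reported twice across polling rounds.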

Key Changes:

  • StragglerDownloadDetector class implementation
    • Configurable multiplier, quantile, padding, and fallback threshold
    • Duplicate detection prevention via tracking dictionary
    • Filters out already-cancelled downloads
  • ✅ Comprehensive unit tests
    • Parameter validation (multiplier, quantile ranges)
    • Median calculation correctness (odd/even counts)
    • Quantile threshold enforcement
    • Fallback threshold triggering
    • Edge cases: empty lists, cancelled downloads

…loads

- Implement core detection algorithm using median throughput analysis
- Configurable multiplier (default 1.5x slower than median)
- Minimum completion quantile (default 60%)
- Straggler padding grace period (default 5 seconds)
- Sequential fallback threshold tracking
- Duplicate detection prevention via tracking dictionary
- Add comprehensive unit tests for parameter validation, median calculation, and edge cases

Builds on: stack/straggler-metrics
/// <param name="stragglerDetectionPadding">Extra buffer time before declaring a download as a straggler.</param>
/// <param name="maxStragglersBeforeFallback">Maximum stragglers before triggering sequential fallback.</param>
public StragglerDownloadDetector(
double stragglerThroughputMultiplier,

Should we use a straggler download config object?
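
For illustration only, such a config object could look something like the sketch below. The type name, property names, and validation ranges are all assumptions and do not come from this PR; the defaults mirror the ones listed in the summary, except `MaxStragglersBeforeFallback`, whose default is not stated there and is a placeholder.

```csharp
using System;

// Hypothetical options type; names and validation ranges are assumptions, not taken from the PR.
public sealed class StragglerDetectionOptions
{
    public double ThroughputMultiplier { get; init; } = 1.5;
    public double MinimumCompletionQuantile { get; init; } = 0.6;
    public TimeSpan DetectionPadding { get; init; } = TimeSpan.FromSeconds(5);

    // Placeholder default; the PR summary does not state the real value.
    public int MaxStragglersBeforeFallback { get; init; } = 10;

    // Throws if any value is outside the (assumed) valid range.
    public void Validate()
    {
        if (ThroughputMultiplier <= 1.0)
            throw new ArgumentOutOfRangeException(nameof(ThroughputMultiplier), "Must be greater than 1.0.");
        if (MinimumCompletionQuantile <= 0.0 || MinimumCompletionQuantile > 1.0)
            throw new ArgumentOutOfRangeException(nameof(MinimumCompletionQuantile), "Must be in (0, 1].");
        if (DetectionPadding < TimeSpan.Zero)
            throw new ArgumentOutOfRangeException(nameof(DetectionPadding), "Must not be negative.");
        if (MaxStragglersBeforeFallback < 0)
            throw new ArgumentOutOfRangeException(nameof(MaxStragglersBeforeFallback), "Must not be negative.");
    }
}
```

The constructor would then take a single options argument instead of four positional values, which keeps call sites readable and puts validation in one place.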

/// <param name="alreadyCounted">Dictionary to track already counted stragglers (prevents duplicate counting).</param>
/// <returns>Collection of file offsets identified as stragglers.</returns>
public IEnumerable<long> IdentifyStragglerDownloads(
IReadOnlyList<FileDownloadMetrics> allDownloadMetrics,

Should we make these 2 lists part of the detector?

@@ -0,0 +1,219 @@
/*

add tracing
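
One possible shape for this, assuming tracing is done with `System.Diagnostics.ActivitySource` (the source name, activity name, and tag keys below are made up for the example and are not the project's conventions):

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class StragglerTracingSketch
{
    // Hypothetical source name; a real driver would reuse its existing ActivitySource.
    private static readonly ActivitySource Source = new ActivitySource("StragglerDetectionSketch");

    // Wraps a detection pass in an activity and records each detected offset as an event.
    public static IReadOnlyList<long> TraceDetection(Func<IReadOnlyList<long>> detect)
    {
        // StartActivity returns null when no listener is registered, hence the null-conditional calls.
        using var activity = Source.StartActivity("IdentifyStragglerDownloads");

        IReadOnlyList<long> stragglers = detect();
        activity?.SetTag("straggler.count", stragglers.Count);

        foreach (long offset in stragglers)
        {
            activity?.AddEvent(new ActivityEvent(
                "straggler.detected",
                tags: new ActivityTagsCollection { { "file.offset", offset } }));
        }
        return stragglers;
    }
}
```

With no listener attached the wrapper adds almost no overhead, so detection behaves the same whether or not tracing is enabled.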
