-
Notifications
You must be signed in to change notification settings - Fork 4
feat(csharp): implemented straggler download mitigation for cloudFetch #27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
eric-wang-1990
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have 2 overall comments:
- This is too large a PR for a reasonable review, especially when it contains many concurrent control scenario, which needs careful review and design, is it possible to break into smaller pieces?
- Have you been able to try out in PowerBI E2E?
| { | ||
| try | ||
| { | ||
| await Task.Delay(StragglerMonitoringInterval, cancellationToken).ConfigureAwait(false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this monitoring every 2 second? What does this mean for performance/speed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't try benchmarking on the value for monitoring the performance. Once we confirm the design of the feature, Ill run benchmarking for this param
| _downloadSemaphore.Release(); | ||
| bool shouldAcquireSequential = _isSequentialMode; | ||
| bool acquiredSequential = false; | ||
| if (shouldAcquireSequential) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why you need this extra shouldAcquireSequential to get the snapshot of _isSequentialMode? What if the value _isSequentialMode changes between 404 and 406?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reasoning was to not release the semaphore if it was never acquired. But this is a potential race condition. Updated the code. Thanks
…apshot of isSequentialMode for acquiring the semaphore
I can raise stacked PRs for this feature. That should make it easier to review |
Summary
Implements straggler download mitigation for CloudFetch to improve query performance by detecting and cancelling abnormally slow parallel downloads. The feature monitors active downloads and automatically retries stragglers when faster slots become available, with an optional fallback to sequential mode after a configurable threshold.
Changes
New Classes:
StragglerDownloadDetector(StragglerDetector.cs)FileDownloadMetrics(FileDownloadMetrics.cs)CloudFetchStragglerMitigationConfig(CloudFetchStragglerMitigationConfig.cs)CloudFetchDownloader Integration:
CancellationTokenSourcefor clean cancellationConfiguration Parameters:
All parameters use
adbc.databricks.cloudfetch.prefix:straggler_mitigation_enabled(default: false) - Feature togglestraggler_multiplier(default: 1.5) - Throughput multiplier for detectionstraggler_quantile(default: 0.6) - Minimum completion percentage before detectionstraggler_padding_seconds(default: 5) - Grace period before flagging as stragglermax_stragglers_per_query(default: 10) - Threshold to trigger sequential fallbacksynchronous_fallback_enabled(default: true) - Enable automatic fallback to sequential modeBenefits
Technical Details
Detection Algorithm:
CancellationTokenSourceSequential Fallback:
FetchResultscallThread Safety:
ConcurrentDictionaryfor metrics and cancellation tokensTesting
38 tests total, all passing:
Unit Tests (19):
FileDownloadMetricsthroughput calculation (before/after completion)FileDownloadMetricsstraggler flag settingStragglerDownloadDetectorparameter validation (multiplier, quantile)E2E Tests (19):
Documentation
straggler-mitigation-design.md- Comprehensive design doc with algorithm details, implementation notes, configuration guide, and usage examples🤖 Generated with Claude Code