[opt](cloud) Prioritize scheduling the most recently active tablets in cloud #59539
base: master
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
91eef60 to 2f8bdee
ac383bb to a71331b
run buildall
run buildall
TPC-H: Total hot run time: 31449 ms
TPC-DS: Total hot run time: 176500 ms
// Bucket size in milliseconds (default: 1 minute)
// Smaller bucket = more accurate but more memory
private final long bucketSizeMs;
what does size mean?
.thenComparing(Comparator.comparingLong((AccessStatsResult r) -> r.lastAccessTime).reversed());

// Time window in milliseconds (default: 1 hour)
private final long timeWindowMs;
what is this time window for?
private ThreadPoolExecutor asyncExecutor;

// Default time window: 1 hour
private static final long DEFAULT_TIME_WINDOW_SECOND = 3600L;
should be configurable
 * Get shard index for a given ID
 */
private int getShardIndex(long id) {
    return (int) (id & (SHARD_SIZE - 1));
better use some hash function, e.g. crc/md5, on id first and then take modulo
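A minimal sketch of the suggestion above, assuming SHARD_SIZE is a power of two (which the mask in the hunk implies). Instead of crc/md5 it swaps in the 64-bit finalizer from MurmurHash3 (fmix64), which needs no allocation; the class name and the value 64 are hypothetical and not taken from the PR.

```java
/** Hypothetical sketch: mix the id before masking so that ids sharing low bits still spread out. */
final class ShardIndexSketch {
    // Assumed value for illustration; must be a power of two for the mask to work.
    static final int SHARD_SIZE = 64;

    /** MurmurHash3 64-bit finalizer (fmix64): a cheap bijective bit mixer. */
    static long mix64(long k) {
        k ^= k >>> 33;
        k *= 0xff51afd7ed558ccdL;
        k ^= k >>> 33;
        k *= 0xc4ceb9fe1a85ec53L;
        k ^= k >>> 33;
        return k;
    }

    /** Shard index in [0, SHARD_SIZE), taken from the mixed id rather than the raw id. */
    static int getShardIndex(long id) {
        return (int) (mix64(id) & (SHARD_SIZE - 1));
    }
}
```

With the raw mask, ids that are all multiples of SHARD_SIZE land in shard 0; mixing first avoids that kind of clustering when ids are allocated in strides.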
 */
public long getTabletIdByReplicaId(long replicaId) {
    // Use timeout to prevent deadlock, default timeout: 2 seconds
    long timeoutMs = 2000L;
what is the magic 2000? why not 500 or 1000?
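One way to address the magic number is to pass the timeout in rather than hard-coding it. The sketch below assumes the lookup is guarded by a read/write lock; the class ReplicaLookupSketch, the field lockTimeoutMs, and the NOT_FOUND sentinel are hypothetical and are not the PR's TabletInvertedIndex.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical sketch: replica-to-tablet lookup with a caller-supplied lock timeout. */
final class ReplicaLookupSketch {
    static final long NOT_FOUND = -1L; // assumed sentinel for "unknown or timed out"

    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<Long, Long> replicaToTablet = new HashMap<>();
    private final long lockTimeoutMs; // would come from configuration instead of a literal

    ReplicaLookupSketch(long lockTimeoutMs) {
        this.lockTimeoutMs = lockTimeoutMs;
    }

    void put(long replicaId, long tabletId) {
        lock.writeLock().lock();
        try {
            replicaToTablet.put(replicaId, tabletId);
        } finally {
            lock.writeLock().unlock();
        }
    }

    long getTabletIdByReplicaId(long replicaId) {
        boolean locked;
        try {
            // Give up after lockTimeoutMs instead of blocking indefinitely.
            locked = lock.readLock().tryLock(lockTimeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return NOT_FOUND;
        }
        if (!locked) {
            return NOT_FOUND;
        }
        try {
            Long tabletId = replicaToTablet.get(replicaId);
            return tabletId == null ? NOT_FOUND : tabletId;
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

Because the statistics tolerate some loss, returning a sentinel on timeout is acceptable here; the caller simply skips recording that access.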
 * Result for top N query
 */
public static class AccessStatsResult {
    public final long id;
tabletId?
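For context, a top N query over such results could look like the sketch below. It uses the suggested name tabletId and orders by access count, then by most recent access, as the reversed lastAccessTime comparator in the earlier hunk suggests. The accessCount field, the class TopNSketch, and the heap-based selection are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch of a top N selection over access statistics. */
final class TopNSketch {
    static final class AccessStatsResult {
        final long tabletId;       // renamed from "id" as the review suggests
        final long accessCount;    // assumed field, not shown in the hunk
        final long lastAccessTime;

        AccessStatsResult(long tabletId, long accessCount, long lastAccessTime) {
            this.tabletId = tabletId;
            this.accessCount = accessCount;
            this.lastAccessTime = lastAccessTime;
        }
    }

    // Most accessed first; ties broken by the most recent access.
    static final Comparator<AccessStatsResult> MOST_ACTIVE_FIRST =
            Comparator.comparingLong((AccessStatsResult r) -> r.accessCount).reversed()
                    .thenComparing(Comparator.comparingLong((AccessStatsResult r) -> r.lastAccessTime).reversed());

    static List<AccessStatsResult> topN(Iterable<AccessStatsResult> all, int n) {
        if (n <= 0) {
            return new ArrayList<>();
        }
        // Heap ordered so that the head is the least active of the retained entries.
        PriorityQueue<AccessStatsResult> heap = new PriorityQueue<>(n, MOST_ACTIVE_FIRST.reversed());
        for (AccessStatsResult r : all) {
            heap.offer(r);
            if (heap.size() > n) {
                heap.poll(); // evict the least active entry
            }
        }
        List<AccessStatsResult> out = new ArrayList<>(heap);
        out.sort(MOST_ACTIVE_FIRST);
        return out;
    }
}
```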
long currentTime = System.currentTimeMillis();
// Record tablet access
// Get real tabletId from TabletInvertedIndex instead of calculating it
long tabletId = Env.getCurrentInvertedIndex().getTabletIdByReplicaId(replicaId);
is there any performance issue with accessing the inverted index?
} catch (Exception e) {
    // If executor is shutdown or queue is full, silently ignore
    // Statistics can tolerate some loss
    if (LOG.isDebugEnabled()) {
log at info level here; an exception here is abnormal
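A minimal sketch of this suggestion, assuming a bounded ThreadPoolExecutor with the default abort policy. It logs rejected submissions at info level and counts them, so the loss is visible without flooding the log. The class AsyncRecorderSketch, the dropped counter, and the every-1000th throttle are hypothetical, and java.util.logging stands in for the project's logger to keep the example self-contained.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.logging.Logger;

/** Hypothetical sketch: make dropped statistics tasks visible instead of silently ignoring them. */
final class AsyncRecorderSketch {
    private static final Logger LOG = Logger.getLogger(AsyncRecorderSketch.class.getName());

    private final ThreadPoolExecutor asyncExecutor;
    private final AtomicLong dropped = new AtomicLong();

    AsyncRecorderSketch(int queueCapacity) {
        // Single worker with a bounded queue; the default AbortPolicy throws on overflow or after shutdown.
        this.asyncExecutor = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new ArrayBlockingQueue<>(queueCapacity));
    }

    /** Returns true if the task was accepted, false if it was dropped. */
    boolean submit(Runnable task) {
        try {
            asyncExecutor.execute(task);
            return true;
        } catch (RejectedExecutionException e) {
            long n = dropped.incrementAndGet();
            // Log the first drop and then every 1000th, so an abnormal state shows up at info level.
            if (n == 1 || n % 1000 == 0) {
                LOG.info("access stats task rejected, total dropped=" + n + ", cause=" + e.getMessage());
            }
            return false;
        }
    }

    long droppedCount() {
        return dropped.get();
    }

    void shutdown() {
        asyncExecutor.shutdownNow();
    }
}
```

Catching RejectedExecutionException specifically, rather than Exception, keeps unrelated failures from being folded into the "tolerable loss" path.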
What problem does this PR solve?
Starting from the replica's getBackendId, this PR implements a sliding-window counter (in TabletAccessStats.java) that counts read and write operations over the last hour. The count does not need to be exact, since its purpose is to indicate relative activity levels.
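To illustrate how a bucketed sliding window relates the bucketSizeMs and timeWindowMs fields questioned in the review, here is a minimal sketch. It is not the code in TabletAccessStats.java: the class SlidingWindowCounter and its methods are hypothetical, timestamps are assumed to be non-negative, and it uses coarse synchronization for brevity.

```java
import java.util.Arrays;

/** Hypothetical sketch of a bucketed sliding-window counter. */
final class SlidingWindowCounter {
    private final long bucketSizeMs;  // width of one bucket in milliseconds
    private final int bucketCount;    // timeWindowMs / bucketSizeMs slots in the ring
    private final long[] bucketStart; // aligned start time of the data held in each slot
    private final long[] counts;      // accesses recorded in each slot

    SlidingWindowCounter(long timeWindowMs, long bucketSizeMs) {
        if (bucketSizeMs <= 0 || timeWindowMs < bucketSizeMs) {
            throw new IllegalArgumentException("window must cover at least one bucket");
        }
        this.bucketSizeMs = bucketSizeMs;
        this.bucketCount = (int) (timeWindowMs / bucketSizeMs);
        this.bucketStart = new long[bucketCount];
        this.counts = new long[bucketCount];
        Arrays.fill(bucketStart, Long.MIN_VALUE); // marks every slot as empty
    }

    /** Records one access at nowMs, recycling the slot if it still holds an expired bucket. */
    synchronized void record(long nowMs) {
        long start = nowMs - (nowMs % bucketSizeMs);
        int slot = (int) ((nowMs / bucketSizeMs) % bucketCount);
        if (bucketStart[slot] != start) {
            bucketStart[slot] = start;
            counts[slot] = 0;
        }
        counts[slot]++;
    }

    /** Sums the buckets whose start lies inside the window that ends at nowMs. */
    synchronized long countInWindow(long nowMs) {
        long oldest = nowMs - (long) bucketCount * bucketSizeMs;
        long total = 0;
        for (int i = 0; i < bucketCount; i++) {
            if (bucketStart[i] > oldest && bucketStart[i] <= nowMs) {
                total += counts[i];
            }
        }
        return total;
    }
}
```

With a 1 hour window and 1 minute buckets this keeps 60 slots per tracked id. Smaller buckets make the window edge more precise but multiply the slot count, which is the accuracy versus memory trade-off noted in the diff comment.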
Display these statistics in the SHOW TABLETS FROM TABLE and SHOW TABLET {tabletId} commands.
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
Release note
None