-
Notifications
You must be signed in to change notification settings - Fork 9.1k
HADOOP-19624 Thread leak in ABFS AbfsClientThrottlingAnalyzer #7852
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: trunk
Are you sure you want to change the base?
Conversation
This method closes the throttling analyzer and releases any associated resources.
💔 -1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the patch and thorough testing of the issue @mattkduran
I have a few suggestions and comments. Please do take a look at them.
Also we need at least one test (unit or integration) to be inlcuded in this patch. Can you plan for one? Idea is to have coverage of code impacted here.
Also, I see a few PR checks failing, If you click on the link of each -1 commented by hadoop-yetus, you should be able to see the issue reported and fix them.
Once all of this is done, we can wait for a few more reviews and get this checked in.
Thanks again for all the efforts.
@@ -26,7 +26,7 @@ | |||
*/ | |||
@InterfaceAudience.Private | |||
@InterfaceStability.Unstable | |||
public interface AbfsThrottlingIntercept { | |||
public interface AbfsThrottlingIntercept extends Closable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be extends Closeable
?
@@ -40,4 +42,12 @@ public void updateMetrics(final AbfsRestOperationType operationType, | |||
public void sendingRequest(final AbfsRestOperationType operationType, | |||
final AbfsCounters abfsCounters) { | |||
} | |||
|
|||
/** | |||
* No-op implementation of close method. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: javadoc to include @ throws
|
||
/** | ||
* Closes the throttling intercept and releases associated resources. | ||
* This method closes both the read and write throttling analyzers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Javadoc to include @ throws
/** | ||
* Closes the throttling analyzer and releases associated resources. | ||
* This method cancels the internal timer and cleans up any pending timer tasks. | ||
* It is safe to call this method multiple times. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix javadoc here a well.
Description of PR
The ABFS driver's auto-throttling feature (
fs.azure.enable.autothrottling=true
) creates Timer threads in AbfsClientThrottlingAnalyzer that are never properly cleaned up, leading to a memory leak that eventually causes OutOfMemoryError in long-running applications like Hive Metastore.Impact:
Root Cause:
AbfsClientThrottlingAnalyzer creates Timer objects in its constructor but provides no mechanism to cancel them. When AbfsClient instances are closed, the associated timer threads continue running indefinitely.
Solution
Implement proper resource cleanup by making the throttling components implement Closeable and ensuring timers are cancelled when ABFS clients are closed.
Changes Made
https://github.com/mattkduran/ABFSleaktest
https://www.mail-archive.com/[email protected]/msg43483.html
How was this patch tested?
Standalone Validation Tool
This fix was validated using a standalone reproduction and testing tool that directly exercises the ABFS auto-throttling components outside of a full Hadoop deployment.
Repository: ABFSLeakTest
Testing Scope
Test Results
Leak Reproduction Evidence
Test Environment
For code changes:
LICENSE
,LICENSE-binary
,NOTICE-binary
files?