[grid] Improve readTimeout and cache expiration of HttpClient between Router and Nodes #16154
java/src/org/openqa/selenium/grid/router/GridStatusHandler.java
@joerg1985 OK, I see. I only tried this recently, after moving to the Caffeine cache builder.
… Router and Nodes Signed-off-by: Viet Nguyen Duc <[email protected]>
@joerg1985, can you review again? I spent hours reading the Caffeine documentation. The cache expiration is now aligned with the read timeout and is derived dynamically from the destination node's session timeout. I also added a resilient fallback to the default ClientConfig when the node has no /status endpoint.
@VietND96 I don't think it is enough to use a different timeout. The connection timeout and delays in processing (e.g. GC pauses) must be respected too, to ensure the client is not closed before the response is processed. Additionally, the timeout handling inside the JDK is not perfect (there is an open issue regarding this in the JDK Jira), so the caller might get a worse response because the client was closed than it would get from the real timeout. The current timeout handling calculates the inactivity from the last time the client returned a result, not from the last time the client was used. So a client might get closed too fast, and the next request to the session has to start a new client first.
Hmm, so for example: the read timeout is 300s, and at the 100th second the HttpClient is loaded from the cache to execute a request, so its expiration is immediately reset to 300s. Is that still not a safe enough protocol?
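The 300s scenario in this question can be modelled with a toy expire-after-access cache. This is only a sketch using the JDK, not Caffeine and not the PR's code: the class name IdleExpiringCache and the injectable clock are assumptions made so the timing is deterministic. It shows that a read at second 100 moves the deadline to second 400, which a request running for the full read timeout plus connection time or a GC pause could still overrun.

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** Toy expire-after-access cache (hypothetical, NOT Caffeine): models only the timing question. */
final class IdleExpiringCache<K, V> {
  private static final class Entry<V> {
    final V value;
    long lastAccessNanos;

    Entry(V value, long now) {
      this.value = value;
      this.lastAccessNanos = now;
    }
  }

  private final Map<K, Entry<V>> entries = new HashMap<>();
  private final long ttlNanos;
  private final LongSupplier clock; // injectable so the example is deterministic

  IdleExpiringCache(Duration ttl, LongSupplier clock) {
    this.ttlNanos = ttl.toNanos();
    this.clock = clock;
  }

  void put(K key, V value) {
    entries.put(key, new Entry<>(value, clock.getAsLong()));
  }

  /** Returns the value and resets its idle timer, or null if absent or expired. */
  V get(K key) {
    long now = clock.getAsLong();
    Entry<V> e = entries.get(key);
    if (e == null) {
      return null;
    }
    if (now - e.lastAccessNanos >= ttlNanos) {
      entries.remove(key);
      return null;
    }
    e.lastAccessNanos = now; // the timer restarts when the entry is handed out
    return e.value;
  }

  /** True if the entry is absent or would be evicted at the current clock reading. */
  boolean isExpired(K key) {
    Entry<V> e = entries.get(key);
    return e == null || clock.getAsLong() - e.lastAccessNanos >= ttlNanos;
  }
}
```

With a 300s TTL, a get at second 100 keeps the entry alive until just before second 400. The timer knows nothing about whether the request started at second 100 has finished, which is the gap the usage-counting suggestion in this thread is meant to close.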
I think usage counting is the correct pattern to ensure the client is not in use. Perhaps @ben-manes can tell us whether there is a matching pattern in Caffeine, or whether it might be interesting to have such a mechanism integrated into Caffeine. The basic problem is a long-running operation on the cached object, which should not be disposed of while this operation is running. To me, this does not sound too exotic to handle inside a library.
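A minimal sketch of the usage-counting idea, assuming a hypothetical wrapper named UsageCounted (it is not part of Selenium or Caffeine). Callers acquire the resource before a request and release it afterwards; eviction only marks the wrapper as retired, and the actual close is deferred until the last user has released it.

```java
import java.util.function.Consumer;

/** Hypothetical wrapper: defers closing a resource until no caller is using it. */
final class UsageCounted<T> {
  private final T resource;
  private final Consumer<T> closer; // how to dispose of the resource, e.g. HttpClient::close
  private int inUse;       // guarded by this
  private boolean retired; // guarded by this
  private boolean closed;  // guarded by this

  UsageCounted(T resource, Consumer<T> closer) {
    this.resource = resource;
    this.closer = closer;
  }

  /** Marks the resource as in use; returns null once it has been retired. */
  synchronized T acquire() {
    if (retired) {
      return null; // caller must create or load a fresh resource
    }
    inUse++;
    return resource;
  }

  /** Ends one use; closes the resource if it was retired in the meantime. */
  synchronized void release() {
    if (inUse <= 0) {
      throw new IllegalStateException("release without matching acquire");
    }
    inUse--;
    closeIfIdle();
  }

  /** Called on eviction: closes immediately if idle, otherwise after the last release. */
  synchronized void retire() {
    retired = true;
    closeIfIdle();
  }

  synchronized boolean isClosed() {
    return closed;
  }

  private void closeIfIdle() {
    if (retired && inUse == 0 && !closed) {
      closed = true;
      closer.accept(resource);
    }
  }
}
```

In a cache-backed design, retire() would typically be called from the cache's removal callback, and each request would wrap its work in acquire() and a finally block that calls release(). A caller that gets null from acquire() has lost the race with eviction and needs a new client.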
 * @param lastAccessTime the last time this entry was accessed
 * @return true if the entry should be evicted
 */
boolean isExpired(long lastAccessTime) {
I could not find any call to this method; I guess it can be removed or called 😀
Yes, I am reworking this based on your feedback. When using Expiry (on cache create, read, and update), we cannot override the expiration logic, so this isExpired method is redundant.
You're correct: checking that the client is not in use is still the safe protocol. I am thinking about how to implement the cache with this pattern.
// and a removal listener to close HTTP clients
this.httpClientCache =
    Caffeine.newBuilder()
        .expireAfter(
If you don't need to log based on the operation type, you could use
Expiry.accessing((uri, cacheEntry) -> {
  // Log the timeout that becomes this entry's time-to-idle
  LOG.fine(String.format("Set (read timeout: %d seconds) for %s in cache",
      cacheEntry.getConfig().readTimeout().toSeconds(), uri));
  return cacheEntry.getConfig().readTimeout();
});
You can use pinning if you inform the cache of the long-running operation, so that the entry is excluded from consideration. That requires calling into the cache to modify the entry's metadata, which can be difficult to retrofit. There isn't an entry callback for in-use checks, since that would degrade eviction to O(n) and allow unbounded usage. You might consider a two-level cache: a primary cache that holds weak references for reference counting, and that loads from and evicts to an expiring cache.
Signed-off-by: Viet Nguyen Duc <[email protected]>
 * <p>Weight Strategy: - inUse == 0: Weight = 1 (normal, can be evicted) - inUse > 0: Weight =
 * Integer.MAX_VALUE (effectively pinned, won't be evicted)
Pinning uses a weight of zero. The weight represents how much space the entry takes, so MAX_VALUE means, well, everything.
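To illustrate why a weight of zero pins an entry while Integer.MAX_VALUE does the opposite, here is a deliberately simplified, JDK-only model of a weight-bounded cache. The class WeightBoundedCache and its insertion-order eviction are assumptions for illustration; Caffeine's real eviction policy is more sophisticated and is not reproduced here.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy weight-bounded cache (hypothetical, NOT Caffeine's policy): tracks weights only. */
final class WeightBoundedCache<K> {
  private final long maximumWeight;
  // Insertion order doubles as eviction order in this simplified model.
  private final Map<K, Integer> weights = new LinkedHashMap<>();

  WeightBoundedCache(long maximumWeight) {
    this.maximumWeight = maximumWeight;
  }

  void put(K key, int weight) {
    if (weight < 0) {
      throw new IllegalArgumentException("weight must be >= 0");
    }
    weights.remove(key); // re-insert so an updated entry moves to the end
    weights.put(key, weight);
    evict();
  }

  boolean contains(K key) {
    return weights.containsKey(key);
  }

  private long totalWeight() {
    long total = 0;
    for (int w : weights.values()) {
      total += w;
    }
    return total;
  }

  /** Removes weighted entries, oldest first, until the total fits the bound. */
  private void evict() {
    long total = totalWeight();
    Iterator<Map.Entry<K, Integer>> it = weights.entrySet().iterator();
    while (total > maximumWeight && it.hasNext()) {
      Map.Entry<K, Integer> e = it.next();
      if (e.getValue() == 0) {
        continue; // zero weight: takes no space, so removing it frees nothing
      }
      total -= e.getValue();
      it.remove();
    }
  }
}
```

In this model an entry with weight 0 is never chosen as a victim, because evicting it cannot reduce the total. An entry with weight Integer.MAX_VALUE claims more space than the bound allows, so it pushes out other weighted entries and is then removed itself.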
@joerg1985 I added back |
User description
🔗 Related Issues
💥 What does this PR do?
caffeine.cache.Expiry: expireAfterCreate (the first time an HttpClient is added to the cache) and expireAfterRead (when an HttpClient is accessed from the cache).
Here are a few logs with log level FINE enabled
🔧 Implementation Notes
💡 Additional Considerations
🔄 Types of changes
PR Type
Enhancement
Description
Replace manual HTTP client cache with Caffeine cache library
Simplify resource management with automatic cleanup
Add proper executor service shutdown in GridStatusHandler
Remove complex manual cache eviction logic
Diagram Walkthrough
File Walkthrough
HandleSession.java
Replace manual cache with Caffeine implementation
java/src/org/openqa/selenium/grid/router/HandleSession.java
GridStatusHandler.java
Add proper resource cleanup
java/src/org/openqa/selenium/grid/router/GridStatusHandler.java
BUILD.bazel
Add Caffeine dependency
java/src/org/openqa/selenium/grid/router/BUILD.bazel