_sandboxremote.py: avoid reusing failed actions #2033
  stub = self.exec_remote.exec_service
  request = remote_execution_pb2.ExecuteRequest(
-     instance_name=self.exec_remote.instance_name, action_digest=action_digest, skip_cache_lookup=False
+     instance_name=self.exec_remote.instance_name, action_digest=action_digest, skip_cache_lookup=True
We don't want to always skip cache lookup. If no action-cache-service is declared, the remote execution server's internal action cache lookup should still be used.
Yeah, even when the action-cache-service is defined, there is still value in having the execution re-do cache lookup (in case the action is requested and built by someone else before our action reaches the top of the queue).
You're right that this is probably working around a "broken" remote execution server. I've only tested it with buildbox-casd, and wasn't aware it was different with other servers.
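One way to picture the point above, as a rough sketch only and not the change in this PR: keep `skip_cache_lookup=False` as the default and set it to `True` only when the caller explicitly asks for it. The `ExecuteRequest` dataclass below is a stand-in for `remote_execution_pb2.ExecuteRequest` with just the fields from the diff, and `build_execute_request` and its `force_rebuild` parameter are hypothetical names used for illustration.

```python
from dataclasses import dataclass


@dataclass
class ExecuteRequest:
    # Stand-in for remote_execution_pb2.ExecuteRequest (assumption: only
    # the three fields that appear in the diff are modelled here).
    instance_name: str
    action_digest: str
    skip_cache_lookup: bool = False


def build_execute_request(instance_name, action_digest, *, force_rebuild=False):
    # Hypothetical helper: the server-side cache lookup stays enabled
    # unless the caller explicitly requests a fresh execution.
    return ExecuteRequest(
        instance_name=instance_name,
        action_digest=action_digest,
        skip_cache_lookup=force_rebuild,
    )


# Default: the execution service may still answer from its action cache.
normal = build_execute_request("main", "digest-placeholder")

# Explicit opt-out, e.g. when a previously cached result must not be reused.
forced = build_execute_request("main", "digest-placeholder", force_rebuild=True)
```

With this shape, the decision about when to bypass the cache lives in the caller, so the default path keeps the benefit of the server re-checking the cache when the action reaches the front of the queue.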
I think the main issue is that failed actions are cached in the action cache in the first place. While there may be circumstances where caching failed actions is useful, I don't think we need this for BuildStream at all, given that we have a higher-level caching mechanism where we already support caching failures. Most remote execution servers also default to not caching failed actions, mainly because failures may be spurious, e.g., due to a worker running out of RAM.
BuildGrid caches failures by default but it can be disabled with
On the BuildStream side, might it be sufficient to skip action cache lookup (direct action cache query as well as indirectly via Execute()) if
Yeah, I was using this with buildbox-casd as a server. The failures in question were due to a bug in buildbox-fuse.
The thing is
Good point, but maybe we can find a way to forward this information in the interactive retry case as well.
This is a proposal towards #2020. It's probably not the most efficient way to do it, but at least it works.
What this does is: