-
-
Notifications
You must be signed in to change notification settings - Fork 427
Android CI: Fix concurrent android sdk download #6551
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Android CI: Fix concurrent android sdk download #6551
Conversation
|
if this works out, we probably should create some sort of AndroidSdkManagerWorker that handles the installations for all modules (i.e. AndroidSdkModule) . I suspect the issue would exist in other parts but it's probably more stable due to the license check and the ndk is only used in one place in the CI |
|
hm, need to dig deeper 9307] Warning: Observed package id 'tools' in inconsistent location '/home/runner/.android/sdk/tools-2' (Expected '/home/runner/.android/sdk/tools') |
|
hm, something else is (also?) wrong, for some reason, the downloads get interrupted |
|
thanks @souvlakias |
|
I'm not convinced we want a general purpose generic action runner in the context of the worker. Can we instead move the download jobs into the worker and use them via API? The worker then can do whatever is needed to guarantee it's working well. |
yep, will do it like that, the general purpose was just the quickest way to get there, but I can make the general purpose function private and expose domain methods + get the os calls in, effectively move some of the AndroidSdkModule methods to the worker |
|
I thought, the |
yeah, something is weird there, the synchronized didn't seem to work, even if I replaced the file lock with one of them. I think the AndroidSdkModule is not a singleton, it's extended in every example. But even so, I'd expect the private objects could be used as locks, but apparently not. (Even so the presence of 3 locks could cause a deadlock, which might be one of the reasons the job occasionally times out) |
|
So, we need to use a single ExternalModule to hold all the download logic (or if you want to use locks, to own the locks). If not we can't ensure the locks are singletons. Even some clever worker logic won't help, if the worker has multiple instances due to being bound to non-singleton |
Ok, I'll give it a go with the external module in the weekend, I was gonna do major work on the file lock approach anyway in the worker, might as well do the external module approach |
| Task.log.info( | ||
| s"Waiting for sdkmanager lock ($attempts/$maxAttempts)... Trying again in 5 seconds!" | ||
| ) | ||
| Thread.sleep(5000) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking this could be shorter or an exponential backoff, but , we should arrive here only in cases where multiple mill jvm processes exist, in cases of same process access, it should be handled by the pool (can't enter here twice from the same process anyway). It's probably a mill CI only flow
|
so far , android tests run consistently green 2 times (2 in my branch, 2 here). The only failure seems to be the same on main branch |
|
@vaslabs If you're done with this PR I'll merge it |
|
I'm done, I'll keep an eye for improvements as we go along anyway! |
Fixes #6522
Implementation details
Context: A lot of this was derived through trial and error, please let me know if I missed something Mill internal related.
Aim of the implementation
To support the shared (
example.androidlib.__.shared.daemon.testForked) mode for Android we need to guarantee that one SdkManager process runs at any time. SdkManager is a wrapper of a java application called SdkManagerCli so we can observe this with tools like visualvmFunnel/Locking strategy
First, we want to guarantee that within the same process, only one thread executes the sdkmanager installation process (different tasks that do different things with sdkmanager can access it in parallel within the same jvm process), which also relies on checking the filesystem outside of Mill's control (such as the android home directory). This was true for sometime, but it recently became unreliable especially with the shared mode.
Thus, bringing to the second part, which is to support the only one instance of sdkmanager running guarantee across different jvm processes.
Worker
The download responsibility has been moved to an external module -
AndroidSdkManagerModule. This encapsulates a worker, with a single threaded pool, which guarantees - in process - only one execution at a time.To guarantee across jvm processes, a file lock is used in the android home, handled by the worker.
Shared daemon seems to be behaving non deterministically still
Sometimes I see this failure
Weirdly, the file is there. I don't know under what conditions this happens yet.