Skip to content

Conversation

@vaslabs
Copy link
Contributor

@vaslabs vaslabs commented Jan 8, 2026

  • Fixes parallel app testing conflicts on sdkmanager image download and conflicting on emulators.
  • I've tried two approaches, one with syncrhonized and one with file lock. The synchronized approach was not consistently working but file lock seems to work with share.

Fixes #6522

Implementation details

Context: A lot of this was derived through trial and error, please let me know if I missed something Mill internal related.

Aim of the implementation

To support the shared (example.androidlib.__.shared.daemon.testForked) mode for Android we need to guarantee that one SdkManager process runs at any time. SdkManager is a wrapper of a java application called SdkManagerCli so we can observe this with tools like visualvm

image

Funnel/Locking strategy

First, we want to guarantee that within the same process, only one thread executes the sdkmanager installation process (different tasks that do different things with sdkmanager can access it in parallel within the same jvm process), which also relies on checking the filesystem outside of Mill's control (such as the android home directory). This was true for sometime, but it recently became unreliable especially with the shared mode.

Thus, bringing to the second part, which is to support the only one instance of sdkmanager running guarantee across different jvm processes.

Worker

The download responsibility has been moved to an external module - AndroidSdkManagerModule. This encapsulates a worker, with a single threaded pool, which guarantees - in process - only one execution at a time.

To guarantee across jvm processes, a file lock is used in the android home, handled by the worker.

  • Using just a file lock, does not work because the same process overlaps the lock with multiple threads
  • Having a single threaded pool is an easy way to avoid managing in memory locks in addition with file locks.

Shared daemon seems to be behaving non deterministically still

Sometimes I see this failure

image

Weirdly, the file is there. I don't know under what conditions this happens yet.

@vaslabs vaslabs changed the title Fix concurrent android sdk download Android CI: Fix concurrent android sdk download Jan 8, 2026
@vaslabs
Copy link
Contributor Author

vaslabs commented Jan 8, 2026

if this works out, we probably should create some sort of AndroidSdkManagerWorker that handles the installations for all modules (i.e. AndroidSdkModule) . I suspect the issue would exist in other parts but it's probably more stable due to the license check and the ndk is only used in one place in the CI

@vaslabs
Copy link
Contributor Author

vaslabs commented Jan 8, 2026

hm, need to dig deeper

9307] Warning: Observed package id 'tools' in inconsistent location '/home/runner/.android/sdk/tools-2' (Expected '/home/runner/.android/sdk/tools')
9307] Warning: Already observed package id 'tools' in '/home/runner/.android/sdk/tools'. Skipping duplicate at '/home/runner/.android/sdk/tools-2'
9307] Warning: Observed package id 'tools' in inconsistent location '/home/runner/.android/sdk/tools-2' (Expected '/home/runner/.android/sdk/tools')
9307] Warning: Already observed package id 'tools' in '/home/runner/.android/sdk/tools'. Skipping duplicate at '/home/runner/.android/sdk/tools-2'
9307] Warning: Observed package id 'tools' in inconsistent location '/home/runner/.android/sdk/tools-2' (Expected '/home/runner/.android/sdk/tools')
9307] Warning: Already observed package id 'tools' in '/home/runner/.android/sdk/tools'. Skipping duplicate at '/home/runner/.android/sdk/tools-2'

@vaslabs
Copy link
Contributor Author

vaslabs commented Jan 8, 2026

hm, something else is (also?) wrong, for some reason, the downloads get interrupted

@vaslabs
Copy link
Contributor Author

vaslabs commented Jan 9, 2026

thanks @souvlakias

@vaslabs vaslabs marked this pull request as ready for review January 9, 2026 09:31
@lefou
Copy link
Member

lefou commented Jan 9, 2026

I'm not convinced we want a general purpose generic action runner in the context of the worker. Can we instead move the download jobs into the worker and use them via API? The worker then can do whatever is needed to guarantee it's working well.

@vaslabs
Copy link
Contributor Author

vaslabs commented Jan 9, 2026

I'm not convinced we want a general purpose generic action runner in the context of the worker. Can we instead move the download jobs into the worker and use them via API? The worker then can do whatever is needed to guarantee it's working well.

yep, will do it like that, the general purpose was just the quickest way to get there, but I can make the general purpose function private and expose domain methods + get the os calls in, effectively move some of the AndroidSdkModule methods to the worker

@lefou
Copy link
Member

lefou commented Jan 9, 2026

I thought, the AndroidSdkModule is already a singleton and the download happens in a single task, such that there don't happen concurrent downloads to the same location. So either my mental module isn't right, or there are some tasks where we didn't get it right, yet. Or, we manually use alternative AndroidSdkModules pointing to the same target location, and more then one of them.

@vaslabs
Copy link
Contributor Author

vaslabs commented Jan 9, 2026

I thought, the AndroidSdkModule is already a singleton and the download happens in a single task, such that there don't happen concurrent downloads to the same location. So either my mental module isn't right, or there are some tasks where we didn't get it right, yet. Or, we manually use alternative AndroidSdkModules pointing to the same target location, and more then one of them.

yeah, something is weird there, the synchronized didn't seem to work, even if I replaced the file lock with one of them. I think the AndroidSdkModule is not a singleton, it's extended in every example. But even so, I'd expect the private objects could be used as locks, but apparently not.

(Even so the presence of 3 locks could cause a deadlock, which might be one of the reasons the job occasionally times out)

@lefou
Copy link
Member

lefou commented Jan 9, 2026

So, we need to use a single ExternalModule to hold all the download logic (or if you want to use locks, to own the locks). If not we can't ensure the locks are singletons. Even some clever worker logic won't help, if the worker has multiple instances due to being bound to non-singleton AndroidSdkModules.

@vaslabs
Copy link
Contributor Author

vaslabs commented Jan 9, 2026

So, we need to use a single ExternalModule to hold all the download logic (or if you want to use locks, to own the locks). If not we can't ensure the locks are singletons. Even some clever worker logic won't help, if the worker has multiple instances due to being bound to non-singleton AndroidSdkModules.

Ok, I'll give it a go with the external module in the weekend, I was gonna do major work on the file lock approach anyway in the worker, might as well do the external module approach

@vaslabs vaslabs marked this pull request as ready for review January 10, 2026 17:15
Task.log.info(
s"Waiting for sdkmanager lock ($attempts/$maxAttempts)... Trying again in 5 seconds!"
)
Thread.sleep(5000)
Copy link
Contributor Author

@vaslabs vaslabs Jan 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking this could be shorter or an exponential backoff, but , we should arrive here only in cases where multiple mill jvm processes exist, in cases of same process access, it should be handled by the pool (can't enter here twice from the same process anyway). It's probably a mill CI only flow

@vaslabs
Copy link
Contributor Author

vaslabs commented Jan 12, 2026

so far , android tests run consistently green 2 times (2 in my branch, 2 here). The only failure seems to be the same on main branch

@lihaoyi
Copy link
Member

lihaoyi commented Jan 12, 2026

@vaslabs If you're done with this PR I'll merge it

@vaslabs
Copy link
Contributor Author

vaslabs commented Jan 12, 2026

I'm done, I'll keep an eye for improvements as we go along anyway!

@lihaoyi lihaoyi merged commit e3e637d into com-lihaoyi:main Jan 12, 2026
33 of 34 checks passed
@lefou lefou added this to the 1.1.0-RC4 milestone Jan 12, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

CI: Android test suite is failing since January 2026

4 participants