@grendello grendello commented Aug 5, 2025

Fixes: #10324
Fixes: #7616
Context: https://docs.oracle.com/en/java/javase/17/docs/specs/jni/invocation.html#library-and-version-management
Context: #7616 (comment)

The Java language/virtual machine supports implementing portions
of the API in a language (like C, C++ or Rust) which compiles
into native binary code instead of being JIT-ed or interpreted
at run time. Such implementations are contained in native shared
libraries which have to conform to a set of rules laid out in
the JNI (Java Native Interface) documentation.

Part of the specification describes a function (JNI_OnLoad) which
may be present in the shared library and, if it's there, it will
be called by the JavaVM when the library is loaded. For this to
happen, however, the load must be initiated by calling the
System.loadLibrary(string) Java method. This method will find
the named shared library, load it using an OS-specific mechanism and
then call all the exported functions described in the JNI specification,
if they are present.

Until this PR, .NET for Android (and Xamarin.Android before it) was
loading all the shared libraries in the same way, via dlopen(3) instead
of System.loadLibrary(string), which resulted in some of those
libraries not being initialized properly. This PR fixes the issue by
identifying shared libraries which contain the JNI_OnLoad function and
loading them at run time by calling System.loadLibrary(string) instead
of just dlopen(3). This ensures that those libraries are properly
initialized.
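
The selection described above can be sketched roughly as follows. This is a minimal illustration and not the PR's code: `LibraryInfo`, `has_jni_onload`, `Loaders` and `load_library` are hypothetical names, and the two callbacks stand in for the real `dlopen` and `System.loadLibrary` calls so that the sketch compiles without Android or JNI headers.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical description of one packaged shared library.
struct LibraryInfo {
    std::string name;     // e.g. "libfoo.so"
    bool has_jni_onload;  // assumed to be known before the load is attempted
};

// Stand-ins for the two loading mechanisms; each returns true on success.
struct Loaders {
    std::function<bool(const std::string&)> dlopen_loader;
    std::function<bool(const std::string&)> java_loader;  // would call System.loadLibrary via JNI
};

// Libraries that export JNI_OnLoad go through the JavaVM so that the VM
// calls JNI_OnLoad for them; all other libraries keep using plain dlopen.
inline bool load_library(const LibraryInfo& lib, const Loaders& loaders)
{
    assert(loaders.dlopen_loader != nullptr && loaders.java_loader != nullptr);
    return lib.has_jni_onload ? loaders.java_loader(lib.name)
                              : loaders.dlopen_loader(lib.name);
}
```

Keeping the decision in one place means the rest of the loader does not need to know which mechanism a given library requires.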

However, the Android environment is quite complex and not everything in the PR
is implemented the way it was intended to be. The problem lies in the ability of
System.loadLibrary(string) to find the actual shared library file. The file
can be found in a number of locations, among them two application-specific ones:

  • The APK archive's lib/{ABI}/ directory, when shared libraries are not
    extracted from the archive on installation.
  • The application-specific library location on the file system, when shared
    libraries are extracted from the archive on installation.

In either case, the location is not known beforehand as each time the application
is installed, it will get a different path where both its archive and extracted
files are located. This requires the Java runtime to provide that information to
the application in some way. The way ART (the Java runtime on Android) does it is
via class loaders, which are special classes that know how to find and load Java
components as well as the native libraries. System.loadLibrary(string) uses that
information to locate the .so files with JavaVM extensions.

The mechanism described above works well as long as the System.loadLibrary call
is made from a thread that's fully attached to the Java VM, which is to say that
the VM environment sets up the class loaders correctly, so that they contain information
about the application-specific shared library locations. With correctly configured
class loaders, we see a message like the following when loading the shared library with
System.loadLibrary:

08-13 12:06:48.269 11989 11989 D nativeloader: Load /data/app/~~Xy-UIVle34c_VksRd2_LEg==/com.xamarin.XAPerfTest.net10-GbhwYcau77FAjV_FW1uZwg==/split_config.arm64_v8a.apk!/lib/arm64-v8a/libSystem.Security.Cryptography.Native.Android.so using class loader ns clns-9 (caller=/data/app/~~Xy-UIVle34c_VksRd2_LEg==/com.xamarin.XAPerfTest.net10-GbhwYcau77FAjV_FW1uZwg==/base.apk): ok

The bits to note above are the class loader ns clns-9 information (it's a dynamically
configured loader that is fully informed about the application-specific shared library
locations) and the name of the caller (the cryptic-looking path ending with base.apk).
This loader is used, for instance, during our native runtime configuration, when it is
being initialized from our (Java) package manager at application startup.

However, the problem is that the above class loader is no longer around when we call
System.loadLibrary on a thread that's not fully attached to the Java VM:

08-13 12:16:10.659 12472 12472 D nativeloader: Load libSystem.Security.Cryptography.Native.Android.so using system ns (caller=/system/framework/framework.jar!classes3.dex): dlopen failed: library "libSystem.Security.Cryptography.Native.Android.so" not found

In this case, note that both the class loader (named here just system ns) and the
caller are generic; they have no knowledge of the application-specific shared library
locations.

This observation led to the idea of using the native looper (ALooper) interface
to post the shared library load request to the main thread from native code, and then
call System.loadLibrary on it. This assumed that the main thread, which originally had
the application-specific class loader, would still be around and able to handle the
load properly. Unfortunately, this doesn't appear to be the case. Despite our attaching
the thread to the Java VM with the JNI API (AttachCurrentThread), the application-specific
class loader isn't there. This was discovered a few years ago already (see the link to
issue #7616 comment) but we haven't been able to find a way to fully attach the thread
to the Java VM so that the class loaders are correctly set up.
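
A rough sketch of the dispatch idea described above, using only the C++ standard library. It is not the PR's implementation: `MainThreadQueue` and `load_on_main_thread` are hypothetical names, the queue stands in for the `ALooper` message pump, and the `loader` callback stands in for the JNI call to `System.loadLibrary`. The timeout is also an assumption, added so that a caller is not blocked forever if the main thread never services the request.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <deque>
#include <functional>
#include <future>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for the main thread's looper: work is posted from
// any thread and executed later by whichever thread calls drain().
class MainThreadQueue {
public:
    // Enqueue work and return a future that becomes ready once it has run.
    std::future<bool> post(std::function<bool()> work)
    {
        assert(work != nullptr);
        std::packaged_task<bool()> task(std::move(work));
        std::future<bool> result = task.get_future();
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push_back(std::move(task));
        }
        return result;
    }

    // Meant to be called on the main thread: run everything queued so far
    // and return the number of tasks that were executed.
    std::size_t drain()
    {
        std::deque<std::packaged_task<bool()>> pending;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            pending.swap(tasks_);
        }
        for (auto& task : pending) {
            task();  // makes the matching future ready
        }
        return pending.size();
    }

private:
    std::mutex mutex_;
    std::deque<std::packaged_task<bool()>> tasks_;
};

// Post a load request and wait for the main thread to service it.
// Returns false if the loader failed or the request timed out.
inline bool load_on_main_thread(MainThreadQueue& queue,
                                std::function<bool(const std::string&)> loader,
                                const std::string& name,
                                std::chrono::milliseconds timeout)
{
    std::future<bool> done = queue.post([loader, name] { return loader(name); });
    if (done.wait_for(timeout) != std::future_status::ready) {
        return false;  // nobody drained the queue in time
    }
    return done.get();
}
```

If the request is issued from the main thread itself, nothing drains the queue while the caller waits, so a real implementation would presumably run the loader directly in that case.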

This, unfortunately, leads us to our only remaining option - preloading of the JNI libraries
at application startup, while we're still in the properly configured main thread.
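
As an illustration only, preloading could look something like the loop below. `PackagedLibrary` and `preload_jni_libraries` are hypothetical names, and `java_loader` stands in for a JNI call to `System.loadLibrary` made while the application class loader is still in effect.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical record for one shared library packaged with the application.
struct PackagedLibrary {
    std::string name;     // undecorated name to pass to the Java loader
    bool has_jni_onload;  // only these libraries need the Java loader
    bool loaded;          // set once the library has been loaded
};

// Intended to run once, early during startup, on the thread that can still
// see the application-specific library locations.
// Returns the number of libraries that were loaded by this call.
inline std::size_t preload_jni_libraries(
    std::vector<PackagedLibrary>& libraries,
    const std::function<bool(const std::string&)>& java_loader)
{
    assert(java_loader != nullptr);
    std::size_t loaded_now = 0;
    for (auto& lib : libraries) {
        if (!lib.has_jni_onload || lib.loaded) {
            continue;  // plain libraries can still be loaded lazily later
        }
        if (java_loader(lib.name)) {
            lib.loaded = true;
            ++loaded_now;
        }
    }
    return loaded_now;
}
```

The cost of this approach is paid at startup for every JNI library, whether or not the application ends up using it.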

This PR implements just that, but it also implements and uses code to load JNI shared libraries
on demand from any thread by posting the request to the main thread, as it may just happen
that a request to load a shared library will arrive on a separate managed thread during
startup and we might get lucky and run the loading code on a still-attached main thread.

In the future, more work (much more) is required to investigate the internals of the ART
runtime in order to try to find a way to fully attach managed threads so that class loaders
are set up properly.

@grendello grendello force-pushed the dev/grendel/jni-dso-load branch from ebe9744 to 96bbe92 Compare August 7, 2025 07:25
@grendello grendello marked this pull request as ready for review August 7, 2025 14:30
Comment on lines 17 to 33
[[gnu::flatten]]
static void init (JNIEnv *env, jclass systemClass)
{
    jni_env = env;
    systemKlass = systemClass;
    System_loadLibrary = env->GetStaticMethodID (systemClass, "loadLibrary", "(Ljava/lang/String;)V");
    if (System_loadLibrary == nullptr) [[unlikely]] {
        Helpers::abort_application ("Failed to look up the Java System.loadLibrary method.");
    }
}
Member

For my understanding, when would you decide to move the implementation to a dso-loader.cc file instead of putting all the code in the .hh file?

Contributor Author

I would move it if the code was complex and/or involved a lot of dependencies (include files). I prefer to inline as much as possible by default, to squeeze out every ounce of performance at run time.

Contributor Author

Also, this particular code is not called from many places, so size increase in the resulting binary isn't bad at all.

@grendello grendello force-pushed the dev/grendel/jni-dso-load branch from e873db7 to cd2a310 Compare August 8, 2025 15:35
@grendello grendello marked this pull request as draft August 11, 2025 07:18
@grendello grendello force-pushed the dev/grendel/jni-dso-load branch from cd2a310 to be52def Compare August 11, 2025 10:03
@grendello
Contributor Author

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

@grendello grendello force-pushed the dev/grendel/jni-dso-load branch from 93e4028 to 4b16be1 Compare August 13, 2025 10:42
@grendello
Contributor Author

Unfortunately, due to bad Android design, loading the shared library on a thread other than the main one fails:

08-13 12:16:10.657 12472 12507 D monodroid-assembly: Trying to load loading shared JNI library /data/user/0/Mono.Android.NET_Tests/files/.__override__/arm64-v8a/libSystem.Security.Cryptography.Native.Android.so with System.loadLibrary
08-13 12:16:10.657 12472 12507 D monodroid-assembly: Running DSO loader on thread 12507, dispatching to main thread
08-13 12:16:10.657 12472 12472 D monodroid-assembly: Looper CB called on thread 12472. Will attempt to load DSO 'System.Security.Cryptography.Native.Android'
08-13 12:16:10.657 12472 12472 D monodroid-assembly: Undecorated library name: System.Security.Cryptography.Native.Android
08-13 12:16:10.659 12472 12472 D nativeloader: Load libSystem.Security.Cryptography.Native.Android.so using system ns (caller=/system/framework/framework.jar!classes3.dex): dlopen failed: library "libSystem.Security.Cryptography.Native.Android.so" not found
08-13 12:16:10.659 12472 12472 D monodroid-assembly: System.loadLibrary threw a Java exception. Will attempt to log it.
08-13 12:16:10.661 12472 12472 W System.err: java.lang.UnsatisfiedLinkError: dlopen failed: library "libSystem.Security.Cryptography.Native.Android.so" not found

The problem here is that the class loader being used (system ns) doesn't know about the application shared library paths, so dlopen, eventually called by System.loadLibrary, cannot find them. The class loader which knows them (it would be called e.g. class loader ns clns-9) apparently isn't part of the JNI environment, nor is it inherited by non-Java threads (which is the case here, as we're being called on a managed thread). Attaching the thread to JNI/JVM doesn't set it up properly either.

I'm afraid we'll have no option but to preload all the JNI shared libraries at application startup, thus wasting potentially a lot of precious startup time.
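
The "Undecorated library name" step in the log above reflects the fact that `System.loadLibrary` takes a bare library name and maps it to the platform file name (`lib<NAME>.so` on Android) itself. A minimal sketch of such a conversion, assuming the file follows the `lib<NAME>.so` convention; `undecorate_library_name` is a hypothetical helper, not the function used in the PR:

```cpp
#include <string>

// Turn a path or file name such as "/some/dir/libfoo.so" into "foo".
// Assumes the usual lib<NAME>.so convention; a missing prefix or suffix
// is simply left alone.
inline std::string undecorate_library_name(const std::string& path)
{
    const std::string prefix = "lib";
    const std::string suffix = ".so";

    std::string name = path;

    // Drop any leading directory components.
    const std::string::size_type slash = name.find_last_of('/');
    if (slash != std::string::npos) {
        name.erase(0, slash + 1);
    }

    // Drop the "lib" prefix, if present.
    if (name.compare(0, prefix.size(), prefix) == 0) {
        name.erase(0, prefix.size());
    }

    // Drop the ".so" suffix, if present.
    if (name.size() >= suffix.size() &&
        name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0) {
        name.erase(name.size() - suffix.size());
    }

    return name;
}
```

Note that the directory part is discarded, so the class loader alone decides where the file is searched for, which is why the generic `system ns` loader cannot find it.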

@grendello grendello force-pushed the dev/grendel/jni-dso-load branch 3 times, most recently from f072092 to e47a210 Compare August 18, 2025 08:34
@grendello grendello marked this pull request as ready for review August 18, 2025 09:12
@grendello grendello force-pushed the dev/grendel/jni-dso-load branch from e47a210 to 6c4d790 Compare August 18, 2025 15:09
No matter what I try, the call posted to the main thread still doesn't use
the right class loader and, thus, `System.loadLibrary` cannot find the shared lib:

```
08-13 12:16:10.657 12472 12507 D monodroid-assembly: Trying to load loading shared JNI library /data/user/0/Mono.Android.NET_Tests/files/.__override__/arm64-v8a/libSystem.Security.Cryptography.Native.Android.so with System.loadLibrary
08-13 12:16:10.657 12472 12507 D monodroid-assembly: Running DSO loader on thread 12507, dispatching to main thread
08-13 12:16:10.657 12472 12472 D monodroid-assembly: Looper CB called on thread 12472. Will attempt to load DSO 'System.Security.Cryptography.Native.Android'
08-13 12:16:10.657 12472 12472 D monodroid-assembly: Undecorated library name: System.Security.Cryptography.Native.Android
08-13 12:16:10.659 12472 12472 D nativeloader: Load libSystem.Security.Cryptography.Native.Android.so using system ns (caller=/system/framework/framework.jar!classes3.dex): dlopen failed: library "libSystem.Security.Cryptography.Native.Android.so" not found
08-13 12:16:10.659 12472 12472 D monodroid-assembly: System.loadLibrary threw a Java exception. Will attempt to log it.
08-13 12:16:10.661 12472 12472 W System.err: java.lang.UnsatisfiedLinkError: dlopen failed: library "libSystem.Security.Cryptography.Native.Android.so" not found
```

Time to think about something else :( Thank you Android for 3 days wasted
They might be causing this error:

```
08-18 09:32:06.603  5693  5718 I monodroid: Loaded type: Java.Security.Cert.X509Certificate
08-18 09:32:06.603  5693  5718 E droid.NET_Test: JNI ERROR (app bug): accessed deleted Global 0x3a62
08-18 09:32:06.603  5693  5718 F droid.NET_Test: java_vm_ext.cc:570] JNI DETECTED ERROR IN APPLICATION: use of deleted global reference 0x3a62
08-18 09:32:06.604  5693  5718 F droid.NET_Test: java_vm_ext.cc:570]     from void crc643df67da7b13bb6b1.TestInstrumentation_1.n_onStart()
```
@grendello grendello force-pushed the dev/grendel/jni-dso-load branch from 6c4d790 to af20339 Compare August 18, 2025 15:11
@grendello grendello merged commit cba39dc into main Aug 19, 2025
59 checks passed
@grendello grendello deleted the dev/grendel/jni-dso-load branch August 19, 2025 07:21
jonathanpeppers pushed a commit that referenced this pull request Aug 19, 2025
jonathanpeppers added a commit that referenced this pull request Aug 22, 2025
This reverts commit 064f23f.

We are seeing `dotnet new maui` projects fail to debug, as they appear
to be stuck in a loop on startup:

    08-22 11:39:14.148 W/monodroid-assembly(10269): Timeout while waiting for shared library '/data/user/0/com.companyname.mauiapp14/files/.__override__/arm64-v8a/libSystem.Security.Cryptography.Native.Android.so' to load.
    08-22 11:39:17.155 W/monodroid-assembly(10269): Timeout while waiting for shared library '/data/app/~~Q5yyfDmzDqX9Z8UwQnLoFA==/com.companyname.mauiapp14-ZPk2_y6fT3b_3leM8xhcAw==/lib/arm64/libSystem.Security.Cryptography.Native.Android.so' to load.
    08-22 11:39:20.164 W/monodroid-assembly(10269): Timeout while waiting for shared library '/data/user/0/com.companyname.mauiapp14/files/.__override__/arm64-v8a/libSystem.Security.Cryptography.Native.Android' to load.
    08-22 11:39:23.172 W/monodroid-assembly(10269): Timeout while waiting for shared library '/data/app/~~Q5yyfDmzDqX9Z8UwQnLoFA==/com.companyname.mauiapp14-ZPk2_y6fT3b_3leM8xhcAw==/lib/arm64/libSystem.Security.Cryptography.Native.Android' to load.
    08-22 11:39:23.172 W/monodroid-assembly(10269): Shared library 'libSystem.Security.Cryptography.Native.Android' not loaded, p/invoke 'AndroidCryptoNative_SSLStreamInitialize' may fail
    08-22 11:39:23.172 F/monodroid-assembly(10269): Failed to load symbol 'AndroidCryptoNative_SSLStreamInitialize' from shared library 'libSystem.Security.Cryptography.Native.Android'
    08-22 11:39:26.187 W/monodroid-assembly(10269): Timeout while waiting for shared library '/data/user/0/com.companyname.mauiapp14/files/.__override__/arm64-v8a/libSystem.Security.Cryptography.Native.Android.so' to load.
    08-22 11:39:29.190 W/monodroid-assembly(10269): Timeout while waiting for shared library '/data/app/~~Q5yyfDmzDqX9Z8UwQnLoFA==/com.companyname.mauiapp14-ZPk2_y6fT3b_3leM8xhcAw==/lib/arm64/libSystem.Security.Cryptography.Native.Android.so' to load.
    08-22 11:39:32.198 W/monodroid-assembly(10269): Timeout while waiting for shared library '/data/user/0/com.companyname.mauiapp14/files/.__override__/arm64-v8a/libSystem.Security.Cryptography.Native.Android' to load.
    repeating...
grendello added a commit that referenced this pull request Aug 28, 2025
…oo (#10444)

Context: cba39dc
Context: #10376
Context: #10324

cba39dc implemented preloading of JNI-using native libraries but it failed to update
the alias entries for each preloaded library.

During application build we generate native code that contains a cache with an entry for
each native library packaged with the managed application code. Each library follows the
same naming pattern: `lib<NAME>.so`. However, the managed code can refer to those libraries
(e.g. when declaring a p/invoke with the `[DllImport]` attribute) using different forms
of the name. The request may take the form of `lib<NAME>`, `<NAME>`, etc.

When the runtime tries to resolve the p/invoke symbol, it first needs to load the shared
library. This is done (in our case) by using a callback into our runtime which then tries
to find the library and load it. Should the attempt fail, the runtime will mutate the
library name and ask us again until all the possible names are tried or the library is
loaded successfully. This roundtrip is pretty expensive, so in our native library loader
code we implemented (in c227042) a scheme where at build time we mutate library names
ourselves and add a separate entry for each name mutation to the shared library cache. This
way, when the runtime request comes, we perform a single search and are able to find
the library no matter what name the managed code requested.
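
To make the name mutation concrete, here is a small sketch that lists the spellings for one library. The exact set of mutations generated at build time is not spelled out above, so the four forms below are an assumption, and `library_name_aliases` is a hypothetical helper rather than the build task's code.

```cpp
#include <string>
#include <vector>

// Given a canonical file name of the form "lib<NAME>.so", return the
// spellings managed code might use to refer to it. Names that do not
// follow the pattern are returned unchanged as the only entry.
inline std::vector<std::string> library_name_aliases(const std::string& canonical)
{
    const std::string prefix = "lib";
    const std::string suffix = ".so";

    const bool well_formed =
        canonical.size() > prefix.size() + suffix.size() &&
        canonical.compare(0, prefix.size(), prefix) == 0 &&
        canonical.compare(canonical.size() - suffix.size(), suffix.size(), suffix) == 0;
    if (!well_formed) {
        return {canonical};
    }

    const std::string name = canonical.substr(
        prefix.size(), canonical.size() - prefix.size() - suffix.size());

    // Assumed alias set: lib<NAME>.so, lib<NAME>, <NAME>.so, <NAME>
    return {canonical, prefix + name, name + suffix, name};
}
```

With one cache entry per alias, a request for any of these spellings can be answered with a single lookup instead of a series of failed load attempts.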

Each of the cache entries contains, among other things irrelevant to this PR, a field
which stores the native library's handle after it is loaded. cba39dc loaded the
library and set that field in just a single cache entry, the one corresponding to the
canonical library name (`lib<NAME>.so`), but it failed to set the field in all the aliases.
As a result, when the managed code requested the library by a different name, we found the
corresponding cache entry, saw that its handle was unset, and attempted to load the library
again. However, since the request was sent from a different thread, we attempted to load
the library on the main thread (described in detail in the cba39dc commit message), and that
attempt always failed, leading to an endless loop and an application crash/hang while debugging.

Fix the issue by setting the native shared library handle in all the cache entries corresponding
to the various mutations of the library name. This makes sure that further requests to load the
library will see the handle set in the cache and use it, instead of attempting to load it
again.
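
The fix can be pictured with the simplified model below. It is only a sketch under assumptions: the real cache is generated native code whose layout is not shown here, and `CacheEntry`, `LibraryCache`, `set_handle` and `find_handle` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical cache entry: one exists for every spelling of a library name.
struct CacheEntry {
    std::string canonical_name;  // the "lib<NAME>.so" this alias refers to
    void* handle;                // nullptr until the library has been loaded
};

class LibraryCache {
public:
    // Register a library under all of its aliases, with no handle yet.
    void add(const std::string& canonical, const std::vector<std::string>& aliases)
    {
        for (const auto& alias : aliases) {
            entries_[alias] = CacheEntry{canonical, nullptr};
        }
    }

    // Store a freshly loaded handle in every entry that refers to the same
    // library, not only in the canonical one. Returns the number of entries
    // that were updated.
    std::size_t set_handle(const std::string& canonical, void* handle)
    {
        assert(handle != nullptr);
        std::size_t updated = 0;
        for (auto& item : entries_) {
            if (item.second.canonical_name == canonical) {
                item.second.handle = handle;
                ++updated;
            }
        }
        return updated;
    }

    // Look up a handle by whatever name the caller used; nullptr means the
    // name is unknown or the library has not been loaded yet.
    void* find_handle(const std::string& requested) const
    {
        const auto it = entries_.find(requested);
        return it == entries_.end() ? nullptr : it->second.handle;
    }

private:
    std::unordered_map<std::string, CacheEntry> entries_;
};
```

Had only the canonical entry been updated, `find_handle` would return `nullptr` for the other spellings, which corresponds to the repeated load attempts described above.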
Successfully merging this pull request may close these issues.

[.NET 10 / CoreCLR] Calling crypto hash APIs causes NRE in native code
AndroidNativeLibrary doesn't trigged JNI_OnLoad