Skip to content

Conversation

@zeitlinger
Copy link
Member

Fixes #12812

@zeitlinger zeitlinger requested a review from a team as a code owner January 21, 2025 13:21
@zeitlinger zeitlinger self-assigned this Jan 21, 2025
@github-actions github-actions bot added the test native This label can be applied to PRs to trigger them to run native tests label Jan 21, 2025
@zeitlinger
Copy link
Member Author

@jeanbisutti can you help me with the native failures:

  1. not sure in this is transient:
Failures (1):
  JUnit Jupiter:OtelSpringStarterSmokeTest:shouldSendTelemetry()
    MethodSource [className = 'io.opentelemetry.spring.smoketest.OtelSpringStarterSmokeTest', methodName = 'shouldSendTelemetry', methodParameterTypes = '']
    => org.awaitility.core.ConditionTimeoutException: Assertion condition defined as a Lambda expression in io.opentelemetry.instrumentation.testing.InstrumentationTestRunner
Expecting actual not to be empty within 10 seconds.
       org.awaitility.core.ConditionAwaiter.await(ConditionAwaiter.java:167)
       org.awaitility.core.AssertionCondition.await(AssertionCondition.java:119)
       org.awaitility.core.AssertionCondition.await(AssertionCondition.java:31)
       org.awaitility.core.ConditionFactory.until(ConditionFactory.java:1006)
       org.awaitility.core.ConditionFactory.untilAsserted(ConditionFactory.java:790)
       [...]
     Caused by: org.awaitility.core.DeadlockException: Deadlocked threads detected:


       org.awaitility.core.ConditionAwaiter.await(ConditionAwaiter.java:159)
       [...]
  1. numLogsCapturedBeforeOtelInstall value of the OpenTelemetry appender is too small. - should we increase the buffer?

  2. thread started: this if for JFR - I'll try @PreDestry for this

The web application [ROOT] appears to have started a thread named [BatchLogRecordProcessor_WorkerThread-1] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread:
 org.graalvm.nativeimage.builder/com.oracle.svm.core.posix.headers.Pthread.pthread_cond_timedwait(Pthread.java)
 org.graalvm.nativeimage.builder/com.oracle.svm.core.posix.thread.PosixParker.park0(PosixPlatformThreads.java:379)
 org.graalvm.nativeimage.builder/com.oracle.svm.core.posix.thread.PosixParker.park(PosixPlatformThreads.java:354)
 org.graalvm.nativeimage.builder/com.oracle.svm.core.thread.PlatformThreads.parkCurrentPlatformOrCarrierThread(PlatformThreads.java:1001)
 [email protected]/jdk.internal.misc.Unsafe.park(Unsafe.java:56)
 [email protected]/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:269)
 [email protected]/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1763)
 [email protected]/java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:435)
 io.opentelemetry.sdk.logs.export.BatchLogRecordProcessor$Worker.run(BatchLogRecordProcessor.java:246)
 [email protected]/java.lang.Thread.runWith(Thread.java:1596)
 [email protected]/java.lang.Thread.run(Thread.java:1583)

@jeanbisutti
Copy link
Member

@zeitlinger About 1., it seems an awaitility issue. Does the problem only appear with the new changes? Perhaps it may be worth to do something like

But I am not sure today it would be a good thing to do. It would require some further investigations.

About 2., numLogsCapturedBeforeOtelInstall default value is high: 1 000 logs. I suspect that the warning is related to something specific to the test.

About 3., it seems related to Tomcat searching memory leaks. With the full log we could know if it really comes from Tomcat. It does not seem possible to stop the BatchLogRecordProcessor thread. Surprised it could be JFR related. @jack-berg, would you know if some users have already reported the following log?

appears to have started a thread named [BatchLogRecordProcessor_WorkerThread-1] but has failed to stop it.

Native tests of this PR are failing during the native compilation step:

[native-image-plugin] Native Image written to: /home/runner/work/opentelemetry-java-instrumentation/opentelemetry-java-instrumentation/smoke-tests-otel-starter/spring-boot-3.2/build/native/nativeTestCompile

[Incubating] Problems report is available at: file:///home/runner/work/opentelemetry-java-instrumentation/opentelemetry-java-instrumentation/build/reports/problems/problems-report.html

I would try to focus on the JMX or JFR metrics for a GraalVM native execution. GraalVM supports some JFR events, but not all of them. So, not sure that all the JFR metrics can work today in the native mode.

@jack-berg
Copy link
Member

@jack-berg, would you know if some users have already reported the following log?

I haven't seen that log before.

@zeitlinger
Copy link
Member Author

we also have com.oracle.svm.core.jdk.UnsupportedFeatureError: ThreadMXBean methods - see oracle/graal#6101

@zeitlinger zeitlinger force-pushed the spring-boot-runtime-metrics branch from c0de41a to b3c0f10 Compare January 28, 2025 08:25
@zeitlinger
Copy link
Member Author

@jeanbisutti turned out that all prior errors were just a side effect of a jmx issue which is resolved now

can you take a look again?

@jeanbisutti
Copy link
Member

@roberttoyonaga If you have time, it would be great if you could have a look at this PR.

private static boolean useThreads() {
// GraalVM native image does not support ThreadMXBean yet
// see https://github.com/oracle/graal/issues/6101
return !isJava9OrNewer() || System.getProperty("org.graalvm.nativeimage.imagecode") != null;
Copy link
Contributor

@roberttoyonaga roberttoyonaga Jan 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that GraalVM Native Image would work fine with Threads::java8Callback. Native Image implements some ThreadMXBean functionality, notably all the functionality needed by Threads::java8Callback (see here and here). Although, Threads::java9AndNewerCallback still won't work with Native Image since we don't support ThreadMXBean#getAllThreadIds() or ThreadMXBean.getThreadInfo() yet.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried JMX - and ran into #13078 (comment)

@zeitlinger zeitlinger force-pushed the spring-boot-runtime-metrics branch from a130e3d to e057797 Compare January 30, 2025 12:12
@zeitlinger
Copy link
Member Author

@jeanbisutti now I get this - do you have an idea?

Error: Value not yet available for AnalysisField<Enum.hash -> HotSpotResolvedJavaFieldImpl<java.lang.Enum.hash int:16>, accessed: true, read: true, written: true, folded: true>
Error: Value not yet available for AnalysisField<Enum.name -> HotSpotResolvedJavaFieldImpl<java.lang.Enum.name String:20>, accessed: true, read: true, written: true, folded: true>
Error: Value not yet available for AnalysisField<Enum.ordinal -> HotSpotResolvedJavaFieldImpl<java.lang.Enum.ordinal int:12>, accessed: true, read: true, written: true, folded: true>

Maybe we should split this PR to leave out naive image initially - and then add native image incrementally (JFR as last step)

wdyt?

@jeanbisutti
Copy link
Member

These error messages are not very detailed ones 😅

You could create a first PR with only what works today with a JVM and a native image application.

Another option may be to exclude the RuntimeMetricsAutoConfiguration for native image applications. It's perhaps possible to exclude this autoconfiguration during the Spring AOT processing step. It might be possible by implementing the BeanRegistrationExcludeFilter interface and referencing the implementation in a org.springframework.beans.factory.aot.BeanRegistrationExcludeFilter property of the aot.factories file.

Example:

@jeanbisutti
Copy link
Member

@zeitlinger Do you agree to close this one?

@zeitlinger
Copy link
Member Author

Yes

@zeitlinger
Copy link
Member Author

closing in favor of #13173

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

test native This label can be applied to PRs to trigger them to run native tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add runtime-telemetry to spring starter

4 participants