Open
Labels
:Core/Infra/Metrics (Metrics and metering infrastructure), >enhancement, Team:Core/Infra (Meta label for core/infra team)
Description
Elasticsearch currently uses apm-java-agent as the underlying implementation in the APM module. We use our own APM API, implemented in the APM module on top of the OTel API. This should not change.
What should change is the binding between the OTel API and its implementation, which should be the OTel SDK. The OTel SDK will give us more flexibility in configuring how our metrics and traces are sent to APM Server (APM Server supports the OTel SDK).
With the OTel SDK we will be able to implement features such as "tee-ing" the export (sending it to two APM servers), additional buffering, or retries when APM Server is overloaded.
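To make the "tee-ing" idea concrete, here is a minimal sketch in plain Java. `BatchExporter` and `TeeExporter` are hypothetical names invented for illustration; this is not the OTel SDK's exporter interface, and metrics are modelled as plain strings. The point is only that a single batch is forwarded to every target, and a failing target does not prevent delivery to the others.

```java
import java.util.List;

// Hypothetical minimal exporter contract used only for this sketch;
// it is NOT the real io.opentelemetry exporter interface.
interface BatchExporter {
    /** Returns true if the whole batch was accepted by the target. */
    boolean export(List<String> batch);
}

// Forwards every batch to all delegates ("tee-ing"), e.g. two APM servers.
final class TeeExporter implements BatchExporter {
    private final List<BatchExporter> delegates;

    TeeExporter(List<BatchExporter> delegates) {
        this.delegates = List.copyOf(delegates);
    }

    @Override
    public boolean export(List<String> batch) {
        boolean allOk = true;
        for (BatchExporter delegate : delegates) {
            boolean ok;
            try {
                ok = delegate.export(batch);
            } catch (RuntimeException e) {
                // A failing target must not stop export to the remaining targets.
                ok = false;
            }
            allOk &= ok;
        }
        // True only if every target accepted the batch.
        return allOk;
    }
}
```

With the real SDK the same effect might also be reachable by registering one periodic metric reader per exporter on the meter provider; that is worth confirming against the SDK documentation before writing a custom composite.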
I put together a simple, very rough PoC showing that this works: #110263
Things that need more investigation and work:
- 1. Configuring the OTel SDK from the existing apm-java-agent settings (e.g. metrics_interval, transaction_sample_rate, server_url).
- 2. Rework RunTask for localhost testing (see 73c0e5f).
- 3. Rework of logging. The OTel SDK uses JUL (java.util.logging). We already have a bridge in server (https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/common/logging/JULBridge.java); if we refactor it into a lib, it should be possible to use it in the module. I wasn't able to make it work without adding a JUL-to-Log4j bridge dependency (which we don't want), but I was also rushing it. We want to make sure we keep using the same apm_agent.json file (probably not worth renaming). This is a good example of a case where plugins might want to add an appender.
- 4. Careful review of the dependencies. The OTel SDK pulls in quite a lot, and some dependencies, such as okhttp, do not work with Java 9 modules. Others introduce version clashes (netty, for example, is already a dependency of server). None of this should be a problem if we hide the implementation behind an embedded classloader, as we do for x-content and Jackson.
- 5. Due to the use of JavaBeans, the Java 9 module has to require java.desktop. This feels awkward, but I am not sure how to work around it.
- 6. The OTel SDK's buildAndRegister can only be called once; calling it a second time throws an exception. We need to make sure that starting and stopping metering (which is possible today) does not trigger this exception.
- 7. apm-java-agent gives us a number of out-of-the-box JVM metrics. I copied JvmFdMetrics from the apm-java-agent repo. Perhaps we should work with the APM team to have these published as a lib? Just copying JvmFdMetrics, JvmGcMetrics and JvmMemoryMetrics could work initially, but feels dirty. The metric names also have to comply with our naming convention (we would register them using our own API).
- 8. BIG: review the security manager permissions. This would be a relatively tedious and long task, as there are a lot of new dependencies. For the PoC I disabled the security manager.
- 9. New exporters: we could simply configure the exporters available out of the box (adding two of them to support exporting to two APM servers), or implement our own so that we have more control over logging, etc.
- 10. OTel SDK exporters support both the HTTP and gRPC protocols, and APM Server accepts both. We need to decide on one.
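For item 1 (and the protocol choice in item 10), this rough sketch translates the agent-style settings into values an SDK setup would need. `SdkSettings`, `AgentSettingsMapper` and the fallback defaults are hypothetical placeholders. The sketch also assumes OTLP over HTTP, with APM Server accepting metrics at the conventional `/v1/metrics` path under `server_url`. The sample ratio would presumably feed a trace-ID-ratio-based sampler (the SDK offers `Sampler.traceIdRatioBased(double)`).

```java
import java.net.URI;
import java.time.Duration;
import java.util.Map;

// Hypothetical holder for the values an OTel SDK setup would need.
record SdkSettings(URI metricsEndpoint, Duration exportInterval, double sampleRatio) {}

final class AgentSettingsMapper {
    // Parses agent-style durations such as "500ms", "10s" or "1m".
    // "ms" must be checked before "s", since "500ms" also ends with "s".
    static Duration parseDuration(String value) {
        String v = value.trim();
        if (v.endsWith("ms")) return Duration.ofMillis(Long.parseLong(v.substring(0, v.length() - 2)));
        if (v.endsWith("s")) return Duration.ofSeconds(Long.parseLong(v.substring(0, v.length() - 1)));
        if (v.endsWith("m")) return Duration.ofMinutes(Long.parseLong(v.substring(0, v.length() - 1)));
        throw new IllegalArgumentException("Unsupported duration: " + value);
    }

    static SdkSettings fromAgentSettings(Map<String, String> agent) {
        // Placeholder defaults for the sketch, not the agent's documented defaults.
        String serverUrl = agent.getOrDefault("server_url", "http://localhost:8200");
        // Assumption: OTLP over HTTP, metrics posted to <server_url>/v1/metrics.
        URI endpoint = URI.create(serverUrl.replaceAll("/+$", "") + "/v1/metrics");
        Duration interval = parseDuration(agent.getOrDefault("metrics_interval", "10s"));
        double ratio = Double.parseDouble(agent.getOrDefault("transaction_sample_rate", "1.0"));
        if (ratio < 0.0 || ratio > 1.0) {
            throw new IllegalArgumentException("transaction_sample_rate must be in [0, 1]: " + ratio);
        }
        return new SdkSettings(endpoint, interval, ratio);
    }
}
```

Keeping the mapping in one place would let the existing agent settings stay the user-facing contract while the SDK wiring changes underneath.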
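For item 3, this sketch shows the general shape of a JUL bridge: a `java.util.logging.Handler` that maps JUL levels onto another framework's levels and forwards the formatted message. It is not the server's existing `JULBridge`. `TargetLevel` and the callback sink are hypothetical stand-ins for the real Log4j loggers.

```java
import java.util.function.BiConsumer;
import java.util.logging.Formatter;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.SimpleFormatter;

// Hypothetical stand-in for the target logging framework's levels.
enum TargetLevel { TRACE, DEBUG, INFO, WARN, ERROR }

// Forwards JUL records (as emitted by the OTel SDK) to another logging backend,
// represented here by a simple callback.
final class JulForwardingHandler extends Handler {
    private final BiConsumer<TargetLevel, String> sink;
    // Used only to substitute message parameters such as {0}.
    private final Formatter messageFormatter = new SimpleFormatter();

    JulForwardingHandler(BiConsumer<TargetLevel, String> sink) {
        this.sink = sink;
    }

    static TargetLevel map(Level level) {
        int v = level.intValue();
        if (v >= Level.SEVERE.intValue()) return TargetLevel.ERROR;
        if (v >= Level.WARNING.intValue()) return TargetLevel.WARN;
        if (v >= Level.INFO.intValue()) return TargetLevel.INFO;
        if (v >= Level.FINE.intValue()) return TargetLevel.DEBUG; // includes CONFIG
        return TargetLevel.TRACE;                                  // FINER, FINEST
    }

    @Override
    public void publish(LogRecord record) {
        if (record == null || !isLoggable(record)) return;
        String message = messageFormatter.formatMessage(record);
        sink.accept(map(record.getLevel()), record.getLoggerName() + ": " + message);
    }

    @Override public void flush() {}
    @Override public void close() {}
}
```

Installed on the relevant JUL logger (with parent handlers disabled), such a handler routes SDK logging into our own files without a third-party JUL-to-Log4j bridge dependency.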
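For item 6, one option is to make the registration step idempotent so that start/stop cycles reuse the instance built the first time instead of registering again. In the SDK the global-registering call is `OpenTelemetrySdkBuilder.buildAndRegisterGlobal()`. If the module does not need `GlobalOpenTelemetry`, plain `build()` avoids the global state entirely, which is worth evaluating first. `OnceOnlyRegistration` below is a hypothetical helper, and the supplier stands in for whatever builds and registers the SDK.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical helper: runs the build-and-register step at most once per JVM
// and hands the same instance to every later start/stop cycle.
final class OnceOnlyRegistration<T> {
    private final AtomicReference<T> instance = new AtomicReference<>();
    private final Supplier<T> buildAndRegister;

    OnceOnlyRegistration(Supplier<T> buildAndRegister) {
        this.buildAndRegister = buildAndRegister;
    }

    T get() {
        T existing = instance.get();
        if (existing != null) return existing; // fast path: already registered
        synchronized (this) {
            existing = instance.get();
            if (existing == null) {
                // Only the first caller ever runs the registration step.
                existing = buildAndRegister.get();
                instance.set(existing);
            }
            return existing;
        }
    }
}
```

With this approach, "stopping" metering would mean disabling recording or export rather than tearing down the registered SDK.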
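For item 7, much of the same JVM data (heap usage, GC counts and times, open file descriptors) can be read directly from the JDK's management beans, which could be an alternative to copying the agent's classes. The metric names below are hypothetical placeholders, not our agreed naming convention. The file-descriptor counts come from the JDK-specific `com.sun.management.UnixOperatingSystemMXBean` and are therefore only present on Unix-like platforms.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.OperatingSystemMXBean;
import java.util.LinkedHashMap;
import java.util.Map;

final class JvmMetricsSnapshot {
    // Metric names are hypothetical placeholders.
    static Map<String, Long> collect() {
        Map<String, Long> metrics = new LinkedHashMap<>();

        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        metrics.put("jvm.memory.heap.used", memory.getHeapMemoryUsage().getUsed());
        metrics.put("jvm.memory.heap.committed", memory.getHeapMemoryUsage().getCommitted());
        metrics.put("jvm.memory.non_heap.used", memory.getNonHeapMemoryUsage().getUsed());

        long gcCount = 0;
        long gcTimeMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when the value is undefined for a collector.
            gcCount += Math.max(0, gc.getCollectionCount());
            gcTimeMillis += Math.max(0, gc.getCollectionTime());
        }
        metrics.put("jvm.gc.count", gcCount);
        metrics.put("jvm.gc.time_ms", gcTimeMillis);

        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        // Open file descriptors are only exposed by the JDK-specific Unix bean.
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean unix) {
            metrics.put("jvm.fd.open", unix.getOpenFileDescriptorCount());
            metrics.put("jvm.fd.max", unix.getMaxFileDescriptorCount());
        }
        return metrics;
    }
}
```

In the module these values would more likely be reported through asynchronous gauges/counters registered via our own API than collected as a map; the map just keeps the sketch self-contained.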
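For item 9 (and the retries mentioned earlier for an overloaded APM Server), a custom exporter could wrap another one and retry with exponential backoff, logging each failed attempt itself. This is a sketch under the same hypothetical `BatchExporter` contract as above, not an SDK class; `System.err` stands in for our logger.

```java
import java.util.List;

// Hypothetical minimal exporter contract used only for this sketch.
interface BatchExporter {
    /** Returns true if the whole batch was accepted by the target. */
    boolean export(List<String> batch);
}

// Retries a failed export up to maxAttempts times, doubling the wait between attempts.
final class RetryingExporter implements BatchExporter {
    private final BatchExporter delegate;
    private final int maxAttempts;
    private final long initialBackoffMillis;

    RetryingExporter(BatchExporter delegate, int maxAttempts, long initialBackoffMillis) {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        this.delegate = delegate;
        this.maxAttempts = maxAttempts;
        this.initialBackoffMillis = initialBackoffMillis;
    }

    @Override
    public boolean export(List<String> batch) {
        long backoff = initialBackoffMillis;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (delegate.export(batch)) return true;
            if (attempt < maxAttempts) {
                // Stand-in for our own logging of the failed attempt.
                System.err.printf("export attempt %d/%d failed, retrying in %d ms%n", attempt, maxAttempts, backoff);
                try {
                    Thread.sleep(backoff);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
                backoff *= 2;
            }
        }
        return false; // all attempts failed
    }
}
```

Blocking with `Thread.sleep` keeps the sketch short; a real exporter should avoid stalling the thread that collects metrics, for example by buffering the batch and retrying from a scheduled task.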
pmoust
Metadata
Assignees