-
Notifications
You must be signed in to change notification settings - Fork 769
SOLR-17855: Cleanup Dropwizard metric code and tests #3745
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SOLR-17855: Cleanup Dropwizard metric code and tests #3745
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dsmiley Big change here to remove a lot of dropwizard stuff. What I am mostly not sure about here is if the things I removed should have been removed or not so to have that double checked would be good. All tests were passing locally at this point. Sadly I found a few more things that were broken and need to be migrated or fixed:
- SystemInfoHandler
- Admin UI Node view? I think it got host names or metrics on the UI for that
- Node aggregation metrics (PR is open already)
Once those things are done with this PR, the branch migration should be mostly done besides ref-guide re-writes.
* @param parentContext The registry that the component will initialize metrics to | ||
* @param attributes Base set of attributes that will be bound to all metrics for that component | ||
*/ | ||
void initializeMetrics(SolrMetricsContext parentContext, Attributes attributes, String scope); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
scope
String gone
// NOCOMMIT: SOLR-17865 | ||
// this.solrMetricsContext = | ||
// new SolrDelegateRegistryMetricsContext( | ||
// parentContext.getMetricManager(), | ||
// parentContext.getRegistryName(), | ||
// SolrMetricProducer.getUniqueMetricTag(this, parentContext.getTag()), | ||
// SolrMetricManager.getRegistryName(SolrInfoBean.Group.node)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to fix this as part of this PR. Might be more duplicate code which might end up looking very similar to requestHandlerBase
this.requestTimer = | ||
new AttributedLongTimer( | ||
solrMetricsContext.longHistogram( | ||
"http_client_request_duration", | ||
"HTTP client request duration", | ||
OtelUnit.MILLISECONDS), | ||
attributes); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just found this metric wasn't migrated. Need to prefix the metric with solr_
and what should the name of this be?
// NOCOMMIT: The nodes view might be broken because of this | ||
@Test | ||
@AwaitsFix(bugUrl = "https://issues.apache.org/jira/browse/SOLR-8207") | ||
public void proxyMetricsHandlerAllNodes() throws IOException, SolrServerException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The admin UI might be broken based on this test. I need to come back to it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I was looking at this and the UI and it's not broken? I am not good at Solr JS admin ui but I don't think it was using /admin/metrics for this but instead /admin/info/system handler. Turns out I also fixed that node view when I committed 7daafff going back and forth between the commits basically killing two birds with one stone. I am thinking of removing this test if no objections unless I am missing something here I don't know about the UI.
solr/core/src/test/org/apache/solr/handler/admin/SystemInfoHandlerTest.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
damn that was a lot of work!
cybozulabs-langdetect = { module = "com.cybozu.labs:langdetect", version.ref = "cybozulabs-langdetect" } | ||
decompose-decompose = { module = "com.arkivanov.decompose:decompose", version.ref = "decompose" } | ||
decompose-extensions-compose = { module = "com.arkivanov.decompose:extensions-compose", version.ref = "decompose" } | ||
dropwizard-metrics-core = { module = "io.dropwizard.metrics:metrics-core", version.ref = "dropwizard-metrics" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still see dropwizard dependencies
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forgot to mention this. I didn't remove it completely because a few things were using the instruments but not necessarily the registry for the metrics collection beside CrossDcConsumer
(I don't know much about this module).
- Stats was being used for overseer and Zk distributed queue and this was also being used by embedded zk metrics. It worked with the
OverseerStatusCmd
API to just get some averages and rates from the instruments. - MiniClusterState was using the meter instrument for logging somethings
- CrossDcConsumer was creating its own registry it looks like for collecting metrics
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay; looks like another JIRA I guess. At least doesn't seem like a release blocker.
<!-- Consult the javadoc of o.e.j.util.thread.QueuedThreadPool --> | ||
<!-- for all configuration that may be set here. --> | ||
<!-- =========================================================== --> | ||
<Arg name="threadPool"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm; I wonder if Solr starts -- i.e. do any BATS integration tests work
|
||
private void initializeMetrics(SolrMetricsContext parentContext) { | ||
var solrMetricsContext = parentContext.getChildContext(this); | ||
String expandedScope = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
now not needed
Boolean.getBoolean("solr.disableRequestId"); | ||
|
||
private HandlerMetrics metricsShard = HandlerMetrics.NO_OP; | ||
private final Map<String, Counter> shardPurposes = new ConcurrentHashMap<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
huh.... I know of this thing and it's surprising that we don't actually consume from it. Well good riddance then.
solr/core/src/java/org/apache/solr/util/stats/InstrumentedHttpListenerFactory.java
Outdated
Show resolved
Hide resolved
TimeUnit.SECONDS, | ||
(n, c) -> DocCollection.isFullyActive(n, c, 2, 1)); | ||
|
||
fiveHundredsByNode = new LinkedHashMap<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assertNo500s
uses that
solr/core/src/test/org/apache/solr/handler/admin/SystemInfoHandlerTest.java
Show resolved
Hide resolved
response = handleDropwizardRegistry(params); | ||
|
||
consumer.accept("metrics", response); | ||
handlePrometheusRequest(params, consumer); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing I couldn't figure out here is how do we default the response to the PrometheusResponseWriter without wt
? Other option I was thinking is we require the wt parameter with either prometheus
or openmetrics
and throw an exception otherwise. This would keep it verbose so you know if which format you are reading.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's okay to default to one or the other.
Note eventually we should consider the Accepts header but I wouldn't worry about that now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will probably default to prometheus in that case then. My question though was actually how do you implement this to go to the PrometheusResponseWriter without wt
parameter since it seems to always default to the JacksonJsonWriter otherwise?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ugh. Looks like you need to set the params on the request to set WT. Try SolrParams.wrapDefaults(orig,SolrParams.of("wt", "prometheus"))
consumer.accept(name, (MapWriter) ew -> ew.putNoEx("count", counter.getCount())); | ||
} | ||
} | ||
public static void addMetrics(NamedList<Object> lst, Timer timer) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You moved this from its former position. Can you put it back unless there's a reason for the move?
* @param consumer consumer for created names and metrics | ||
* @param <T> formal type | ||
*/ | ||
public static <T extends PlatformManagedObject> void addMXBeanMetrics( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The javadocs of this method are no longer true. Maybe even the name needs adjustment... perhaps foreachGetterValue?
It's not even metrics related at all; maybe it should move to its only caller in SystemInfoHandler
* @param consumer consumer for created names and metrics | ||
* @param <T> formal type | ||
*/ | ||
public static <T extends PlatformManagedObject> void addMXBeanMetrics( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And likewise the name and javadocs are no longer accurate
cybozulabs-langdetect = { module = "com.cybozu.labs:langdetect", version.ref = "cybozulabs-langdetect" } | ||
decompose-decompose = { module = "com.arkivanov.decompose:decompose", version.ref = "decompose" } | ||
decompose-extensions-compose = { module = "com.arkivanov.decompose:extensions-compose", version.ref = "decompose" } | ||
dropwizard-metrics-core = { module = "io.dropwizard.metrics:metrics-core", version.ref = "dropwizard-metrics" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay; looks like another JIRA I guess. At least doesn't seem like a release blocker.
String format = params.get(CommonParams.WT); | ||
|
||
if (format == null) { | ||
req.setParams(SolrParams.wrapDefaults(params, SolrParams.of("wt", "prometheus"))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now defaulting to prometheus
* sends them periodically as update requests to a selected Solr collection and to a configured | ||
* handler. | ||
*/ | ||
public class SolrReporter extends ScheduledReporter { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dsmiley Just to respond to the JIRA, here is where I removed the reporter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice. Will merge tonight.
Actually just realized I broke a few tests because some are trying to hit /admin/metrics endpoint but throw my new exception because it requested in javabin. I didn't look too deep but if you want to merge it I can fix it in a separate PR when I get back. |
Gah I see. All the splitshard API tests is what started failing because of this. Was techinically always failing, but if metrics weren't reached, it just proceeded without fail and logged a warning. Now we throw an exception which does fail it. Other test is this which isn't too bad. Just add wt=prometheus. Once those are fixed we should be good 🤞. |
… fork/mlbiscoc/SOLR-17855-dropwizard-to-otel # NODE_REGISTRY not getRegistryName # Conflicts: # solr/core/src/java/org/apache/solr/handler/RequestHandlerBase.java # solr/core/src/java/org/apache/solr/update/DirectUpdateHandler2.java # solr/core/src/java/org/apache/solr/update/SolrIndexWriter.java
https://issues.apache.org/jira/browse/SOLR-17855
This branch changes a number of things:
scope
String parameter from initializeMetricsunregister()
like how Dropwizard callback gauges were closed