Skip to content

Conversation

mlbiscoc
Copy link
Contributor

@mlbiscoc mlbiscoc commented Oct 9, 2025

https://issues.apache.org/jira/browse/SOLR-17855

This branch changes a number of things:

  • Remove unused Dropwizard metric dependencies
  • Remove SolrReporters
  • Remove scope String parameter from initializeMetrics
  • Cleanup a number of NOCOMMITs and broken tests
  • Collect observables to the corresponding SolrMetricContext to close on unregister() like how Dropwizard callback gauges were closed

Copy link
Contributor Author

@mlbiscoc mlbiscoc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dsmiley Big change here to remove a lot of dropwizard stuff. What I am mostly not sure about here is if the things I removed should have been removed or not so to have that double checked would be good. All tests were passing locally at this point. Sadly I found a few more things that were broken and need to be migrated or fixed:

  1. SystemInfoHandler
  2. Admin UI Node view? I think it got host names or metrics on the UI for that
  3. Node aggregation metrics (PR is open already)

Once those things are done with this PR, the branch migration should be mostly done besides ref-guide re-writes.

* @param parentContext The registry that the component will initialize metrics to
* @param attributes Base set of attributes that will be bound to all metrics for that component
*/
void initializeMetrics(SolrMetricsContext parentContext, Attributes attributes, String scope);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scope String gone

Comment on lines 227 to 233
// NOCOMMIT: SOLR-17865
// this.solrMetricsContext =
// new SolrDelegateRegistryMetricsContext(
// parentContext.getMetricManager(),
// parentContext.getRegistryName(),
// SolrMetricProducer.getUniqueMetricTag(this, parentContext.getTag()),
// SolrMetricManager.getRegistryName(SolrInfoBean.Group.node));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to fix this as part of this PR. Might be more duplicate code which might end up looking very similar to requestHandlerBase

Comment on lines 129 to 135
this.requestTimer =
new AttributedLongTimer(
solrMetricsContext.longHistogram(
"http_client_request_duration",
"HTTP client request duration",
OtelUnit.MILLISECONDS),
attributes);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just found this metric wasn't migrated. Need to prefix the metric with solr_ and what should the name of this be?

Comment on lines 74 to 77
// NOCOMMIT: The nodes view might be broken because of this
@Test
@AwaitsFix(bugUrl = "https://issues.apache.org/jira/browse/SOLR-8207")
public void proxyMetricsHandlerAllNodes() throws IOException, SolrServerException {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The admin UI might be broken based on this test. I need to come back to it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I was looking at this and the UI and it's not broken? I am not good at Solr JS admin ui but I don't think it was using /admin/metrics for this but instead /admin/info/system handler. Turns out I also fixed that node view when I committed 7daafff going back and forth between the commits basically killing two birds with one stone. I am thinking of removing this test if no objections unless I am missing something here I don't know about the UI.

@mlbiscoc mlbiscoc requested a review from dsmiley October 9, 2025 00:31
Copy link
Contributor

@dsmiley dsmiley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

damn that was a lot of work!

cybozulabs-langdetect = { module = "com.cybozu.labs:langdetect", version.ref = "cybozulabs-langdetect" }
decompose-decompose = { module = "com.arkivanov.decompose:decompose", version.ref = "decompose" }
decompose-extensions-compose = { module = "com.arkivanov.decompose:extensions-compose", version.ref = "decompose" }
dropwizard-metrics-core = { module = "io.dropwizard.metrics:metrics-core", version.ref = "dropwizard-metrics" }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still see dropwizard dependencies

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forgot to mention this. I didn't remove it completely because a few things were using the instruments but not necessarily the registry for the metrics collection beside CrossDcConsumer (I don't know much about this module).

  1. Stats was being used for overseer and Zk distributed queue and this was also being used by embedded zk metrics. It worked with the OverseerStatusCmd API to just get some averages and rates from the instruments.
  2. MiniClusterState was using the meter instrument for logging somethings
  3. CrossDcConsumer was creating its own registry it looks like for collecting metrics

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay; looks like another JIRA I guess. At least doesn't seem like a release blocker.

<!-- Consult the javadoc of o.e.j.util.thread.QueuedThreadPool -->
<!-- for all configuration that may be set here. -->
<!-- =========================================================== -->
<Arg name="threadPool">
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm; I wonder if Solr starts -- i.e. do any BATS integration tests work


private void initializeMetrics(SolrMetricsContext parentContext) {
var solrMetricsContext = parentContext.getChildContext(this);
String expandedScope =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now not needed

Boolean.getBoolean("solr.disableRequestId");

private HandlerMetrics metricsShard = HandlerMetrics.NO_OP;
private final Map<String, Counter> shardPurposes = new ConcurrentHashMap<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

huh.... I know of this thing and it's surprising that we don't actually consume from it. Well good riddance then.

TimeUnit.SECONDS,
(n, c) -> DocCollection.isFullyActive(n, c, 2, 1));

fiveHundredsByNode = new LinkedHashMap<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assertNo500s uses that

response = handleDropwizardRegistry(params);

consumer.accept("metrics", response);
handlePrometheusRequest(params, consumer);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing I couldn't figure out here is how do we default the response to the PrometheusResponseWriter without wt? Other option I was thinking is we require the wt parameter with either prometheus or openmetrics and throw an exception otherwise. This would keep it verbose so you know if which format you are reading.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's okay to default to one or the other.
Note eventually we should consider the Accepts header but I wouldn't worry about that now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will probably default to prometheus in that case then. My question though was actually how do you implement this to go to the PrometheusResponseWriter without wt parameter since it seems to always default to the JacksonJsonWriter otherwise?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh. Looks like you need to set the params on the request to set WT. Try SolrParams.wrapDefaults(orig,SolrParams.of("wt", "prometheus"))

consumer.accept(name, (MapWriter) ew -> ew.putNoEx("count", counter.getCount()));
}
}
public static void addMetrics(NamedList<Object> lst, Timer timer) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You moved this from its former position. Can you put it back unless there's a reason for the move?

* @param consumer consumer for created names and metrics
* @param <T> formal type
*/
public static <T extends PlatformManagedObject> void addMXBeanMetrics(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The javadocs of this method are no longer true. Maybe even the name needs adjustment... perhaps foreachGetterValue?
It's not even metrics related at all; maybe it should move to its only caller in SystemInfoHandler

* @param consumer consumer for created names and metrics
* @param <T> formal type
*/
public static <T extends PlatformManagedObject> void addMXBeanMetrics(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And likewise the name and javadocs are no longer accurate

cybozulabs-langdetect = { module = "com.cybozu.labs:langdetect", version.ref = "cybozulabs-langdetect" }
decompose-decompose = { module = "com.arkivanov.decompose:decompose", version.ref = "decompose" }
decompose-extensions-compose = { module = "com.arkivanov.decompose:extensions-compose", version.ref = "decompose" }
dropwizard-metrics-core = { module = "io.dropwizard.metrics:metrics-core", version.ref = "dropwizard-metrics" }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay; looks like another JIRA I guess. At least doesn't seem like a release blocker.

String format = params.get(CommonParams.WT);

if (format == null) {
req.setParams(SolrParams.wrapDefaults(params, SolrParams.of("wt", "prometheus")));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now defaulting to prometheus

* sends them periodically as update requests to a selected Solr collection and to a configured
* handler.
*/
public class SolrReporter extends ScheduledReporter {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dsmiley Just to respond to the JIRA, here is where I removed the reporter.

Copy link
Contributor

@dsmiley dsmiley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. Will merge tonight.

@mlbiscoc
Copy link
Contributor Author

Actually just realized I broke a few tests because some are trying to hit /admin/metrics endpoint but throw my new exception because it requested in javabin. I didn't look too deep but if you want to merge it I can fix it in a separate PR when I get back.

@mlbiscoc
Copy link
Contributor Author

mlbiscoc commented Oct 13, 2025

Gah I see. All the splitshard API tests is what started failing because of this. Was techinically always failing, but if metrics weren't reached, it just proceeded without fail and logged a warning. Now we throw an exception which does fail it. Other test is this which isn't too bad. Just add wt=prometheus. Once those are fixed we should be good 🤞.

… fork/mlbiscoc/SOLR-17855-dropwizard-to-otel

# NODE_REGISTRY not getRegistryName

# Conflicts:
#	solr/core/src/java/org/apache/solr/handler/RequestHandlerBase.java
#	solr/core/src/java/org/apache/solr/update/DirectUpdateHandler2.java
#	solr/core/src/java/org/apache/solr/update/SolrIndexWriter.java
@dsmiley dsmiley merged commit cee53df into apache:feature/SOLR-17458-rebased Oct 14, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants