Skip to content

Conversation

@jiangzho
Copy link
Contributor

@jiangzho jiangzho commented Aug 19, 2025

What changes were proposed in this pull request?

This PR adds support for Prometheus text-based-format and best practices for metrics naming

Existing format

metrics_jvm_bufferPool_direct_capacity_Number{type="gauges"} 98348
metrics_jvm_bufferPool_direct_capacity_Value{type="gauges"} 98348
metrics_jvm_bufferPool_direct_count_Number{type="gauges"} 41
metrics_jvm_bufferPool_direct_count_Value{type="gauges"} 41
metrics_kubernetes_client_http_response_latency_nanos_Count{type="histograms"} 26910
metrics_kubernetes_client_http_response_latency_nanos_Max{type="histograms"} 232417143
metrics_kubernetes_client_http_response_latency_nanos_Mean{type="histograms"} 1.1410164260725182E7
metrics_kubernetes_client_http_response_latency_nanos_Min{type="histograms"} 2931711
metrics_kubernetes_client_http_response_latency_nanos_50thPercentile{type="histograms"} 7559152.0
metrics_kubernetes_client_http_response_latency_nanos_75thPercentile{type="histograms"} 9440850.0
metrics_kubernetes_client_http_response_latency_nanos_95thPercentile{type="histograms"} 1.2576766E7
metrics_kubernetes_client_http_response_latency_nanos_98thPercentile{type="histograms"} 1.34034482E8
metrics_kubernetes_client_http_response_latency_nanos_99thPercentile{type="histograms"} 1.34034482E8
metrics_kubernetes_client_http_response_latency_nanos_999thPercentile{type="histograms"} 1.34034482E8
metrics_kubernetes_client_http_response_latency_nanos_StdDev{type="histograms"} 2.177784612259799E7
metrics_kubernetes_client_pods_get_Count{type="counters"} 8967
metrics_kubernetes_client_pods_get_MeanRate{type="counters"} 0.02678169644780033
metrics_kubernetes_client_pods_get_OneMinuteRate{type="counters"} 0.049758750361204154
metrics_kubernetes_client_pods_get_FiveMinuteRate{type="counters"} 0.035255140329213855
metrics_kubernetes_client_pods_get_FifteenMinuteRate{type="counters"} 0.02931221844089468

with this patch, operator would be able to export format matching Prometheus 2.0 recommended practice like

# HELP jvm_bufferpool_direct_capacity Gauge metric
# TYPE jvm_bufferpool_direct_capacity gauge
jvm_bufferpool_direct_capacity 256092

# HELP jvm_bufferpool_direct_count Gauge metric
# TYPE jvm_bufferpool_direct_count gauge
jvm_bufferpool_direct_count 44

# HELP kubernetes_client_2xx_total Meter count
# TYPE kubernetes_client_2xx_total counter
kubernetes_client_2xx_total 130

Why are the changes needed?

It's Prometheus 2.0 best practice for using the next format with necessary comments. Also, some common scrapers (like Datadog) rely on these metadata (e.g. # HELP and # TYPE) to parse metrics correctly. They may skip metrics if these are missing.

Ideally we can introduce similar patch to Spark core as well for better Prometheus 2.0 compatibility.

Does this PR introduce any user-facing change?

New functionalities becomes available (for metrics format)

How was this patch tested?

CIs / curl on :19090/prometheus to validate the format

Was this patch authored or co-authored using generative AI tooling?

No

@jiangzho
Copy link
Contributor Author

jiangzho commented Aug 19, 2025

cc @peter-toth for review - thanks !

@jiangzho jiangzho force-pushed the metrics_header branch 4 times, most recently from d673f89 to 935c07e Compare August 22, 2025 01:22
Copy link
Contributor

@peter-toth peter-toth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, nits only.

Copy link
Contributor

@peter-toth peter-toth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fixes @jiangzho, LGTM form my side.

@dongjoon-hyun , would you like to take a look at this PR?

@dongjoon-hyun
Copy link
Member

Sure, @peter-toth . Sorry for being late for the party.


public static final ConfigOption<Boolean> ENABLE_PROMETHEUS_TEXT_BASED_FORMAT =
ConfigOption.<Boolean>builder()
.key("spark.kubernetes.operator.metrics.enablePrometheusTextBasedFormat")
Copy link
Member

@dongjoon-hyun dongjoon-hyun Aug 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For new configuration, we had better rename enableXXX to XXXEnabled for consistency.

- spark.kubernetes.operator.metrics.enablePrometheusTextBasedFormat
+ spark.kubernetes.operator.metrics.PrometheusTextBasedFormatEnabled

Please see the existing code you added, @jiangzho .

.key("spark.kubernetes.operator.terminateOnInformerFailureEnabled")

.key("spark.kubernetes.operator.reconciler.trimStateTransitionHistoryEnabled")

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, it seems we are a bit inconsistent with config namings. Sometimes we use Enabled postfix, like in the above cases, but other times .enabled like in spark.kubernetes.operator.leaderElection.enabled. Apache Spark repo seem to prefer .enabled so shall we stick to it here as well?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, it seems we are a bit inconsistent with config namings. Sometimes we use Enabled postfix, like in the above cases, but other times .enabled like in spark.kubernetes.operator.leaderElection.enabled.

😄 I understand why you saying like that. Since Apache Spark was an open source community, there were many ways in the community contribution. IIUC, there was a proposal from Wenchen (cloud-fan) long time ago to make it more consistent for the new configuration. Although my memory is vague, here is my understanding.

  1. .enabled is recommended only for new configuration namespace itself. For example, spark.sql.ansi.enabled is used to turn on and off for all spark.sql.ansi.*. For a single configuration, we don't use it because X.Y.Z.enabled declares X.Y.Z is a new namespace. For example, spark.kubernetes.operator.leaderElection.enabled controls the whole spark.kubernetes.operator.leaderElection.* configuration. There is no misleading part here.
spark.kubernetes.operator.leaderElection.enabled
spark.kubernetes.operator.leaderElection.leaseName
spark.kubernetes.operator.leaderElection.leaseDurationSeconds
spark.kubernetes.operator.leaderElection.renewDeadlineSeconds
spark.kubernetes.operator.leaderElection.retryPeriodSeconds
  1. .enabled vs .enable. We decided to use .enabled while leaving our existing mistakes (like the following). So, there is no arguable point here. The rule is straight forward.
spark.acls.enable
spark.history.ui.acls.enable
spark.appStateStore.asyncTracking.enable
spark.sql.inMemoryTableScanStatistics.enable
spark.kafka.consumer.cache.jmx.enable

So, the following is wrong because we have a clear rule for .enabled.

Apache Spark repo seem to prefer .enabled so shall we stick to it here as well?

Lastly, I want to clarify that you are confused with .enabled and .enableXXX. We need to compare .enableXXX and .XXXEnabled because they don't introduce any side effects like adding a new namespace. So, these .enableXXX and .XXXEnabled are still controversial. I get it. However, look around this K8s operator repository. Did you find any instance of .enableXXX? Technically, it's none because I didn't allow from the beginning to prevent those kind mess-up in this new repository.

$ git grep spark.kubernetes.operator.enable | wc -l
       0

To sum up, I'm -1 with this wrong configuration names to protect the existing consistency of K8s Operator repository and I expect you folks (@peter-toth and @jiangzho) are aware of this naming rule as the protector of K8s Operator repository while reviewing other community PRs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @dongjoon-hyun for the detailed reason through!

I have updated the conf key and would keep that in mind for future configuration

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @dongjoon-hyun for the details, I'm ok with the config names suggested by you.


public static final ConfigOption<Boolean> ENABLE_SANITIZED_PROMETHEUS_METRICS_NAME =
ConfigOption.<Boolean>builder()
.key("spark.kubernetes.operator.metrics.enableSanitizePrometheusMetricsName")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto.

| spark.kubernetes.operator.metrics.clientMetricsGroupByResponseCodeEnabled | Boolean | true | false | When enabled, additional metrics group by http response code group(1xx, 2xx, 3xx, 4xx, 5xx) received from API server will be added. Users can disable it when their monitoring system can combine lower level kubernetes.client.http.response.<3-digit-response-code> metrics. |
| spark.kubernetes.operator.metrics.port | Integer | 19090 | false | The port used for checking metrics |
| spark.kubernetes.operator.metrics.enablePrometheusTextBasedFormat | Boolean | true | false | Whether or not to enable text-based format for Prometheus 2.0, as recommended by https://prometheus.io/docs/instrumenting/exposition_formats/#text-based-format |
| spark.kubernetes.operator.metrics.enableSanitizePrometheusMetricsName | Boolean | true | false | Whether or not to enable automatic name sanitizing for all metrics based on best-practice guide from Prometheus https://prometheus.io/docs/practices/naming/ |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both configuration names should be revised.

.append("_rate{interval=\"15m\"} ")
.append(meter.getFifteenMinuteRate())
.append("\n\n");
return stringBuilder.toString();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please consider to use string concatenation like line 177 to 186.

FYI, Java 9+ improved string contentenation via JEP 280: Indify String Concatenation (https://openjdk.org/jeps/280).

For the technical details, please see SPARK-52880 , @jiangzho .

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the above mentioned reason, I inevitably need to block accidental merging of this PR.

…ices for metrics naming

### What changes were proposed in this pull request?

This PR adds support for Prometheus text-based-format and best practices for metrics naming

Existing format

```
metrics_jvm_bufferPool_direct_capacity_Number{type="gauges"} 98348
metrics_jvm_bufferPool_direct_capacity_Value{type="gauges"} 98348
metrics_jvm_bufferPool_direct_count_Number{type="gauges"} 41
metrics_jvm_bufferPool_direct_count_Value{type="gauges"} 41
metrics_kubernetes_client_http_response_latency_nanos_Count{type="histograms"} 26910
metrics_kubernetes_client_http_response_latency_nanos_Max{type="histograms"} 232417143
metrics_kubernetes_client_http_response_latency_nanos_Mean{type="histograms"} 1.1410164260725182E7
metrics_kubernetes_client_http_response_latency_nanos_Min{type="histograms"} 2931711
metrics_kubernetes_client_http_response_latency_nanos_50thPercentile{type="histograms"} 7559152.0
metrics_kubernetes_client_http_response_latency_nanos_75thPercentile{type="histograms"} 9440850.0
metrics_kubernetes_client_http_response_latency_nanos_95thPercentile{type="histograms"} 1.2576766E7
metrics_kubernetes_client_http_response_latency_nanos_98thPercentile{type="histograms"} 1.34034482E8
metrics_kubernetes_client_http_response_latency_nanos_99thPercentile{type="histograms"} 1.34034482E8
metrics_kubernetes_client_http_response_latency_nanos_999thPercentile{type="histograms"} 1.34034482E8
metrics_kubernetes_client_http_response_latency_nanos_StdDev{type="histograms"} 2.177784612259799E7
metrics_kubernetes_client_pods_get_Count{type="counters"} 8967
metrics_kubernetes_client_pods_get_MeanRate{type="counters"} 0.02678169644780033
metrics_kubernetes_client_pods_get_OneMinuteRate{type="counters"} 0.049758750361204154
metrics_kubernetes_client_pods_get_FiveMinuteRate{type="counters"} 0.035255140329213855
metrics_kubernetes_client_pods_get_FifteenMinuteRate{type="counters"} 0.02931221844089468
```

with this patch, operator would be able to export format matching Prometheus 2.0 recommended practice like

```
# HELP jvm_bufferpool_direct_capacity Gauge metric
# TYPE jvm_bufferpool_direct_capacity gauge
jvm_bufferpool_direct_capacity 256092

# HELP jvm_bufferpool_direct_count Gauge metric
# TYPE jvm_bufferpool_direct_count gauge
jvm_bufferpool_direct_count 44

# HELP kubernetes_client_2xx_total Meter count
# TYPE kubernetes_client_2xx_total counter
kubernetes_client_2xx_total 130

# HELP kubernetes_client_http_response_latency Histogram metric
# TYPE kubernetes_client_http_response_latency histogram
kubernetes_client_http_response_latency_seconds_bucket{le="0.5"} 0.000104422
kubernetes_client_http_response_latency_seconds_bucket{le="0.75"} 0.000128422
kubernetes_client_http_response_latency_seconds_bucket{le="0.95"} 0.000139544
kubernetes_client_http_response_latency_seconds_bucket{le="0.98"} 0.000169124
kubernetes_client_http_response_latency_seconds_bucket{le="0.99"} 0.066452639
kubernetes_client_http_response_latency_seconds_count 2000
kubernetes_client_http_response_latency_seconds_sum 0.456670434
```

### Why are the changes needed?

It's Prometheus 2.0 best practice for using the next format with necessary comments.
Also, some common scrapers (like Datadog) rely on these metadata (e.g. # HELP and # TYPE) to parse metrics correctly. They may skip metrics if these are missing.

### Does this PR introduce _any_ user-facing change?

New functionalities becomes available (for metrics format)

### How was this patch tested?

CIs / curl on :19090/prometheus to validate the format

### Was this patch authored or co-authored using generative AI tooling?

No
@jiangzho
Copy link
Contributor Author

Thanks @dongjoon-hyun ! Your review is truly appreciated. I've updated the PR according to your review

Also added one commit to fix the previous handling of histogram & timers, making them publishing quantiles instead of the mis-interpretation of le buckets in previous commit

@dongjoon-hyun
Copy link
Member

Let's wait until Peter comes back so that we build a consensus on config name spaces because this is essentially important for project management (at least among us).

@peter-toth
Copy link
Contributor

Let's wait until Peter comes back so that we build a consensus on config name spaces because this is essentially important for project management (at least among us).

Thanks @dongjoon-hyun! Our concluson on names in #298 (comment) is ok with me.

Copy link
Contributor

@peter-toth peter-toth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM from my end, @dongjoon-hyun do you have any other suggestions?

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. Thank you, @jiangzho and @peter-toth .

@peter-toth peter-toth closed this in 5f476c3 Sep 3, 2025
@peter-toth
Copy link
Contributor

Thank you @jiangzho , @dongjoon-hyun.

Merged to main (0.5.0).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants