Metrics vertx_pool_queue_pending doesn't decrease after connection loss #1494

@anvx

Description

Version

The latest release at the time of writing: 4.5.13.

Context

We have run into two metrics issues while using the non-blocking Postgres driver with Vert.x.
I am reporting both in one ticket because I believe they are tightly coupled, and fixing one will likely fix the other.

Issue 1: vertx_pool_queue_pending doesn't decrease after connection loss
When we have pending queries (vertx_pool_queue_pending{pool_type="sql",}) and the database connection is lost (due to a DB restart, network glitch, etc.), the vertx_pool_queue_pending metric remains stuck. It never goes below the value recorded at the time of connection loss.

This means that in the metrics graph, it appears as if there are always pending queries waiting for a connection—even when the database connection is restored immediately. The only way to resolve this issue is to restart the service.

I've reviewed VertxPoolMetrics and related classes, but it's unclear where the issue lies. Notably, any queries that were pending when the connection was lost are never executed after reconnection.

Issue 2: vertx_pool_queue_pending freezes with high load
We also observed that under a high volume of requests the vertx_pool_queue_pending metric stops decreasing, even after the pool reports no connections in use. The reproducer below shows this case.

Do you have a reproducer?

import io.vertx.core.Vertx;
import io.vertx.core.VertxOptions;
import io.vertx.core.json.JsonObject;
import io.vertx.junit5.VertxExtension;
import io.vertx.junit5.VertxTestContext;
import io.vertx.micrometer.MetricsService;
import io.vertx.micrometer.MicrometerMetricsOptions;
import io.vertx.micrometer.VertxPrometheusOptions;
import io.vertx.pgclient.PgBuilder;
import io.vertx.pgclient.PgConnectOptions;
import io.vertx.sqlclient.Pool;
import io.vertx.sqlclient.PoolOptions;
import org.junit.jupiter.api.AfterAll;
import org.junit.jupiter.api.BeforeAll;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.extension.ExtendWith;
import org.testcontainers.containers.PostgreSQLContainer;

@ExtendWith(VertxExtension.class)
public class PostgresTest {

  private static PostgreSQLContainer<?> postgresContainer;
  private static Vertx vertx;
  private static Pool pgPool;

  @BeforeAll
  static void setup() {
    postgresContainer = new PostgreSQLContainer<>("postgres")
      .withDatabaseName("testdb")
      .withUsername("user")
      .withPassword("password");
    postgresContainer.start();

    MicrometerMetricsOptions metricsOptions = new MicrometerMetricsOptions()
      .setPrometheusOptions(new VertxPrometheusOptions()
        .setEnabled(true)
        .setStartEmbeddedServer(true)
        .setEmbeddedServerOptions(new io.vertx.core.http.HttpServerOptions().setPort(8081))
        .setPublishQuantiles(true))
      .setEnabled(true);

    vertx = Vertx.vertx(new VertxOptions().setMetricsOptions(metricsOptions));

    PgConnectOptions connectOptions = new PgConnectOptions()
      .setPort(postgresContainer.getFirstMappedPort())
      .setHost(postgresContainer.getHost())
      .setDatabase(postgresContainer.getDatabaseName())
      .setUser(postgresContainer.getUsername())
      .setPassword(postgresContainer.getPassword());

    PoolOptions poolOptions = new PoolOptions().setMaxSize(5);

    pgPool = PgBuilder.pool()
      .with(poolOptions)
      .connectingTo(connectOptions)
      .using(vertx)
      .build();
  }


  @Test
  void testDatabaseConnection(VertxTestContext testContext) throws InterruptedException {
    // Enqueue far more transactions than the pool (max size 5) can serve at once;
    // each one holds its connection for about 5 seconds.
    for (int i = 0; i < 300_000; i++) {
      pgPool.withTransaction(sqlConnection ->
              sqlConnection.query("SELECT PG_SLEEP(5)").execute()
      );
    }

    // Print the pool gauges once per second so a frozen value is visible in the logs.
    MetricsService metricsService = MetricsService.create(vertx);
    for (int i = 0; i < 1_000_000; i++) {
      Thread.sleep(1000);
      JsonObject metricsSnapshot = metricsService.getMetricsSnapshot();
      System.out.println("vertx.pool.in.use: " + metricsSnapshot.getString("vertx.pool.in.use"));
      System.out.println("vertx.pool.queue.pending: " + metricsSnapshot.getString("vertx.pool.queue.pending"));
      System.out.println("=======");
    }
  }

  @AfterAll
  static void tearDown() {
    if (pgPool != null) {
      pgPool.close();
    }
    if (vertx != null) {
      vertx.close();
    }
    if (postgresContainer != null) {
      postgresContainer.stop();
    }
  }

}

Steps to reproduce

Run the test above and watch the values it prints to the logs once per second.
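When reading the logs, the pattern to look for is a run of samples where vertx.pool.in.use is 0 while vertx.pool.queue.pending stays at the same positive value. A minimal sketch of that check follows, using only the JDK. The names FrozenGaugeCheck, Sample and looksFrozen are hypothetical helpers and not part of Vert.x or Micrometer, and the sketch assumes the two gauge values have already been parsed into numbers.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not a Vert.x API: flags the "frozen gauge" pattern in sampled values.
final class FrozenGaugeCheck {

  // One reading of the two pool gauges, taken at the same instant.
  static final class Sample {
    final long inUse;
    final long pending;

    Sample(long inUse, long pending) {
      this.inUse = inUse;
      this.pending = pending;
    }
  }

  // True when the last `window` samples all show inUse == 0 and the same positive pending value.
  static boolean looksFrozen(List<Sample> samples, int window) {
    if (window < 2 || samples.size() < window) {
      return false; // not enough data to decide
    }
    List<Sample> tail = samples.subList(samples.size() - window, samples.size());
    long firstPending = tail.get(0).pending;
    if (firstPending <= 0) {
      return false; // an empty queue is idle, not frozen
    }
    for (Sample s : tail) {
      if (s.inUse != 0 || s.pending != firstPending) {
        return false; // still working, or the gauge is still moving
      }
    }
    return true;
  }

  public static void main(String[] args) {
    List<Sample> samples = Arrays.asList(
      new Sample(5, 299_975),
      new Sample(0, 299_970),
      new Sample(0, 299_970),
      new Sample(0, 299_970));
    System.out.println(looksFrozen(samples, 3));
  }
}
```

A window of a few samples is enough here because a healthy pool with queued work should keep at least one connection in use.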

Observed Behavior

  • We create 300,000 requests, which immediately fill up vertx.pool.queue.pending (except for the 5 connections actively processing queries).
  • Once all requests are added to the queue, we start printing metrics every second.
  • After about a minute, vertx.pool.in.use drops to 0, meaning no queries are actively being processed.
  • However, vertx.pool.queue.pending freezes at around 299,970 and never decreases.
  • Any new requests increase the pending count from this frozen value, rather than resetting.
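For comparison, the arithmetic below sketches what the gauge would look like if the queue drained normally. It is an idealized model, not Vert.x code: it assumes every query holds its connection for exactly 5 seconds, that all 5 connections start a new query at the same instant, and that transaction overhead is zero. The class and method names are hypothetical.

```java
// Idealized model of the reproducer's queue; plain arithmetic, no Vert.x involved.
final class ExpectedPending {
  static final long TOTAL_REQUESTS = 300_000; // loop bound in the test
  static final long POOL_SIZE = 5;            // PoolOptions.setMaxSize(5)
  static final long QUERY_SECONDS = 5;        // SELECT PG_SLEEP(5)

  // Requests that have left the queue by `seconds`, with the first batch starting at t = 0.
  static long dequeuedAt(long seconds) {
    long batchesStarted = seconds / QUERY_SECONDS + 1;
    return Math.min(TOTAL_REQUESTS, batchesStarted * POOL_SIZE);
  }

  // Requests still waiting for a connection at `seconds`.
  static long pendingAt(long seconds) {
    return TOTAL_REQUESTS - dequeuedAt(seconds);
  }

  // Time at which the last batch would be picked up and the queue would reach zero.
  static long secondsToDrain() {
    long batches = (TOTAL_REQUESTS + POOL_SIZE - 1) / POOL_SIZE; // ceiling division
    return (batches - 1) * QUERY_SECONDS;
  }

  public static void main(String[] args) {
    System.out.println(pendingAt(0));      // 299995
    System.out.println(pendingAt(25));     // 299970
    System.out.println(secondsToDrain());  // 299995
  }
}
```

Under these assumptions the frozen value of about 299,970 corresponds to only 30 requests having left the queue (six batches of five), and a full drain would take roughly 83 hours. A healthy gauge would therefore keep stepping down by 5 every 5 seconds for days, which makes a flat line with vertx.pool.in.use at 0 clearly abnormal.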
