Skip to content

using eof? as readiness probe causes timeout exception when used with JDK servers #233

@dotconfig404

Description

@dotconfig404

What is happening

When using keep-alive connections on Jetty/JDK 17+ servers requests will hang and time out on certain platforms, for example Rocky 8.

This could actually happen with any kind of server and on any platform and it can easily be reproduced, see the example code at the bottom.

Here is a small code example to illustrate the occurrence a bit better:

http.start
req = Net::HTTP::Get.new(uri)
res = http.request(req)
# Timeout here! May be a Timeout exception on SSL_read internally
req = Net::HTTP::Get.new(uri)
res = http.request(req)
http.finish

Related issue from ruby-lang

The bug in question was actually already described a few years ago here:
https://bugs.ruby-lang.org/issues/19017

Why is it happening

Requests are issued via transport_request which uses begin_transport as a pre-flight check, reestablishing the connection when certain conditions are met. One of the conditions is checking readability via TCPSocket#wait_readable(0) and subsequently calling SSLSocket#eof? if it is readable.

Net::BufferedIO#eof? ends up calling sysread which eventually issues a SSL_read from OpenSSL. SSL_read will try to get application data from the socket.
The problem is that the TCPSocket may signal readability, despite no application data being present.

In JDK 17+ there is a flag that triggers sending an updated NST (New Session Ticket) upon certain conditions, after sending a HTTP response. If the ticket arrives quickly enough then it will be there right after application data, signalling readability on the TCP socket right before begin_transport is called which leads to the hang.

I have traced such an exchange client-side via OpenSSL tracing, this is how it looks llike:

  1. Sent: ClientHello
  2. Received: ServerHello, ChangeCipherSec, EncryptedExtensions, Certificate, CertificateVerify, Finished
  3. Sent: ChangeCipherSpec, Finished
  4. Sent: GET
  5. Received: NewSessionTicket (stateless)
  6. Received: GET
  7. Received: NewSessionTicket (stateful)
    here eof? is called and the cllient hangs
  8. Received: close notify
  9. Sent: close notify
  10. Sent: ClientHello

This is how it plays out for clients on puppetserver, which is based on Jetty. I opened an issue about it a few months ago: OpenVoxProject/openvox-server#25

Possible solutions

eof? is called here in order to check for a EOFError due to the connection being closed unexpectedly or something else. This is not a reliable way to check for this, even if it wouldn't hang (we could turn off SSL_MODE_AUTO_RETRY to make it return when receiving handshake data), because it does not actually read until the end of the stream. A close notify could still be on-line.

In order to properly check for EOFError we could instead call a non-blocking read until EOF or until the TCP socket is no longer readable. See #232

Calling a SSL_read with length 0 would be preferred, in order to be sure not to consume application data without storing it in a useful buffer, but that is not possible atm. It should not be possible for this to occur anyways, as unprompted application data is AFAIK not supported yet in this module (e.g. HTTP/2 or Websockets). So perhaps this is a good enough solution.

Reproducing the issue

How to

Use the following example client in conjunction with the example server, this should work on any system.

When on Rocky 8 you can also use Jetty 12, the following comment details on how to start an example Jetty server for this purpose

Client

require 'net/http'
require 'openssl'
require 'uri'

HOST = 'localhost'
PORT = '8443'
BASE_URL = "https://#{HOST}:#{PORT}/"
uri = URI(BASE_URL)

http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = true
# accept self-signed
http.verify_mode = OpenSSL::SSL::VERIFY_NONE  
http.set_debug_output($stderr)

http.start
req = Net::HTTP::Get.new(uri)
res = http.request(req)
puts "[INFO] GET #1 => #{res.code}"

req = Net::HTTP::Get.new(uri)
res = http.request(req)
puts "[INFO] GET #2 => #{res.code}"

http.finish

Server

build with gcc -O2 -Wall server.c -o server -lssl -lcrypto

#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <stdarg.h>
#include <time.h>
#include <openssl/ssl.h>

#define PORT 8443
#define CERT "./localhost-cert.pem"
#define KEY  "./localhost-key.pem"

static void timestamp_log(const char *fmt, ...) {
  time_t now = time(NULL);
  struct tm tm; localtime_r(&now, &tm);
  char ts[16]; strftime(ts, sizeof(ts), "%H:%M:%S", &tm);
  fprintf(stderr, "[%s] ", ts);
  va_list ap; va_start(ap, fmt);
  vfprintf(stderr, fmt, ap);
  va_end(ap);
}

static int await_request(SSL *ssl) {
  static unsigned char buf[16384];
  int r = SSL_read(ssl, buf, (int)sizeof(buf));
  return r > 0 ? 1 : 0;
}

static void write_response(SSL *ssl) {
  const char *hdr =
    "HTTP/1.1 200 OK\r\n"
    "Content-Length: 0\r\n"
    "Connection: keep-alive\r\n"
    "\r\n";
  SSL_write(ssl, hdr, (int)strlen(hdr));
}

int main(void) {
  SSL_CTX *ctx = SSL_CTX_new(TLS_server_method());
  SSL_CTX_set_min_proto_version(ctx, TLS1_3_VERSION);
  SSL_CTX_set_max_proto_version(ctx, TLS1_3_VERSION);
  SSL_CTX_use_certificate_file(ctx, CERT, SSL_FILETYPE_PEM);
  SSL_CTX_use_PrivateKey_file(ctx, KEY,  SSL_FILETYPE_PEM);

  int listen_fd = socket(AF_INET, SOCK_STREAM, 0);
  int yes = 1;
  setsockopt(listen_fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes));

  struct sockaddr_in socket_address = {0};
  socket_address.sin_family = AF_INET;
  socket_address.sin_addr.s_addr = htonl(INADDR_ANY);
  socket_address.sin_port = htons(PORT);
  bind(listen_fd, (struct sockaddr*)&socket_address, sizeof(socket_address));
  listen(listen_fd, 16);

  timestamp_log("server started on https://0.0.0.0:%d\n", PORT);

  for (;;) {
    struct sockaddr_in client_address; socklen_t client_len = sizeof(client_address);
    int client_fd = accept(listen_fd, (struct sockaddr*)&client_address, &client_len);

    SSL *ssl = SSL_new(ctx);
    SSL_set_fd(ssl, client_fd);
    SSL_accept(ssl);
    timestamp_log("connection opened\n");

    while (await_request(ssl)) {
      timestamp_log("request received\n");
      write_response(ssl);
      SSL_new_session_ticket(ssl);
      SSL_do_handshake(ssl);
      timestamp_log("NST sent\n");
    }

    SSL_shutdown(ssl);
    timestamp_log("connection closed\n");
    SSL_free(ssl);
    close(client_fd);
  }

  SSL_CTX_free(ctx);
  return 0;
}

Server Certificate

openssl req -x509 -newkey rsa:2048 -nodes \
  -subj "/CN=localhost" \
  -keyout "localhost-key.pem" \
  -out "localhost-cert.pem" \
  -days 365

EDIT: Rewrote the issue for better readability.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions