CP-310090 Stunnel lib: Expose unix socket path for TLS proxy #6886

Open
changlei-li wants to merge 4 commits into xapi-project:feature/trusted-certs from changlei-li:private/changleli/check-cert

@changlei-li
Contributor

Add a module UnixSocketProxy to the stunnel lib that provides a unix socket
path which can proxy TLS. This offers a unified mechanism for
different users.
Stunnel listens on the unix socket path, accepts connections
from local requests, then forwards them to the remote host and port over TLS.
Certificate checking for the TLS connection can be done by stunnel
with the new trusted-certs implementation.
Two sets of APIs are provided:
1. A long-running stunnel proxy, for users who want to use it
   multiple times and manage the proxy lifecycle themselves.
```OCaml
let stunnel_proxy =
  Stunnel.UnixSocketProxy.start ~verify_cert ~remote_host ~remote_port ()
in
match stunnel_proxy with
| Error e -> (* handle error *)
| Ok proxy_handle ->
    let socket_path = Stunnel.UnixSocketProxy.socket_path proxy_handle in
    (* use socket_path with HTTP clients *)
    ...
    ( match Stunnel.UnixSocketProxy.diagnose proxy_handle with
    | Ok () -> (* all good *)
    | Error err -> (* handle connection errors *)
    ) ;
    ...
    Stunnel.UnixSocketProxy.stop proxy_handle (* clean up when done *)
```
2. A short-lived stunnel proxy, for users who just want
   one-shot use with automatic cleanup.
```OCaml
Stunnel.UnixSocketProxy.with_proxy ~verify_cert ~remote_host ~remote_port
  (fun proxy_handle ->
    let socket_path = Stunnel.UnixSocketProxy.socket_path proxy_handle in
    (* use socket_path with HTTP clients *)
    ...
    Stunnel.UnixSocketProxy.diagnose proxy_handle
    ...
  )
```
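
The one-shot form reads like the usual bracket pattern around the start and stop calls. The following is only a sketch of that idea, assuming the cleanup must run even when the callback raises; `with_resource`, `start` and `stop` here are hypothetical stand-ins, not the actual `Stunnel.UnixSocketProxy` implementation.

```ocaml
(* Generic bracket: acquire a handle, run [f], and always release the handle.
   [start] and [stop] are hypothetical stand-ins for the proxy lifecycle
   functions; they are not the real Stunnel API. *)
let with_resource ~(start : unit -> ('h, 'e) result) ~(stop : 'h -> unit)
    (f : 'h -> ('a, 'e) result) : ('a, 'e) result =
  match start () with
  | Error e ->
      (* Nothing was acquired, so there is nothing to clean up. *)
      Error e
  | Ok handle ->
      (* [Fun.protect] runs [finally] on normal return and on exceptions. *)
      Fun.protect ~finally:(fun () -> stop handle) (fun () -> f handle)
```

With this shape the caller never sees the handle outside the callback, which is what makes the automatic cleanup safe.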

Signed-off-by: Changlei Li <changlei.li@cloud.com>
Currently, verify_error relies on "certificate verify failed"
and "No certificate or private key specified" in the stunnel log
file.
In fact, "No certificate or private key specified" is a normal
log line for stunnel_proxy. It appears when stunnel configuration
fails with verbose logging enabled. We can remove it, since that case
is covered by "Configuration failed".
"certificate verify failed" is an indicator that certificate
verification failed, but the detailed reasons are in the preceding lines, such as
"CERT: Pre-verification error: unable to get local issuer certificate" and
"CERT: Subject checks failed". So the "CERT: " lines are collected, and
if "certificate verify failed" is found, the details can be raised
as the reason.
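
As a rough illustration of the collection step described above, the sketch below folds over log lines, remembers the "CERT: " lines, and returns them once the generic marker appears. The names `contains` and `verify_failure_reason` are made up for this example, and it works on an in-memory list of lines rather than on the real log scanner.

```ocaml
(* True when [affix] occurs somewhere inside [s]. *)
let contains ~affix s =
  let n = String.length affix and m = String.length s in
  let rec go i = i + n <= m && (String.sub s i n = affix || go (i + 1)) in
  go 0

(* Walk the lines in order, collecting "CERT: " lines. When the generic
   "certificate verify failed" marker is seen, return the collected details
   (oldest first). Return [None] if the marker never appears. *)
let verify_failure_reason (lines : string list) : string list option =
  let rec go cert_lines = function
    | [] ->
        None
    | line :: rest ->
        if contains ~affix:"certificate verify failed" line then
          Some (List.rev cert_lines)
        else if contains ~affix:"CERT: " line then
          go (line :: cert_lines) rest
        else
          go cert_lines rest
  in
  go [] lines
```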

Signed-off-by: Changlei Li <changlei.li@cloud.com>
In a long-running proxy, every call to diagnose
needs to read the entire stunnel log, which is inefficient.
Record the last checked position so that only
the new log content is checked.

Signed-off-by: Changlei Li <changlei.li@cloud.com>
Signed-off-by: Changlei Li <changlei.li@cloud.com>
(** Stop a running stunnel proxy and clean up resources.
This kills the stunnel process and removes the socket and log files. *)

val diagnose : t -> (unit, Stunnel_error.t) result
Contributor

Is this an expensive operation that exists mostly for debugging? It seems unusual that we rely on a log file. If this operation should be used sparingly, it would be good to mention this.

Contributor Author

@changlei-li Feb 5, 2026

This is the shortcoming of using stunnel. Although stunnel is a reliable tool for proxying TLS, it's hard to get errors the way a native SSL library reports them. There is no programmatic API or structured error code in stunnel, and replacing stunnel in our repo would be a really big project. So the stunnel log is the only way for us to get certificate checking errors.
I'm aware of the fragility, so I created the new file stunnel_log_scanner to handle this, with real-world stunnel logs in the unit tests.
I don't think diagnose is an expensive operation compared to the network event. When reading the log, the input channel uses buffered I/O, so it doesn't make a system call for each line. It also uses lseek to jump to a specific position, which avoids re-reading from the beginning.
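
A minimal sketch of the resume-from-position idea, using only the standard library's buffered channels; `scan_from` is a hypothetical name and this is not the code in stunnel_log_scanner.

```ocaml
(* Read lines of [path] starting at byte offset [start_pos], apply [f] to
   each, and return the offset reached so the next call can resume there.
   The channel is buffered, so lines are not read one system call at a time. *)
let scan_from (path : string) (start_pos : int) (f : string -> unit) : int =
  let ic = open_in_bin path in
  Fun.protect
    ~finally:(fun () -> close_in_noerr ic)
    (fun () ->
      seek_in ic start_pos ;
      let rec loop () =
        match input_line ic with
        | line ->
            f line ; loop ()
        | exception End_of_file ->
            (* [pos_in] reports the logical read position in the file. *)
            pos_in ic
      in
      loop ())
```

A caller would keep the returned offset next to the log path and pass it back on the following call, so each call only pays for the bytes appended since the previous one.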


(** Stream through lines from a specific position, applying function to each.
Returns new_position. Stops early on Error. *)
let stream_from_position (filepath : string) (start_pos : int)
Contributor

Logfiles get rotated, renamed, and compressed. How does this interact with this module?

Contributor Author

Don't worry about it. The log being checked is not the secure.log in /var/log. It's in the forkexecd data dir (the stunnel process is created by forkexecd) while stunnel is running. So there is no rotation, renaming, or compression.

Contributor

Thanks. I'm fine with this.

proxy_pid: pid
; proxy_socket_path: string
; proxy_logfile: string
; mutable last_checked_position: int
Member

The in_channel already remembers the position. Could the ic be used here directly, instead of storing an integer position and re-opening the log file each time the log is checked?

Contributor Author

Good idea. Let me try.

let _ = Unix.lseek fd start_pos Unix.SEEK_SET in
let ic = Unix.in_channel_of_descr fd in
let rec loop () =
match input_line ic with
Member

It's worth mentioning that input_line may get a partial line when stunnel pauses before writing the remaining part of the line; hence the checker may miss the signature forever.

Contributor Author

@changlei-li Feb 6, 2026

Yes, I have considered this issue. In most normal cases, input_line reads until the newline. In rare cases, it reads to End_of_file while the line is only partially flushed to the file. Generally the rare case would be caused by a stunnel crash, which can't be handled.
On the other hand, if the log file ends at "certificate veri" with no newline at the moment the user calls diagnose, the certificate error is indeed ignored. But in the user scenario, diagnose is only called after their network event has failed and returned. Even if we read from the start of the log file, it doesn't help in this case.
So I think we needn't consider the rare case.
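
For reference, one way to sidestep the partial-line concern is to consume only newline-terminated lines and leave the saved position at the start of any unfinished tail, so a later call sees the whole line. This is a sketch of that alternative under the assumption that the log only ever grows; `scan_complete_lines` is a hypothetical name and this is not what the PR implements.

```ocaml
(* Apply [f] to every complete (newline-terminated) line found after
   [start_pos] and return the offset just past the last newline. An
   unterminated tail is left unread so the next call sees it whole. *)
let scan_complete_lines (path : string) (start_pos : int)
    (f : string -> unit) : int =
  let ic = open_in_bin path in
  Fun.protect
    ~finally:(fun () -> close_in_noerr ic)
    (fun () ->
      let len = in_channel_length ic - start_pos in
      if len <= 0 then start_pos
      else (
        seek_in ic start_pos ;
        let chunk = really_input_string ic len in
        match String.rindex_opt chunk '\n' with
        | None ->
            (* Only a partial line so far; try again on the next call. *)
            start_pos
        | Some last_nl ->
            String.sub chunk 0 last_nl
            |> String.split_on_char '\n'
            |> List.iter f ;
            start_pos + last_nl + 1))
```

The trade-off is that a line which never receives its newline, for example after a stunnel crash, is never reported, which matches the limitation already acknowledged in the reply above.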

max_retries start_pos =
let rec check ~max_retries cnt start_pos =
match stream_from_position logfile start_pos check_line with
| End new_pos when cnt <= max_retries ->
Member

For example, when max_retries is 2, it checks 4 times.

check 0
-> 0 <= 2 -> check  1
             -> 1 <= 2 -> check 2
                          -> 2 <= 2 -> check 3
                                       -> 3 <= 2 -> stop
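
The count in the trace above can be reproduced with a small stand-alone counter. This is only an illustration of the guard condition; `attempts_with` is a made-up helper and does not touch the real log scanner.

```ocaml
(* Count how many times a retry loop runs its body when the guard
   [keep_going cnt max_retries] decides whether to go around again. *)
let attempts_with ~(keep_going : int -> int -> bool) ~(max_retries : int) : int =
  let rec check cnt attempts =
    let attempts = attempts + 1 in
    if keep_going cnt max_retries then check (cnt + 1) attempts else attempts
  in
  check 0 0

let () =
  (* Guard as in the diff: cnt <= max_retries *)
  Printf.printf "<= : %d checks\n"
    (attempts_with ~keep_going:( <= ) ~max_retries:2) ;
  (* Stricter guard: one initial check plus max_retries retries *)
  Printf.printf "<  : %d checks\n"
    (attempts_with ~keep_going:( < ) ~max_retries:2)
```

With `max_retries` set to 2, the `<=` guard yields 4 checks and the `<` guard yields 3, that is, the initial check plus two retries.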

Comment on lines +17 to +22
type log_line_status = Continue | LineFound | LineError of Stunnel_error.t

type log_scan_result =
| End of int
| ScanError of Stunnel_error.t * int
| ScanFound of int
Member

@minglumlu Feb 6, 2026

From the matching point of view, there would be only two results: Found (Some ...) or Not_found (None).
And for Found, the caller can determine whether it's an expected good result or an error. Something like:

type scan_result = (string, string) Result.t  option

let find ~sigs ~box line =
  sigs
  |> List.find_map (fun affix ->
         if Astring.String.is_infix ~affix line then
           Some (box line)
         else
           None
     )

let check_good good = find ~sigs:good ~box:Result.ok

let check_bad bad = find ~sigs:bad ~box:Result.error

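(* Try the good signatures first, then fall back to the bad ones. *)
let check_both ~good ~bad line =
  match check_good good line with
  | Some _ as r -> r
  | None -> check_bad bad line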

let check_log ~ic ~line_checker ~no_match =
  let rec loop () =
    match input_line ic with
    | line -> (
        match line_checker line with
        | None ->
            loop ()
        | Some r ->
            r
    )
    | exception End_of_file ->
        no_match
  in
  loop ()
