
Conversation

@kevinGC (Contributor) commented Aug 13, 2025

Reduce CPU usage significantly via SO_RCVLOWAT. There is a small
throughput penalty, so SO_RCVLOWAT is not enabled by default.
Users must turn it on via an option, as not everyone will want the
CPU/throughput tradeoff.

Part of #8510.
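
As a sketch only, the opt-in would be wired up from application code roughly
like this; the commented-out field name is hypothetical and just marks where
such a knob would live, not the API this PR actually adds:

```go
import (
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/alts"
)

// newALTSClientConn is an illustrative helper, not part of this PR.
func newALTSClientConn(target string) (*grpc.ClientConn, error) {
	opts := alts.DefaultClientOptions()
	// opts.EnableReceiveLowWatermark = true // hypothetical field; the real
	// option name and plumbing may differ.
	return grpc.NewClient(target,
		grpc.WithTransportCredentials(alts.NewClientCreds(opts)))
}
```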

For large payloads, we see about a 36% reduction in CPU usage and a 2.5%
reduction in throughput. This is expected, and has been observed in the
C++ gRPC library as well. The expectation is that with TCP receive
zerocopy also enabled, we'll see both a reduction in CPU usage and an
increase in throughput. Users not using zerocopy can decide for themselves
whether the CPU/throughput tradeoff is worthwhile.

SO_RCVLOWAT is unused for small payloads, where its impact would be
insignificant (while still costing cycles on extra syscalls). Enabling it in
grpc-go has no effect on the CPU usage or throughput of small payloads, so
small-payload results are omitted from the benchmarks below to avoid watering
down the measured impact on both CPU usage and throughput.

Benchmarks are of the ALTS layer alone.

Note that this PR includes #8512. GitHub doesn't support proper commit chains / stacked PRs, so I'm doing this in several PRs with some (annoyingly) redundant commits. Let me know if this isn't a good workflow for you and I'll change things up.

Benchmark numbers:

$ benchstat -col "/rcvlowat" -filter "/size:(64_KiB OR 512_KiB OR 1_MiB OR 4_MiB) .unit:(Mbps OR cpu-usec/op)" ~/lowat_numbers.txt
goos: linux
goarch: amd64
pkg: google.golang.org/grpc/credentials/alts/internal/conn
cpu: AMD Ryzen Threadripper PRO 3945WX 12-Cores
                         │   false    │               true               │
                         │    Mbps    │    Mbps     vs base              │
Rcvlowat/size=64_KiB-12    47.44 ± 0%   47.32 ± 0%  -0.24% (p=0.015 n=6)
Rcvlowat/size=512_KiB-12   299.2 ± 0%   293.6 ± 0%  -1.90% (p=0.002 n=6)
Rcvlowat/size=1_MiB-12     482.1 ± 0%   468.1 ± 0%  -2.88% (p=0.002 n=6)
Rcvlowat/size=4_MiB-12     887.4 ± 1%   842.3 ± 0%  -5.08% (p=0.002 n=6)
geomean                    279.1        272.0       -2.54%

                         │    false     │                true                │
                         │ cpu-usec/op  │ cpu-usec/op  vs base               │
Rcvlowat/size=64_KiB-12      992.2 ± 1%    666.1 ± 1%  -32.87% (p=0.002 n=6)
Rcvlowat/size=512_KiB-12    7.431k ± 1%   4.660k ± 0%  -37.30% (p=0.002 n=6)
Rcvlowat/size=1_MiB-12     14.720k ± 1%   9.192k ± 0%  -37.56% (p=0.002 n=6)
Rcvlowat/size=4_MiB-12      59.19k ± 1%   37.50k ± 3%  -36.64% (p=0.002 n=6)
geomean                     8.953k        5.719k       -36.12%

It's generally useful to have new-and-improved Go. One specific useful
feature is `b.Loop()`, which makes benchmarking easier.
It's only called in one place, and is effectively a method on conn.

Part of grpc#8510.
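
For context, `b.Loop()` is a `testing.B` method in recent Go releases; the
benchmark name and body below are placeholders that illustrate the pattern
rather than this PR's actual benchmark code:

```go
package conn

import "testing"

// The b.Loop() pattern: the loop body runs until the benchmark framework has
// gathered enough samples, and setup done before the loop is not timed, so no
// manual b.N loop or b.ResetTimer() call is needed.
func BenchmarkRecordRoundTrip(b *testing.B) {
	payload := make([]byte, 1<<20) // untimed setup

	for b.Loop() {
		// ... encrypt/decrypt or read/write the payload here ...
		_ = payload
	}
}
```
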
Increases large write speed by 9.62% per BenchmarkLargeMessage. Detailed
benchmarking numbers below.

Rather than use different sizes for the maximum read record, write
record, and write buffer, just use 1MB for all of them.

Using larger records reduces the amount of payload splitting and the
number of syscalls made by ALTS.
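For example, a 4 MiB message split into 4 KiB records produces 1024 records,
each with its own framing and crypto overhead, while 1 MiB records cut that
down to 4.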

Part of grpc#8510. SO_RCVLOWAT and TCP receive zerocopy are only effective
with larger payloads, so ALTS can't keep limiting payload sizes to 4 KiB.
SO_RCVLOWAT and zerocopy act on the receive side, but for benchmarking
purposes we need ALTS to send large messages.

Benchmarks:

$ benchstat large_msg_old.txt large_msg.txt
goos: linux
goarch: amd64
pkg: google.golang.org/grpc/credentials/alts/internal/conn
cpu: AMD Ryzen Threadripper PRO 3945WX 12-Cores
                │ large_msg_old.txt │           large_msg.txt           │
                │      sec/op       │   sec/op     vs base              │
LargeMessage-12         68.88m ± 1%   62.25m ± 0%  -9.62% (p=0.002 n=6)

The implementation of setRcvlowat is based on the gRPC C++ library
implementation.

Part of grpc#8510.
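
For readers unfamiliar with the mechanism, here is a minimal sketch of setting
SO_RCVLOWAT on a TCP connection in Go using golang.org/x/sys/unix; it mirrors
the general approach rather than this PR's exact code, and the helper name is
made up:

```go
package conn

import (
	"net"

	"golang.org/x/sys/unix"
)

// setRcvlowatSketch asks the kernel not to wake the reader until at least
// "bytes" bytes are available on the socket, which is what cuts down on read
// syscalls (and hence CPU) for large messages.
func setRcvlowatSketch(c *net.TCPConn, bytes int) error {
	raw, err := c.SyscallConn()
	if err != nil {
		return err
	}
	var sockErr error
	if err := raw.Control(func(fd uintptr) {
		sockErr = unix.SetsockoptInt(int(fd), unix.SOL_SOCKET, unix.SO_RCVLOWAT, bytes)
	}); err != nil {
		return err
	}
	return sockErr
}
```

As the review discussion below notes, the threshold generally has to be
updated before each read based on how many bytes are still expected, which is
why a one-time, dial-time setting would not be enough.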

codecov bot commented Aug 13, 2025

Codecov Report

❌ Patch coverage is 49.45055% with 46 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.76%. Comparing base (55e8b90) to head (af10d77).
⚠️ Report is 16 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| credentials/alts/internal/conn/conn_linux.go | 10.81% | 33 Missing ⚠️ |
| credentials/alts/internal/conn/record.go | 71.11% | 11 Missing and 2 partials ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8513      +/-   ##
==========================================
- Coverage   82.40%   81.76%   -0.64%     
==========================================
  Files         414      414              
  Lines       40531    40575      +44     
==========================================
- Hits        33399    33178     -221     
- Misses       5770     6026     +256     
- Partials     1362     1371       +9     
| Files with missing lines | Coverage Δ | |
|---|---|---|
| credentials/alts/alts.go | `75.65% <100.00%> (+0.49%)` | ⬆️ |
| credentials/alts/internal/conn/common.go | `100.00% <ø> (ø)` | |
| credentials/alts/internal/handshaker/handshaker.go | `79.29% <100.00%> (+1.74%)` | ⬆️ |
| credentials/alts/internal/conn/record.go | `74.47% <71.11%> (-3.73%)` | ⬇️ |
| credentials/alts/internal/conn/conn_linux.go | `10.81% <10.81%> (ø)` | |

... and 40 files with indirect coverage changes


@arjan-bal (Contributor) commented Aug 21, 2025

I believe it should be possible to set this using a custom dialer, without any code changes. Have you considered that approach?

@arjan-bal (Contributor) commented

> I believe it should be possible to set this using a custom dialer, without any code changes. Have you considered that approach?

I don't think using a custom dialer would work since we need to update the value of socket option before every read. Maybe we should consider directly implementing this in the transport.
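
To make that concrete, the custom-dialer approach can only set SO_RCVLOWAT
once, when the connection is established. A hedged sketch follows
(`grpc.WithContextDialer` and `net.Dialer.Control` are real APIs; the target
address and the fixed 64 KiB threshold are placeholders):

```go
package main

import (
	"context"
	"net"
	"syscall"

	"golang.org/x/sys/unix"
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/alts"
)

// dialWithRcvlowat sets SO_RCVLOWAT exactly once, at dial time. It cannot
// adjust the threshold before each read as the expected byte count changes,
// which is the limitation discussed above.
func dialWithRcvlowat(ctx context.Context, addr string) (net.Conn, error) {
	d := net.Dialer{
		Control: func(network, address string, c syscall.RawConn) error {
			var sockErr error
			if err := c.Control(func(fd uintptr) {
				// Arbitrary fixed threshold, for illustration only.
				sockErr = unix.SetsockoptInt(int(fd), unix.SOL_SOCKET, unix.SO_RCVLOWAT, 64<<10)
			}); err != nil {
				return err
			}
			return sockErr
		},
	}
	return d.DialContext(ctx, "tcp", addr)
}

func main() {
	cc, err := grpc.NewClient("localhost:50051",
		grpc.WithContextDialer(dialWithRcvlowat),
		grpc.WithTransportCredentials(alts.NewClientCreds(alts.DefaultClientOptions())))
	if err != nil {
		panic(err)
	}
	defer cc.Close()
}
```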
