Commit bf7648e

Author: Leszek Zalewski
Merge pull request #12 from lessonnine/TNT-2741/parse-tcp-info
[TNT-2741] Parse TCP INFO binary data
2 parents b2e5852 + e6a77b5 commit bf7648e

15 files changed: +253 -27 lines changed

.github/workflows/build.yml

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ jobs:
     strategy:
       matrix:
         os: ['ubuntu-18.04', 'ubuntu-20.04']
-        ruby: ['2.6', '2.7', '3.0']
+        ruby: ['2.6', '2.7', '3.0', '3.1']

     runs-on: ${{ matrix.os }}

CHANGELOG.md

Lines changed: 14 additions & 0 deletions
@@ -7,6 +7,20 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ## [Unreleased]

+## [1.1.0 Beta]
+
+### Added
+
+Different ways to parse `Socket::Option`. Mainly due to the fact that `#inspect` can't
+generate proper data on AWS Fargate, which runs Amazon Linux 2 with 4.14 kernel. So now
+besides `#inspect` there's also `#unpack` that parses binary data and picks proper field.
+
+It depends on the kernel, but new fields are usually added at the end of the `tcp_info`
+struct, so it should more or less stay stable.
+
+You can configure it by passing in `config.socket_parser = :inspect` or
+`config.socket_parser = ->(opt) { your implementation }`.
+
 ## [1.1.0 Alpha]

 ### Added
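
The changelog's stability argument (new `tcp_info` fields are appended, so a prefix template keeps working) can be sketched with plain strings. This is a minimal illustration, not code from the commit: the field values are made up, and the "newer" buffer simply appends four 64-bit fields to mimic a struct that grew at the end.

```ruby
# Template that stops right after tcpi_unacked: 8 x __u8, then 5 x __u32.
PREFIX_TEMPLATE = "C8L5"

# Hypothetical field values, for illustration only.
u8_fields  = [10, 0, 0, 0, 0, 0, 0, 0]          # tcpi_state .. app_limited byte
u32_prefix = [200_000, 40_000, 1_448, 536, 3]   # rto, ato, snd_mss, rcv_mss, unacked
u32_rest   = Array.new(19, 0)                   # remaining __u32 fields up to total_retrans

# An "older" layout: 8 bytes + 24 x 4 bytes.
older_struct = (u8_fields + u32_prefix + u32_rest).pack("C8L24")

# A "newer" layout: same prefix, extra 64-bit fields appended at the end.
newer_struct = older_struct + [0, 0, 0, 0].pack("Q4")

# The prefix template ignores whatever follows tcpi_unacked.
older_unacked = older_struct.unpack(PREFIX_TEMPLATE).last
newer_unacked = newer_struct.unpack(PREFIX_TEMPLATE).last
```

Both buffers yield the same `unacked` value because `unpack` only reads as many bytes as the template asks for. A field inserted before `tcpi_unacked` would break this, which is the case the `:inspect` and lambda options cover.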

Gemfile.lock

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 PATH
   remote: .
   specs:
-    puma-plugin-telemetry (1.1.0.alpha)
+    puma-plugin-telemetry (1.1.0.beta)
       puma (>= 5.0)

 GEM

README.md

Lines changed: 2 additions & 0 deletions
@@ -62,6 +62,8 @@ Puma::Plugin::Telemetry.configure do |config|
   config.initial_delay = 10
   config.frequency = 30
   config.puma_telemetry = %w[workers.requests_count queue.backlog queue.capacity]
+  config.socket_telemetry!
+  config.socket_parser = :inspect
   config.add_target :io, formatter: :json, io: StringIO.new
   config.add_target :dogstatsd, client: Datadog::Statsd.new(tags: { env: ENV["RAILS_ENV"] })
 end

lib/puma/plugin/telemetry.rb

Lines changed: 1 addition & 1 deletion
@@ -52,7 +52,7 @@ def socket_telemetry(telemetry, launcher)
         return telemetry if launcher.nil?
         return telemetry unless config.socket_telemetry?

-        telemetry.merge! SocketData.new(launcher.binder.ios)
+        telemetry.merge! SocketData.new(launcher.binder.ios, config.socket_parser)
                                    .metrics

         telemetry

lib/puma/plugin/telemetry/config.rb

Lines changed: 15 additions & 0 deletions
@@ -62,13 +62,28 @@ class Config
         # - default: false
         attr_accessor :socket_telemetry

+        # Symbol representing method to parse the `Socket::Option`, or
+        # the whole implementation as a lambda. Available options:
+        # - `:inspect`, based on the `Socket::Option#inspect` method,
+        #   it's the safest and slowest way to extract the info. `inspect`
+        #   output might not be available, e.g. on AWS Fargate
+        # - `:unpack`, parse binary data given by `Socket::Option`. Fastest
+        #   way (12x compared to `inspect`) but depends on kernel headers
+        #   and fields ordering within the struct. It should almost always
+        #   match though. DEFAULT
+        # - proc/lambda, `Socket::Option` will be given as an argument, it
+        #   should return the value of `unacked` field as an integer.
+        #
+        attr_accessor :socket_parser
+
         def initialize
           @enabled = false
           @initial_delay = 5
           @frequency = 5
           @targets = []
           @puma_telemetry = DEFAULT_PUMA_TELEMETRY
           @socket_telemetry = false
+          @socket_parser = :unpack
         end

         def enabled?
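
The proc/lambda option documented above only has to accept the option object and return `unacked` as an integer. A minimal sketch of such a parser follows. It is an illustration, not part of the commit: `FakeOption` is a hypothetical stand-in that mimics only `Socket::Option#data`, and the offset assumes the kernel 4.14 layout quoted in `data.rb`.

```ruby
# Hypothetical stand-in for Socket::Option; only #data (the raw bytes) is used.
FakeOption = Struct.new(:data)

# Assumed layout: 8 x __u8, then rto, ato, snd_mss, rcv_mss before tcpi_unacked.
UNACKED_OFFSET = 8 + (4 * 4)
UNACKED_SIZE   = 4

custom_parser = lambda do |opt|
  data = opt.data
  # Guard against truncated buffers instead of returning nil.
  return 0 if data.bytesize < UNACKED_OFFSET + UNACKED_SIZE

  # "@N" moves to absolute byte offset N, "L" reads one native-endian uint32.
  data.unpack1("@#{UNACKED_OFFSET}L")
end

# Assumed usage inside the configure block:
#   config.socket_parser = custom_parser
```

Reading at a fixed offset avoids building the twelve intermediate values that `unpack("C8L5")` returns. Whether that matters in practice has not been measured here.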

lib/puma/plugin/telemetry/data.rb

Lines changed: 118 additions & 14 deletions
@@ -100,13 +100,34 @@ def sum_stat(stat)
         class SocketData
           UNACKED_REGEXP = /\ unacked=(?<unacked>\d+)\ /.freeze

-          def initialize(ios)
-            @sockets = ios.select { |io| io.respond_to?(:getsockopt) }
+          def initialize(ios, parser)
+            @sockets = ios.select { |io| io.respond_to?(:getsockopt) && io.is_a?(TCPSocket) }
+            @parser =
+              case parser
+              when :inspect then method(:parse_with_inspect)
+              when :unpack then method(:parse_with_unpack)
+              when Proc then parser
+              end
           end

           # Number of unacknowledged connections in the sockets, which
           # we know as socket backlog.
           #
+          def unacked
+            @sockets.sum do |socket|
+              @parser.call(socket.getsockopt(Socket::SOL_TCP,
+                                             Socket::TCP_INFO))
+            end
+          end
+
+          def metrics
+            {
+              "sockets.backlog" => unacked
+            }
+          end
+
+          private
+
           # The Socket::Option returned by `getsockopt` doesn't provide
           # any kind of accessors for data inside. It decodes it on demand
           # for `inspect` as strings in C implementation. It looks like
@@ -143,21 +164,104 @@ def initialize(ios)
           #   total_retrans=0
           #   (128 bytes too long)>
           #
-          # That's why we have to pull the `unacked` field by parsing
-          # `inspect` output, instead of using something like `opt.unacked`
-          def unacked
-            @sockets.sum do |socket|
-              tcp_info = socket.getsockopt(Socket::SOL_TCP, Socket::TCP_INFO).inspect
-              tcp_match = tcp_info.match(UNACKED_REGEXP)
+          # That's why pulling the `unacked` field by parsing
+          # `inspect` output is one of the ways to retrieve it.
+          #
+          def parse_with_inspect(tcp_info)
+            tcp_match = tcp_info.inspect.match(UNACKED_REGEXP)

-              tcp_match[:unacked].to_i
-            end
+            return 0 if tcp_match.nil?
+
+            tcp_match[:unacked].to_i
           end

-          def metrics
-            {
-              "sockets.backlog" => unacked
-            }
+          # The above inspect data might not be available everywhere (looking at you
+          # AWS Fargate Host running on kernel 4.14!), but we might still recover it
+          # by manually unpacking the binary data based on linux headers. For example
+          # below is tcp info struct from `linux/tcp.h` header file, from problematic
+          # host rocking kernel 4.14.
+          #
+          #   struct tcp_info {
+          #     __u8 tcpi_state;
+          #     __u8 tcpi_ca_state;
+          #     __u8 tcpi_retransmits;
+          #     __u8 tcpi_probes;
+          #     __u8 tcpi_backoff;
+          #     __u8 tcpi_options;
+          #     __u8 tcpi_snd_wscale : 4, tcpi_rcv_wscale : 4;
+          #     __u8 tcpi_delivery_rate_app_limited:1;
+          #
+          #     __u32 tcpi_rto;
+          #     __u32 tcpi_ato;
+          #     __u32 tcpi_snd_mss;
+          #     __u32 tcpi_rcv_mss;
+          #
+          #     __u32 tcpi_unacked;
+          #     __u32 tcpi_sacked;
+          #     __u32 tcpi_lost;
+          #     __u32 tcpi_retrans;
+          #     __u32 tcpi_fackets;
+          #
+          #     /* Times. */
+          #     __u32 tcpi_last_data_sent;
+          #     __u32 tcpi_last_ack_sent; /* Not remembered, sorry. */
+          #     __u32 tcpi_last_data_recv;
+          #     __u32 tcpi_last_ack_recv;
+          #
+          #     /* Metrics. */
+          #     __u32 tcpi_pmtu;
+          #     __u32 tcpi_rcv_ssthresh;
+          #     __u32 tcpi_rtt;
+          #     __u32 tcpi_rttvar;
+          #     __u32 tcpi_snd_ssthresh;
+          #     __u32 tcpi_snd_cwnd;
+          #     __u32 tcpi_advmss;
+          #     __u32 tcpi_reordering;
+          #
+          #     __u32 tcpi_rcv_rtt;
+          #     __u32 tcpi_rcv_space;
+          #
+          #     __u32 tcpi_total_retrans;
+          #
+          #     __u64 tcpi_pacing_rate;
+          #     __u64 tcpi_max_pacing_rate;
+          #     __u64 tcpi_bytes_acked; /* RFC4898 tcpEStatsAppHCThruOctetsAcked */
+          #     __u64 tcpi_bytes_received; /* RFC4898 tcpEStatsAppHCThruOctetsReceived */
+          #     __u32 tcpi_segs_out; /* RFC4898 tcpEStatsPerfSegsOut */
+          #     __u32 tcpi_segs_in; /* RFC4898 tcpEStatsPerfSegsIn */
+          #
+          #     __u32 tcpi_notsent_bytes;
+          #     __u32 tcpi_min_rtt;
+          #     __u32 tcpi_data_segs_in; /* RFC4898 tcpEStatsDataSegsIn */
+          #     __u32 tcpi_data_segs_out; /* RFC4898 tcpEStatsDataSegsOut */
+          #
+          #     __u64 tcpi_delivery_rate;
+          #
+          #     __u64 tcpi_busy_time; /* Time (usec) busy sending data */
+          #     __u64 tcpi_rwnd_limited; /* Time (usec) limited by receive window */
+          #     __u64 tcpi_sndbuf_limited; /* Time (usec) limited by send buffer */
+          #   };
+          #
+          # Now knowing types and order of fields we can easily parse binary data
+          # by using
+          # - `C` flag for `__u8` type - 8-bit unsigned (unsigned char)
+          # - `L` flag for `__u32` type - 32-bit unsigned, native endian (uint32_t)
+          # - `Q` flag for `__u64` type - 64-bit unsigned, native endian (uint64_t)
+          #
+          # Complete `unpack` would look like `C8 L24 Q4 L6 Q4`, but we are only
+          # interested in `unacked` field at the moment, that's why we only parse
+          # till this field by unpacking with `C8 L5`.
+          #
+          # If you find that it's not giving correct results, then please fall back
+          # to inspect, or update this code to accept unpack sequence. But in the
+          # end unpack is preferable, as it's 12x faster than inspect.
+          #
+          # Tested against:
+          # - Amazon Linux 2 with kernel 4.14 & 5.10
+          # - Ubuntu 20.04 with kernel 5.13
+          #
+          def parse_with_unpack(tcp_info)
+            tcp_info.unpack("C8L5").last
           end
         end
       end
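
The `:inspect` path and the parser dispatch in `SocketData#initialize` can be exercised without a live socket. The sketch below is an illustration under assumptions, not the commit's code. The two inspect strings are invented samples, `unacked_from_inspect` and `resolve_parser` are hypothetical helper names, and the `else` branch is a variation: the committed `case` has no `else`, so an unrecognized symbol would leave `@parser` as `nil`.

```ruby
UNACKED_REGEXP = /\ unacked=(?<unacked>\d+)\ /.freeze

# Invented samples; real `inspect` output depends on the platform and Ruby build.
decoded   = "#<Socket::Option: INET TCP INFO state=LISTEN unacked=3 sacked=0 (128 bytes too long)>"
undecoded = "#<Socket::Option: INET TCP INFO \"\\n\\x00\\x00\">"

# Mirrors parse_with_inspect: fall back to 0 when the field is not decoded.
def unacked_from_inspect(text)
  match = text.match(UNACKED_REGEXP)
  match ? match[:unacked].to_i : 0
end

# Hypothetical dispatch helper that fails loudly on an unknown parser.
def resolve_parser(parser)
  case parser
  when :inspect then ->(opt) { unacked_from_inspect(opt.inspect) }
  when :unpack  then ->(opt) { opt.unpack("C8L5").last }
  when Proc     then parser
  else raise ArgumentError, "unknown socket parser: #{parser.inspect}"
  end
end

from_decoded   = unacked_from_inspect(decoded)
from_undecoded = unacked_from_inspect(undecoded)
```

Raising early on an unknown value surfaces a misconfigured `socket_parser` at startup rather than as a `NoMethodError` on the first telemetry tick. Whether the plugin should adopt that behavior is a design choice left to the maintainers.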

lib/puma/plugin/telemetry/version.rb

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@
 module Puma
   class Plugin
     module Telemetry
-      VERSION = "1.1.0.alpha"
+      VERSION = "1.1.0.beta"
     end
   end
 end

spec/fixtures/config.rb

Lines changed: 3 additions & 0 deletions
@@ -4,6 +4,9 @@
 lowlevel_error_handler { |_err| [500, {}, ["error page"]] }

 threads 1, 1
+
+bind "unix://#{ENV["BIND_PATH"]}"
+
 plugin "telemetry"

 Target = Struct.new(:name) do

spec/fixtures/default.rb

Lines changed: 3 additions & 0 deletions
@@ -4,6 +4,9 @@
 lowlevel_error_handler { |_err| [500, {}, ["error page"]] }

 threads 1, 1
+
+bind "unix://#{ENV["BIND_PATH"]}"
+
 plugin "telemetry"

 Puma::Plugin::Telemetry.configure do |config|
