
Optimize by reducing heap allocations #25

Open
jgaskins wants to merge 1 commit into comandeo:master from jgaskins:optimize-heap-allocations

Conversation

@jgaskins

This PR reduces the number of heap allocations performed in each request. For example, Array(UInt8).new(0) allocates an array on the heap whereas Bytes.empty involves no heap allocations — the Slice is allocated on the stack with no heap buffer because there's no need to allocate a buffer for zero bytes.
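
A minimal sketch of the pattern, with made-up method names rather than the actual diff:

# The old form heap-allocates an Array header even for zero elements;
# the new form returns a zero-length Slice with no heap buffer at all.
def empty_extras_old : Array(UInt8)
  Array(UInt8).new(0)
end

def empty_extras_new : Bytes
  Bytes.empty
end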

I wrote up a benchmark to show how much of a difference this made. It's difficult to benchmark because each query spends only a small fraction of its time actively using the CPU compared to the time it spends waiting on I/O. At scale, though, with many requests hitting the cache at once, this can matter a lot, so I thought I'd optimize where I could.

Benchmark code
require "benchmark"

require "../src/memcached"

mc = Memcached::Client.new

puts
puts "Getting a million values"
keys = Array.new(1_000_000, &.to_s)
pp Benchmark.measure { mc.get_multi keys }
pp Benchmark.memory { mc.get_multi keys }

Benchmark.ips do |x|
  x.report "get hit" { mc.get "hit" }
  x.report "get miss" { mc.get "miss" }
  x.report "set" { mc.set "hit", "value" }
  x.report "increment" { mc.increment "counter", 1, 0 }
end
Raw output

Before

Getting a million values
#<Benchmark::BM::Tms:0x10456d200
 @cstime=0.0,
 @cutime=0.0,
 @label="",
 @real=0.183867041,
 @stime=0.015734,
 @utime=0.167988>
242849552
  get hit  54.47k ( 18.36µs) (± 2.84%)  352B/op   1.01× slower
 get miss  55.14k ( 18.14µs) (± 2.37%)  320B/op        fastest
      set  53.84k ( 18.57µs) (± 2.39%)  336B/op   1.02× slower
increment  53.94k ( 18.54µs) (± 4.50%)  352B/op   1.02× slower

After

Getting a million values
#<Benchmark::BM::Tms:0x102af1280
 @cstime=0.0,
 @cutime=0.0,
 @label="",
 @real=0.1175465,
 @stime=0.011751000000000001,
 @utime=0.076792>
98847824
  get hit  54.35k ( 18.40µs) (± 2.21%)  192B/op        fastest
 get miss  54.12k ( 18.48µs) (± 2.89%)  160B/op   1.00× slower
      set  54.23k ( 18.44µs) (± 2.37%)  160B/op   1.00× slower
increment  52.97k ( 18.88µs) (± 2.64%)  160B/op   1.03× slower

The summary is that we save 50-55% on heap allocations per query: 50% per get miss (320 bytes->160) and 55% per increment (352 bytes->160), with set right between the two (336 bytes->160). When fetching 1M keys at once with get_multi, that translates to a 52% saving in CPU time (utime + stime in the Tms output above: 183ms->88ms) and a 60% reduction in the amount of memory allocated (232MB->94MB).

Another (micro-)optimization used here is to replace the hash of opcodes with a NamedTuple. Indexing into a NamedTuple happens at compile time so there is no runtime performance cost at all. I considered using an enum, which offers the same benefit but doesn't require digging into a data structure at all (we would be able to pass :get instead of OPCODES[:get]), but maybe that could come in a followup PR.
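
Roughly, the change looks like the sketch below; the opcode values are the standard memcached binary-protocol ones, not copied from this diff:

# Before: a Hash lives on the heap and every lookup is a runtime hash lookup.
OPCODES_HASH = {:get => 0x00_u8, :set => 0x01_u8, :increment => 0x05_u8}

# After: a NamedTuple is a value type, and indexing it with a symbol
# literal is resolved at compile time.
OPCODES = {get: 0x00_u8, set: 0x01_u8, increment: 0x05_u8}

OPCODES[:get] # => 0x00_u8, no heap allocation and no runtime lookup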


Also, in the Crystal ecosystem, it seems to be a common convention to log routine queries and messages at the debug level and let app operators opt into seeing those logs by setting the LOG_LEVEL env var.
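
For reference, with Crystal's standard Log module (assuming that's what the shard uses) the convention looks roughly like:

require "log"

# Honors the LOG_LEVEL env var (e.g. LOG_LEVEL=debug); defaults to info,
# so routine per-query logs stay hidden unless the operator opts in.
Log.setup_from_env

log = Log.for("memcached")
log.debug { "get key=hit" } # the block only runs when debug logging is enabled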

In hindsight, that could've (and probably should've) been a separate PR, but I needed it in order to run the benchmark, so it's not entirely unrelated.
