@tigrannajaryan (Collaborator) commented Sep 12, 2025

Resolves #127

This simplifies the oneof codec and improves codec speed at the cost of a slight
increase in uncompressed size. The zstd-compressed size is approximately the same
(slightly increased or decreased depending on the dataset).

Benchmarks

Otel schema speed improvement:

                                  │ bench_base.txt │         bench_current.txt         │
                                  │   sec/point    │  sec/point   vs base              │
SerializeNative/STEF/serialize-10      64.36n ± 1%   63.10n ± 2%  -1.96% (p=0.020 n=9)
DeserializeNative/STEF/deser-10        24.06n ± 1%   22.70n ± 1%  -5.65% (p=0.000 n=9)
geomean                                39.35n        37.85n       -3.82%
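
For readers checking the summary row, here is a minimal Go sketch of how the `geomean` line follows from the two rows above it. The helper name `geomean` is an illustration only and is not taken from benchstat or from this repository.

```go
package main

import (
	"fmt"
	"math"
)

// geomean returns the geometric mean of xs, computed via logarithms.
// Hypothetical helper for illustration; not benchstat's implementation.
func geomean(xs ...float64) float64 {
	sumLog := 0.0
	for _, x := range xs {
		sumLog += math.Log(x)
	}
	return math.Exp(sumLog / float64(len(xs)))
}

func main() {
	// sec/point values (in ns) copied from the table above.
	base := geomean(64.36, 24.06)
	current := geomean(63.10, 22.70)
	delta := 100 * (current/base - 1)
	fmt.Printf("base=%.2fn current=%.2fn delta=%+.2f%%\n", base, current, delta)
}
```

A geometric mean is used here because it averages ratios: a benchmark that runs in tens of nanoseconds and one that runs in milliseconds contribute equally to the relative change.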

Otel schema size before the change:

File                                Uncompressed   Zstd Bytes
hipstershop-otelmetrics.stefz             432716        93702
hostandcollector-otelmetrics.stefz       1258652        83009
astronomy-otelmetrics.stefz              7462303      1652082

After the change:

File                                Uncompressed   Zstd Bytes
hipstershop-otelmetrics.stefz             441558        95083
hostandcollector-otelmetrics.stefz       1266627        79835
astronomy-otelmetrics.stefz              7557122      1644464

Other benchmarks show similar patterns for speed and size changes.
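
To put numbers on "slight increase" and "approximately the same", the two size tables can be turned into relative changes. This is a small sketch that only uses the figures listed above; the `sizes` struct and `pctChange` helper are names made up for the illustration.

```go
package main

import "fmt"

// sizes holds the before/after byte counts for one dataset,
// copied from the tables in this description.
type sizes struct {
	file                                  string
	uncompressedBefore, uncompressedAfter int
	zstdBefore, zstdAfter                 int
}

// pctChange returns the relative change from before to after, in percent.
func pctChange(before, after int) float64 {
	return 100 * float64(after-before) / float64(before)
}

func main() {
	data := []sizes{
		{"hipstershop-otelmetrics.stefz", 432716, 441558, 93702, 95083},
		{"hostandcollector-otelmetrics.stefz", 1258652, 1266627, 83009, 79835},
		{"astronomy-otelmetrics.stefz", 7462303, 7557122, 1652082, 1644464},
	}
	for _, s := range data {
		fmt.Printf("%-36s uncompressed %+.2f%%  zstd %+.2f%%\n",
			s.file,
			pctChange(s.uncompressedBefore, s.uncompressedAfter),
			pctChange(s.zstdBefore, s.zstdAfter))
	}
}
```

On these three files the uncompressed size grows by roughly 0.6% to 2%, while the zstd-compressed size moves in both directions (up by about 1.5% for hipstershop, down by about 3.8% for hostandcollector).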

github-actions bot commented Sep 12, 2025

Benchmark Result

Benchmark diff with base branch
goos: linux
goarch: amd64
pkg: github.com/splunk/stef/benchmarks
cpu: AMD EPYC 7763 64-Core Processor                
                                                 │ bench-main.txt │            bench-new.txt            │
                                                 │     sec/op     │    sec/op     vs base               │
SerializeNative/STEF/serialize-4                     8.168m ±  3%   7.885m ±  7%        ~ (p=0.132 n=6)
SerializeNative/STEFU/serialize-4                    36.76m ±  3%   35.98m ±  2%   -2.13% (p=0.041 n=6)
DeserializeNative/STEF/deser-4                       2.701m ±  3%   2.642m ±  1%   -2.21% (p=0.002 n=6)
DeserializeNative/STEFU/deser-4                      8.436m ±  0%   7.478m ±  0%  -11.36% (p=0.002 n=6)
SerializeFromPdata/STEF/serialize-4                  139.1m ±  1%   137.0m ±  2%        ~ (p=0.180 n=6)
SerializeFromPdata/STEFU/serialize-4                 35.74m ±  2%   35.76m ±  1%        ~ (p=0.937 n=6)
DeserializeToPdata/STEF/deserialize-4                42.73m ±  2%   42.47m ±  2%        ~ (p=0.240 n=6)
DeserializeToPdata/STEFU/deserialize-4               61.31m ±  2%   61.57m ±  2%        ~ (p=0.394 n=6)
STEFReaderRead-4                                     2.816m ±  0%   2.702m ±  1%   -4.06% (p=0.002 n=6)
STEFSerializeMultipart/astronomy-otelmetrics-4        3.165 ± 11%    3.201 ± 12%        ~ (p=0.699 n=6)
STEFDeserializeMultipart/astronomy-otelmetrics-4     83.63m ± 11%   77.20m ±  9%        ~ (p=0.180 n=6)
ReadSTEF-4                                           2.842m ±  1%   2.756m ±  1%   -3.03% (p=0.002 n=6)
ReadSTEFZ-4                                          4.712m ±  1%   4.620m ±  2%   -1.94% (p=0.009 n=6)
ReadSTEFZWriteSTEF-4                                 8.303m ±  1%   8.442m ±  3%   +1.67% (p=0.002 n=6)
geomean                                              22.27m         21.71m         -2.54%

                                                 │ bench-main.txt │            bench-new.txt            │
                                                 │   sec/point    │  sec/point    vs base               │
SerializeNative/STEF/serialize-4                     122.2n ±  3%   117.9n ±  7%        ~ (p=0.104 n=6)
SerializeNative/STEFU/serialize-4                    549.8n ±  3%   538.1n ±  2%   -2.13% (p=0.041 n=6)
DeserializeNative/STEF/deser-4                       40.40n ±  3%   39.51n ±  1%   -2.20% (p=0.002 n=6)
DeserializeNative/STEFU/deser-4                      126.1n ±  0%   111.8n ±  0%  -11.38% (p=0.002 n=6)
SerializeFromPdata/STEF/serialize-4                  2.081µ ±  1%   2.050µ ±  2%        ~ (p=0.180 n=6)
SerializeFromPdata/STEFU/serialize-4                 534.5n ±  2%   534.8n ±  1%        ~ (p=1.000 n=6)
DeserializeToPdata/STEF/deserialize-4                639.0n ±  2%   635.3n ±  2%        ~ (p=0.240 n=6)
DeserializeToPdata/STEFU/deserialize-4               917.0n ±  2%   920.9n ±  2%        ~ (p=0.394 n=6)
STEFReaderRead-4                                     42.12n ±  0%   40.41n ±  1%   -4.06% (p=0.002 n=6)
STEFSerializeMultipart/astronomy-otelmetrics-4       4.023µ ± 11%   4.069µ ± 12%        ~ (p=0.699 n=6)
STEFDeserializeMultipart/astronomy-otelmetrics-4    106.30n ± 11%   98.11n ±  9%        ~ (p=0.180 n=6)
ReadSTEF-4                                           42.54n ±  1%   41.24n ±  1%   -3.04% (p=0.002 n=6)
ReadSTEFZ-4                                          70.52n ±  1%   69.15n ±  2%   -1.94% (p=0.009 n=6)
ReadSTEFZWriteSTEF-4                                 124.3n ±  1%   126.4n ±  3%   +1.65% (p=0.002 n=6)
geomean                                              234.2n         228.3n         -2.54%

                                                 │ bench-main.txt │           bench-new.txt            │
                                                 │      B/op      │     B/op      vs base              │
SerializeNative/STEF/serialize-4                     3.570Mi ± 0%   3.618Mi ± 0%  +1.35% (p=0.002 n=6)
SerializeNative/STEFU/serialize-4                    7.143Mi ± 0%   7.539Mi ± 0%  +5.55% (p=0.002 n=6)
DeserializeNative/STEF/deser-4                       925.4Ki ± 0%   934.2Ki ± 0%  +0.94% (p=0.002 n=6)
DeserializeNative/STEFU/deser-4                      1.394Mi ± 0%   1.470Mi ± 0%  +5.45% (p=0.002 n=6)
SerializeFromPdata/STEF/serialize-4                  75.05Mi ± 0%   75.09Mi ± 0%       ~ (p=0.065 n=6)
SerializeFromPdata/STEFU/serialize-4                 7.143Mi ± 0%   7.539Mi ± 0%  +5.55% (p=0.002 n=6)
DeserializeToPdata/STEF/deserialize-4                29.90Mi ± 0%   29.91Mi ± 0%  +0.03% (p=0.002 n=6)
DeserializeToPdata/STEFU/deserialize-4               36.46Mi ± 0%   36.53Mi ± 0%  +0.21% (p=0.002 n=6)
STEFReaderRead-4                                     926.9Ki ± 0%   935.9Ki ± 0%  +0.97% (p=0.002 n=6)
STEFSerializeMultipart/astronomy-otelmetrics-4       3.371Gi ± 0%   3.372Gi ± 0%       ~ (p=0.818 n=6)
STEFDeserializeMultipart/astronomy-otelmetrics-4     20.80Mi ± 0%   20.40Mi ± 0%  -1.93% (p=0.002 n=6)
ReadSTEF-4                                           926.9Ki ± 0%   935.9Ki ± 0%  +0.97% (p=0.002 n=6)
ReadSTEFZ-4                                          10.20Mi ± 0%   10.27Mi ± 0%  +0.63% (p=0.002 n=6)
ReadSTEFZWriteSTEF-4                                 13.64Mi ± 0%   13.75Mi ± 0%  +0.82% (p=0.002 n=6)
geomean                                              10.31Mi        10.46Mi       +1.45%

                                                 │ bench-main.txt │            bench-new.txt            │
                                                 │   allocs/op    │  allocs/op   vs base                │
SerializeNative/STEF/serialize-4                      2.713k ± 0%   2.725k ± 0%  +0.46% (p=0.002 n=6)
SerializeNative/STEFU/serialize-4                      870.0 ± 0%    880.0 ± 0%  +1.15% (p=0.002 n=6)
DeserializeNative/STEF/deser-4                         465.0 ± 0%    465.0 ± 0%       ~ (p=1.000 n=6) ¹
DeserializeNative/STEFU/deser-4                        469.0 ± 0%    469.0 ± 0%       ~ (p=1.000 n=6) ¹
SerializeFromPdata/STEF/serialize-4                   134.8k ± 0%   134.8k ± 0%  +0.01% (p=0.050 n=6)
SerializeFromPdata/STEFU/serialize-4                   871.0 ± 0%    881.0 ± 0%  +1.15% (p=0.002 n=6)
DeserializeToPdata/STEF/deserialize-4                 622.5k ± 0%   622.5k ± 0%       ~ (p=1.000 n=6) ¹
DeserializeToPdata/STEFU/deserialize-4                811.2k ± 0%   811.2k ± 0%       ~ (p=1.000 n=6) ¹
STEFReaderRead-4                                       465.0 ± 0%    465.0 ± 0%       ~ (p=1.000 n=6) ¹
STEFSerializeMultipart/astronomy-otelmetrics-4        13.21M ± 0%   13.21M ± 0%       ~ (p=0.394 n=6)
STEFDeserializeMultipart/astronomy-otelmetrics-4      2.295k ± 0%   2.293k ± 0%  -0.09% (p=0.002 n=6)
ReadSTEF-4                                             465.0 ± 0%    465.0 ± 0%       ~ (p=1.000 n=6) ¹
ReadSTEFZ-4                                            497.0 ± 0%    499.0 ± 0%  +0.40% (p=0.002 n=6)
ReadSTEFZWriteSTEF-4                                  1.297k ± 0%   1.310k ± 0%  +1.00% (p=0.002 n=6)
geomean                                               6.217k        6.235k       +0.29%
¹ all samples are equal
Benchmark result
benchstat bench-new.txt
goos: linux
goarch: amd64
pkg: github.com/splunk/stef/benchmarks
cpu: AMD EPYC 7763 64-Core Processor                
                                                 │ bench-new.txt │
                                                 │    sec/op     │
SerializeNative/STEF/serialize-4                    7.885m ±  7%
SerializeNative/STEFU/serialize-4                   35.98m ±  2%
DeserializeNative/STEF/deser-4                      2.642m ±  1%
DeserializeNative/STEFU/deser-4                     7.478m ±  0%
SerializeFromPdata/STEF/serialize-4                 137.0m ±  2%
SerializeFromPdata/STEFU/serialize-4                35.76m ±  1%
DeserializeToPdata/STEF/deserialize-4               42.47m ±  2%
DeserializeToPdata/STEFU/deserialize-4              61.57m ±  2%
STEFReaderRead-4                                    2.702m ±  1%
STEFSerializeMultipart/astronomy-otelmetrics-4       3.201 ± 12%
STEFDeserializeMultipart/astronomy-otelmetrics-4    77.20m ±  9%
ReadSTEF-4                                          2.756m ±  1%
ReadSTEFZ-4                                         4.620m ±  2%
ReadSTEFZWriteSTEF-4                                8.442m ±  3%
geomean                                             21.71m

                                                 │ bench-new.txt │
                                                 │   sec/point   │
SerializeNative/STEF/serialize-4                    117.9n ±  7%
SerializeNative/STEFU/serialize-4                   538.1n ±  2%
DeserializeNative/STEF/deser-4                      39.51n ±  1%
DeserializeNative/STEFU/deser-4                     111.8n ±  0%
SerializeFromPdata/STEF/serialize-4                 2.050µ ±  2%
SerializeFromPdata/STEFU/serialize-4                534.8n ±  1%
DeserializeToPdata/STEF/deserialize-4               635.3n ±  2%
DeserializeToPdata/STEFU/deserialize-4              920.9n ±  2%
STEFReaderRead-4                                    40.41n ±  1%
STEFSerializeMultipart/astronomy-otelmetrics-4      4.069µ ± 12%
STEFDeserializeMultipart/astronomy-otelmetrics-4    98.11n ±  9%
ReadSTEF-4                                          41.24n ±  1%
ReadSTEFZ-4                                         69.15n ±  2%
ReadSTEFZWriteSTEF-4                                126.4n ±  3%
geomean                                             228.3n

                                                 │ bench-new.txt │
                                                 │     B/op      │
SerializeNative/STEF/serialize-4                    3.618Mi ± 0%
SerializeNative/STEFU/serialize-4                   7.539Mi ± 0%
DeserializeNative/STEF/deser-4                      934.2Ki ± 0%
DeserializeNative/STEFU/deser-4                     1.470Mi ± 0%
SerializeFromPdata/STEF/serialize-4                 75.09Mi ± 0%
SerializeFromPdata/STEFU/serialize-4                7.539Mi ± 0%
DeserializeToPdata/STEF/deserialize-4               29.91Mi ± 0%
DeserializeToPdata/STEFU/deserialize-4              36.53Mi ± 0%
STEFReaderRead-4                                    935.9Ki ± 0%
STEFSerializeMultipart/astronomy-otelmetrics-4      3.372Gi ± 0%
STEFDeserializeMultipart/astronomy-otelmetrics-4    20.40Mi ± 0%
ReadSTEF-4                                          935.9Ki ± 0%
ReadSTEFZ-4                                         10.27Mi ± 0%
ReadSTEFZWriteSTEF-4                                13.75Mi ± 0%
geomean                                             10.46Mi

                                                 │ bench-new.txt │
                                                 │   allocs/op   │
SerializeNative/STEF/serialize-4                     2.725k ± 0%
SerializeNative/STEFU/serialize-4                     880.0 ± 0%
DeserializeNative/STEF/deser-4                        465.0 ± 0%
DeserializeNative/STEFU/deser-4                       469.0 ± 0%
SerializeFromPdata/STEF/serialize-4                  134.8k ± 0%
SerializeFromPdata/STEFU/serialize-4                  881.0 ± 0%
DeserializeToPdata/STEF/deserialize-4                622.5k ± 0%
DeserializeToPdata/STEFU/deserialize-4               811.2k ± 0%
STEFReaderRead-4                                      465.0 ± 0%
STEFSerializeMultipart/astronomy-otelmetrics-4       13.21M ± 0%
STEFDeserializeMultipart/astronomy-otelmetrics-4     2.293k ± 0%
ReadSTEF-4                                            465.0 ± 0%
ReadSTEFZ-4                                           499.0 ± 0%
ReadSTEFZWriteSTEF-4                                 1.310k ± 0%
geomean                                              6.235k

@tigrannajaryan changed the title from "Use non-delta codec for oneof type" to "DRAFT: Use non-delta codec for oneof type" on Sep 15, 2025
@tigrannajaryan force-pushed the tigran/oneofcodec branch 7 times, most recently from 6d65d69 to 7630244, on September 22, 2025 17:27
@tigrannajaryan force-pushed the tigran/oneofcodec branch 2 times, most recently from 19f11ae to 823b28d, on September 22, 2025 22:36
@tigrannajaryan changed the title from "DRAFT: Use non-delta codec for oneof type" to "Use non-delta codec for oneof type" on Sep 22, 2025
@tigrannajaryan marked this pull request as ready for review on September 22, 2025 22:45

Copilot AI (Contributor) left a comment

Pull Request Overview

This PR changes the oneof codec to use a non-delta encoding approach, replacing the previous delta-based type encoding with direct type encoding using fixed bit counts. This simplifies the codec implementation and improves performance at the cost of a slight increase in uncompressed size.

  • Replaces delta encoding with fixed-width bit encoding for oneof types
  • Updates specification to reflect the codec change from delta to direct encoding
  • Modifies benchmark scripts to reference the correct branch for comparisons
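
As an illustration of what "fixed-width bit encoding" of a oneof type tag can look like, here is a minimal Go sketch. It is not the code generated by stefgen: the `bitWriter` and `bitReader` types, the MSB-first bit order, and the `typeBitCount` helper are all assumptions made for the example. The only idea taken from the summary above is that each record's type tag is written directly in a fixed number of bits rather than as a delta against the previous record.

```go
package main

import (
	"fmt"
	"math/bits"
)

// bitWriter is a hypothetical MSB-first bit packer (not STEF's writer).
type bitWriter struct {
	buf   []byte
	nbits uint // total number of bits written so far
}

// writeBits appends the low n bits of v, most significant bit first.
func (w *bitWriter) writeBits(v uint64, n uint) {
	for i := int(n) - 1; i >= 0; i-- {
		if w.nbits%8 == 0 {
			w.buf = append(w.buf, 0) // start a new byte
		}
		bit := byte(v>>uint(i)) & 1
		w.buf[len(w.buf)-1] |= bit << (7 - w.nbits%8)
		w.nbits++
	}
}

// bitReader is the matching hypothetical MSB-first bit unpacker.
type bitReader struct {
	buf []byte
	pos uint // index of the next bit to read
}

// readBits reads n bits and returns them as the low bits of the result.
func (r *bitReader) readBits(n uint) uint64 {
	var v uint64
	for i := uint(0); i < n; i++ {
		b := r.buf[r.pos/8] >> (7 - r.pos%8) & 1
		v = v<<1 | uint64(b)
		r.pos++
	}
	return v
}

// typeBitCount returns the fixed number of bits needed to store any
// type tag in the range [0, maxTag].
func typeBitCount(maxTag uint) uint {
	return uint(bits.Len(maxTag))
}

func main() {
	const maxTag = 7 // assumed: tag 0 for "none" plus 7 alternatives
	n := typeBitCount(maxTag)
	tags := []uint64{3, 3, 0, 7, 1}

	w := &bitWriter{}
	for _, t := range tags {
		w.writeBits(t, n) // each tag is written as-is, no delta state
	}

	r := &bitReader{buf: w.buf}
	for range tags {
		fmt.Print(r.readBits(n), " ")
	}
	fmt.Println()
}
```

The trade-off matches the PR description: the encoder and decoder keep no previous-value state and do no delta arithmetic, which is simpler and faster, but a run of identical tags still costs the full bit width per record, so the uncompressed output can grow slightly.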

Reviewed Changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| stefgen/templates/java/oneofEncoder.java.tmpl | Template for Java oneof encoders - removes delta encoding and adds fixed bit count encoding |
| stefgen/templates/java/oneofDecoder.java.tmpl | Template for Java oneof decoders - removes delta decoding and adds fixed bit count decoding |
| stefgen/templates/go/oneof.go.tmpl | Template for Go oneof encoder/decoder - removes delta encoding/decoding and adds fixed bit count approach |
| stef-spec/specification.md | Updates specification to document the change from delta to direct type encoding |
| Multiple generated Java files | Generated code implementing the new non-delta codec for various oneof types |
| Multiple generated Go files | Generated code implementing the new non-delta codec for various oneof types |
| benchmarks/scripts/genoldtestfiles.sh | Updates base branch reference for benchmark comparisons |
| .github/workflows/benchmark.yml | Adds test file generation step to benchmark workflow |

@tigrannajaryan changed the title from "Use non-delta codec for oneof type" to "[Breaking] Use non-delta codec for oneof type" on Sep 23, 2025
@tigrannajaryan merged commit 18ca995 into main on Sep 23, 2025
9 checks passed
@tigrannajaryan deleted the tigran/oneofcodec branch on September 23, 2025 13:22