This repository was archived by the owner on Nov 21, 2025. It is now read-only.
Zeek v6.2.0 update #41
Merged
tl;dr
My recent changes to the reference Zeek NDJSON shaper in brimdata/super#5106 unfortunately broke "perf compare" (see this recent Actions failure). When contemplating the multiple ways to address this, I've concluded that the best way forward is to just bring the test data set current by regenerating it from Zeek v6.2.0.
Details
The root cause of the breakage is a bit complex. The prior shaper was from a time when Zeek's `ssl` logs included a field called `client_cert_chain_fuids`, but that field was dropped in more recent Zeek releases. The new shaper still works with logs that contain the field if the shaper is configured with `_crop_records = false`, since that setting assigns an inferred type to the field with its unrecognized name. However, the field often appears as an empty array, and that means the field ends up with an inferred type of `<[null]>`. Some of the permutations of the `perf-compare.sh` script output in Zeek TSV, and this creates a problem because the Zeek TSV writer refuses to output this type, presumably since there's no obvious Zeek equivalent type to use.
There are at least a few ways I could have worked around this, e.g.:
[…] `perf-compare.sh` […] `vector[string]`, since the majority of arrays in Zeek logs end up being of strings... which I was leaning toward a bit since that reference to `&{}` in the error message makes it pretty impossible to understand, but I also know our Zeek TSV support has been seen at times as hanging by a thread, and given our other priorities, I respect that)

All of these seemed kind of messy and would just prolong the life of a data set that came from software old enough that it's probably being run almost nowhere at this point. The Zed project's connection to Zeek is much smaller than it once was, but we have enough users that still rely on us for security use cases that we've continued to stay current in other ways, such as the recently-created build-zeek repo that brings Brimcap/Zui current with running Zeek v6.2.0. Meanwhile, the Zed storage formats have thankfully held still for a while, so it seems like a fine time to make this jump forward and get current with this test data as well.
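To make the empty-array problem described above concrete, here is a minimal Python sketch of the idea. It is a toy model and not the actual shaper or Zeek TSV writer code: the function names `infer_type` and `zeek_tsv_type`, the type-name mapping, and the assumption that arrays are homogeneous are all made up for illustration. It shows why an empty array gives type inference nothing to work with, and why a writer that needs a concrete Zeek element type would then have to refuse.

```python
# Toy model only. All names and mappings here are assumptions for
# illustration, not the real shaper or Zeek TSV writer implementation.
ZEEK_PRIMITIVES = {
    "string": "string",
    "int64": "int",
    "float64": "double",
    "bool": "bool",
}


def infer_type(value):
    """Infer a Zed-like type name from a decoded JSON value."""
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "bool"
    if isinstance(value, int):
        return "int64"
    if isinstance(value, float):
        return "float64"
    if isinstance(value, str):
        return "string"
    if value is None:
        return "null"
    if isinstance(value, list):
        if not value:
            # No elements to inspect, so the element type falls back to null.
            return "[null]"
        # Assumes a homogeneous array; the first element decides the type.
        return "[" + infer_type(value[0]) + "]"
    raise TypeError(f"unsupported value: {value!r}")


def zeek_tsv_type(inferred):
    """Map an inferred type name to a Zeek TSV type, or refuse."""
    if inferred.startswith("[") and inferred.endswith("]"):
        elem = inferred[1:-1]
        if elem not in ZEEK_PRIMITIVES:
            raise ValueError(f"no Zeek TSV equivalent for type {inferred}")
        return "vector[" + ZEEK_PRIMITIVES[elem] + "]"
    if inferred not in ZEEK_PRIMITIVES:
        raise ValueError(f"no Zeek TSV equivalent for type {inferred}")
    return ZEEK_PRIMITIVES[inferred]
```

In this model an array of strings maps cleanly to `vector[string]`, while an empty `client_cert_chain_fuids` array infers as `[null]` and is rejected by `zeek_tsv_type`.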
I've eyeballed the differences in the old & new data and nothing particularly shocking stood out. The number of log files increased from 26 to 36 thanks to newer parsers and systems added in recent Zeek releases. The counts of some of the Zeek log types we've always had have varied slightly, but not dramatically. Technically this means that the outputs from
`perf-compare.sh` can't be compared apples-to-apples with the ones from the past, but we use Autoperf now as our solution for catching perf regressions. Really, between `perf-compare.sh` and the related `output-check.sh` that we also run regularly in CI, this repo has mostly been about catching unexpected changes in the output formats and the occasional bug, and I expect it will continue to serve that purpose once this gets merged and becomes a new baseline.

Once this merges, I'll need to merge a small Zed PR to get the CI to run clean again. Specifically, the `cut ts` in one of the Zed queries in `perf-compare.sh` needs to become `cut quiet(ts)` because I let the `loaded_scripts` logs become part of the updated data set. They lack a `ts` field, and hence the raw `cut ts` would normally produce an `error("missing")` that the Zeek TSV writer would refuse to output. We've already been using this technique in the `count() by quiet(id.orig_h)` query that's already been part of `perf-compare.sh`, so it seems fine to extend the approach. When I put up that PR, I'll use the opportunity to create a new baseline set of perf numbers for the https://github.com/brimdata/zed/blob/main/performance/README.md page.
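For readers less familiar with why `quiet()` helps here, the following Python sketch models the behavior described above. It is not Zed code and not how Zed is implemented: the `cut` and `run` helpers, the `MISSING` marker, and the exact shape of the non-quiet output are assumptions chosen only to illustrate the difference between propagating a missing-field error and silently dropping the record.

```python
# Toy model of `cut ts` versus `cut quiet(ts)`. The helper names and the
# shape of the error value are assumptions for illustration, not Zed itself.
MISSING = 'error("missing")'


def cut(record, field, quiet=False):
    """Project one field out of a record (a dict)."""
    if field in record:
        return {field: record[field]}
    if quiet:
        # Quiet mode: the record is dropped instead of yielding an error.
        return None
    # Non-quiet mode: the missing field surfaces as an error value.
    return {field: MISSING}


def run(records, field, quiet=False):
    """Apply cut to every record, skipping records dropped by quiet mode."""
    results = []
    for record in records:
        out = cut(record, field, quiet)
        if out is not None:
            results.append(out)
    return results


# Hypothetical sample: one log type with `ts`, one (like loaded_scripts) without.
sample = [
    {"_path": "conn", "ts": "2024-03-01T00:00:00Z"},
    {"_path": "loaded_scripts", "name": "base/init-bare.zeek"},
]
```

With this sample, `run(sample, "ts")` carries an error value for the `loaded_scripts` record, which is the kind of value a Zeek TSV writer would have no way to represent, while `run(sample, "ts", quiet=True)` returns only the record that actually has a `ts`.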