Fastxpp benchmark experiments #21
    # track length before and after
    var before = len(self.seq)
    var _want = want
    var _total = self.reader.read_bytes(self.seq, _want)
What if here we did var _total = self.reader.read_bytes(self.seq, _want, keep=True)?
You'd have to adjust the byte math to subtract one. But it avoids an extra read call, which might be nice.
That's not read_until! So just _want + 1?
Stabilizing fastxpp Benchmarks
I had AI summarize my messy notes and hyperfine results. Everything seems to be correct.
This is a follow-up to #14, where there were some inconsistent results.
TL;DR
By holding benchmarking scaffolding static with @no_inline and selectively forcing inlining on the hottest helpers, we:
... orig implementation in the apples-to-apples, separate-executable benchmark, all without algorithmic changes
Motivation
The existing benchmark numbers have been noisy, likely because the compiler optimizes the benchmark harness together with the implementation under test. This obscures the real cost of each I/O strategy. We want numbers that:
... strip_newline, read_byte, and read_until, and
Header field definition (for now)
How do we calculate the last line (if we wanted to)?
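One possible answer, as a minimal sketch: it assumes slen is the total sequence length in bases and bpl is the number of bytes per full line including the trailing newline, which is how the lcnt expression in this description reads. Those meanings, and the helper name line_layout, are assumptions for illustration. The sketch is in Python rather than Mojo so it can be run directly.

```python
from typing import Tuple


def line_layout(slen: int, bpl: int) -> Tuple[int, int]:
    """Return (line count, bases on the last line) for a wrapped sequence.

    Assumes every line except possibly the last holds bpl - 1 bases
    followed by a single newline byte.
    """
    bases_per_line = bpl - 1
    # Same arithmetic as: var lcnt = (slen + (bpl - 2)) // (bpl - 1)
    # i.e. ceil(slen / bases_per_line) using integer math only.
    lcnt = (slen + (bpl - 2)) // (bpl - 1)
    if lcnt == 0:
        return 0, 0
    # Everything not covered by the full lines sits on the last line.
    last_len = slen - (lcnt - 1) * bases_per_line
    return lcnt, last_len


# 150 bases wrapped at 60 bases per line (61 bytes with the newline)
print(line_layout(150, 61))
```

With 150 bases at 60 per line, this gives three lines with 30 bases on the last one; a sequence that is an exact multiple of the line width ends on a full line.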
Different read methods for fastxpp
The methods are named terribly, sorry. I'll change them later.
There are 4 key steps
1: Identify record start ('>')
2: Read header
3: SWAR decode header info field
4: Read sequence bytes
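Step 3 can be illustrated with a common SWAR (SIMD within a register) digit-parsing trick. The header field layout is not spelled out here, so this minimal sketch assumes a hypothetical fixed-width field of exactly eight ASCII digits. The name swar_parse_8_digits is illustrative and not taken from this PR, and the sketch is in Python rather than Mojo so it can be run directly.

```python
MASK64 = (1 << 64) - 1  # emulate 64-bit register wraparound in Python


def swar_parse_8_digits(field: bytes) -> int:
    """Decode exactly eight ASCII digits into an integer without a per-digit loop."""
    if len(field) != 8 or not field.isdigit():
        raise ValueError("expected exactly eight ASCII digits")

    # Load the eight bytes as one little-endian 64-bit word:
    # the first character ends up in the lowest byte.
    val = int.from_bytes(field, "little")

    # Turn each ASCII byte '0'..'9' into its numeric value 0..9.
    val = (val - 0x3030303030303030) & MASK64

    # Combine neighbouring digits: even bytes now hold two-digit values 0..99.
    val = (val * 10 + (val >> 8)) & MASK64

    # Combine the four two-digit values into the final number, which
    # accumulates in the upper 32 bits of the 64-bit result.
    mask = 0x000000FF000000FF
    mul1 = 0x000F424000000064  # 100 + (1_000_000 << 32)
    mul2 = 0x0000271000000001  # 1 + (10_000 << 32)
    val = ((val & mask) * mul1 + ((val >> 16) & mask) * mul2) & MASK64
    return val >> 32


print(swar_parse_8_digits(b"00012345"))
```

The point of the technique is that all eight digits are converted with a handful of word-wide operations instead of eight dependent multiply-and-add steps.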
Apart from the original (naive) read method, the main difference between the other three is how they read the sequence bytes (and the quality scores, if this were FASTQ), and in particular how they remove newlines from FASTA sequence blocks.
orig
strip_newline
swar
read_once
Only passes over bytes once
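The per-method details are not reproduced here, so the following is only a rough sketch of the general distinction between stripping newlines in a second pass and copying each byte once using a known line width. It is in Python rather than Mojo; the function names and the slen / bpl parameters (sequence length in bases, bytes per line including the newline) are assumptions for illustration, not the PR's actual code.

```python
def strip_two_pass(block: bytes) -> bytes:
    """Read the whole block first, then rescan it to drop newline bytes."""
    return block.replace(b"\n", b"")


def copy_one_pass(block: bytes, slen: int, bpl: int) -> bytes:
    """Copy bases line by line, skipping newlines by position instead of searching."""
    bases_per_line = bpl - 1
    out = bytearray(slen)  # final size is known up front from the header
    src = 0  # read offset into the raw block
    dst = 0  # write offset into the output
    while dst < slen:
        n = min(bases_per_line, slen - dst)
        out[dst:dst + n] = block[src:src + n]
        dst += n
        src += n + 1  # step over the newline that ends this line
    return bytes(out)


raw = b"ACGT\nACGT\nAC\n"
print(strip_two_pass(raw))
print(copy_one_pass(raw, slen=10, bpl=5))
```

Both functions yield the same sequence; the second relies on the header values being correct, which is the trade-off for not scanning for newline bytes.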
Design of the Experiment
Input: 2.6G uncompressed fasta file
strip_newline, read_byte, read_until
@always_inline @no_inline @no_inline @always_inline @no_inline @no_inline @no_inline @no_inline @no_inline @no_inline @always_inline @always_inline @always_inline @no_inline @always_inline @always_inline @always_inline @no_inline) @always_inline @always_inline @always_inline
All builds used the same mojo build fastxpp_bench.mojo invocation and were measured with Hyperfine (--warmup 3 -r 10) on an otherwise idle machine.
Results Snapshot
orig (s), strip_newline (s), swar (s), read_once (s)
orig @no_inline @no_inline, helpers @no_inline @no_inline, helpers @always_inline @no_inline, helpers @always_inline
Ordering Sensitivity
The last entry in the bench list is the most sensitive to @no_inline. Inlining the read bytes functions eliminates most of the difference, besides compiling separately.
Summary
... @no_inline on all bench functions to freeze harness behavior.
... read_byte and read_until because they are hot in both swar and read_once paths.
... read helpers.
var lcnt = (slen + (bpl - 2)) // (bpl - 1)
Next steps