Conversation

ShadowCurse
Contributor

@ShadowCurse ShadowCurse commented Sep 3, 2024

Changes

Block tests were using fio logs to calculate bandwidth. The issue is that the values in the logs are in KB/sec, and we were simply summing them up, so we were getting a sum of averages. What we actually want is a single average bandwidth for the test.
Now we use fio's JSON output, which contains much more information in a convenient format. This info already includes all the bandwidth calculations. This change will also allow us to report other metrics in the future if we want.

Reason

More correct block performance tests

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist

  • If a specific issue led to this PR, this PR closes the issue.
  • The description of changes is clear and encompassing.
  • Any required documentation changes (code and docs) are included in this
    PR.
  • API changes follow the Runbook for Firecracker API changes.
  • User-facing changes are mentioned in CHANGELOG.md.
  • All added/changed functionality is tested.
  • New TODOs link to an issue.
  • Commits meet
    contribution quality standards.

  • This functionality cannot be added in rust-vmm.

@ShadowCurse ShadowCurse self-assigned this Sep 3, 2024
@ShadowCurse ShadowCurse force-pushed the fio_json branch 3 times, most recently from fbd02c4 to 632dfb6 Compare September 3, 2024 14:07
@ShadowCurse ShadowCurse added Status: Awaiting review Indicates that a pull request is ready to be reviewed Type: Fix Indicates a fix to existing code Type: Performance labels Sep 3, 2024
@ShadowCurse ShadowCurse requested review from pb8o and roypat and removed request for roypat September 3, 2024 14:12

codecov bot commented Sep 3, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 84.34%. Comparing base (2521805) to head (032beb5).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #4780   +/-   ##
=======================================
  Coverage   84.34%   84.34%           
=======================================
  Files         249      249           
  Lines       27460    27460           
=======================================
  Hits        23160    23160           
  Misses       4300     4300           
Flag Coverage Δ
5.10-c5n.metal 84.56% <ø> (+<0.01%) ⬆️
5.10-m5n.metal 84.54% <ø> (ø)
5.10-m6a.metal 83.84% <ø> (+<0.01%) ⬆️
5.10-m6g.metal 80.91% <ø> (ø)
5.10-m6i.metal 84.54% <ø> (ø)
5.10-m7g.metal 80.91% <ø> (ø)
6.1-c5n.metal 84.56% <ø> (ø)
6.1-m5n.metal 84.54% <ø> (ø)
6.1-m6a.metal 83.83% <ø> (ø)
6.1-m6g.metal 80.91% <ø> (ø)
6.1-m6i.metal 84.54% <ø> (ø)
6.1-m7g.metal 80.91% <ø> (ø)

Flags with carried forward coverage won't be shown.


@roypat
Copy link
Contributor

roypat commented Sep 3, 2024

Can you elaborate a bit more on what the problem was here, and why the JSON format fixes it? I thought the fio logs contained "kilobytes sent since the last log line", with log lines being emitted every second. What do you mean by them being "KB/s" and "wrong to sum up"?

Block tests were using fio logs to calculate bandwidth.
The issue is that the values in logs are in KB/sec and
we were just summing them up, so getting a sum of averages.
But what we want is just an average bandwidth for the test.
Now we use json output of fio which contains much more info
in a nice format. This info already includes all bandwidth
calculations. This change will also allow us to output
other metrics in the future if we want.

Signed-off-by: Egor Lazarchuk <[email protected]>
@ShadowCurse
Contributor Author

Each line in the log contains the value seen over the specified period of time (I assume this is an average over 1 s, because we tell fio to log every second). So each line contains KB/s (not just KB). We were summing these KB/s values and getting a sum of averages.
And if they were plain KB, then our tests were completely wrong, as we were calculating the total amount of data sent instead of the bandwidth.
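To make the concern concrete with hypothetical numbers: if each log line is already a rate, summing the lines produces a figure that grows with test duration rather than a bandwidth, while averaging them gives the per-test mean.

```python
# Hypothetical per-second fio log samples, each already a rate in KB/s.
samples_kbps = [100.0, 120.0, 110.0, 130.0]

# Aggregation as described above: summing rates yields a number that
# scales with how long the test ran -- not a bandwidth.
summed = sum(samples_kbps)                       # 460.0 "KB/s"

# Averaging the per-interval rates gives the mean bandwidth instead.
mean_bw = sum(samples_kbps) / len(samples_kbps)  # 115.0 KB/s
```

With 4 samples the summed value is ~4x the mean, and the gap widens the longer the test runs.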

Contributor

@roypat roypat left a comment

Ah, I see now. This won't work with our A/B-tests, because they require a data series to be emitted, not just a single data point.

I admittedly still don't quite understand what was wrong with the old approach. It gives you a data series where each entry is "number of bytes written/read since the last entry", with entries being emitted once per second. That's different from "KB/s", because it's not "average bandwidth since the test started". So it's fine to sum up the data points at each second (and if it weren't, I don't see why your summing up of all data points wouldn't be just as problematic).

Edit (we commented at the same time 😔):

Each line in the log contains value seen over the specified period of time (I assume this is an average over 1s because we specify to log every second). So each line contains KB/s (not just KB). We were summing these KB/s and were getting a sum of averages.

Well, to compute KB/s, you first need to count the number of kilobytes you observe during some duration, and then divide that number by the duration. In this case, we count the number of kilobytes observed during one second and then divide by 1. So "KB/s = KB", if you'll excuse the abuse of notation.
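The arithmetic in the comment above can be spelled out with hypothetical numbers: for a 1-second logging window, the per-window rate equals the per-window byte count, so summing the log entries gives total KB transferred, and dividing by the test duration recovers the mean bandwidth either way.

```python
# Hypothetical per-second log entries: KB observed in each 1 s window.
kb_per_window = [100, 120, 110, 130]
window_s = 1  # fio logs once per second

# Rate within each window: KB divided by 1 s equals the KB count itself.
rates_kbps = [kb / window_s for kb in kb_per_window]
assert rates_kbps == kb_per_window  # "KB/s = KB" for 1 s windows

# Summing the entries gives total KB transferred; dividing by the test
# duration yields the mean bandwidth over the whole run.
total_kb = sum(kb_per_window)
mean_bw = total_kb / (len(kb_per_window) * window_s)
```

So whether the entries are read as counts or as 1-second rates, the same division by total duration is what produces a meaningful average.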

@ShadowCurse ShadowCurse closed this Sep 3, 2024
@ShadowCurse ShadowCurse deleted the fio_json branch September 3, 2024 15:35