Print cost metrics as data size#11443
Print cost metrics as data size#11443kokosing wants to merge 1 commit intoprestodb:masterfrom kokosing:origin/master/025_cost_metrics
Conversation
|
More or less cost is the amount of data to be processed by cpu, sent through network, stored in memory. Displaying them in data size units makes it more readable. Especially when comparing big numbers. |
In case of CPU that would rather be something like number of cycles (e.g: artificial unit of measure representing how heavy computation is). That it is currently more or less amount of data processed is a limitation of the model. We should put different weights on different operations (e.g: hash computation, hash lookup, etc).. @rschlussel |
presto-main/src/main/java/com/facebook/presto/sql/planner/planPrinter/PlanPrinter.java
Outdated
Show resolved
Hide resolved
|
Output example: |
|
Currently (without using units) the above example looks like: Thanks to this PR the EXPLAIN output is much more readable. |
There was a problem hiding this comment.
redundant else (here & below)
There was a problem hiding this comment.
why uppercase?
FWIW, Double#toString outputs -Infinity / Infinity
There was a problem hiding this comment.
revert if condition -- NaN is yet another special case, like +inf, -inf, so layout the conditions for these cases similarily
There was a problem hiding this comment.
If you want to say "strip trailing be", formattedValue.replaceAll("b$", "") would be more direct way of saying that.
There was a problem hiding this comment.
all comments from formatDoubleAsCpuCost
findepi
left a comment
There was a problem hiding this comment.
- % prev comments
- squash commits
|
I don’t think using data size is correct for CPU cost. Billions should be “B” not “G”. See how we print row count in the CLI. |
|
Why not use a Duration for cpu cost. It’s a measurement of time. |
|
Estimated CPU cost is not time. Currently, it's (very roughly) the amount of data being processed. |
|
That’s very misleading, then. CPU cost should, ideally, be a an estimation of the amount of CPU (as measured by cpu timers) the query will use. If the current metric is an indication of something else, we should come up with a different name for it. |
|
@findepi, CPU time is no longer an estimate. With the changes @arhimondr and I made, we get an actual measurement per operator. So, I think time is the right measure. |
|
@dain i understood this as being about "CPU cost estimate" computed by CBO during planning rather than "CPU time measurement" as measured (or perviously approximated) during execution. |
|
@dain which changes you refer to? |
|
@sopel39 I think this is the core PR #11408, but there were a few more followup ones. @findepi I see, I thought we were talking about the actual measurements. What is the CBO CPU part actually estimating? Specifically, is it estimating the actual CPU time in the current cluster, or is it more of a estimate in a "model" cluster? |
|
@dain currently this is an "abstract CPU cost". Hence the units are not time/ticks. I am for the change @kokosing is proposing. It's trivial and should help reading EXPLAINS today. Can we merge this as is then? |
|
My comment about the prefix has not been addressed. It’s only a number, so we should use “B” for billion, not “G”. Metric prefixes only make sense for a unit of some type. |
|
@electrum I am going to address your comment, by fixing airlift/units#7 |
|
Related (dependency) PR: airlift/units#8 |
|
@kokosing Grzegorz, I can't look right now, because I'm over-booked. I'm on-call and I'm also finishing the FB-specific parts of the 0.216 release. I'll look into this next week. |
nezihyigitbasi
left a comment
There was a problem hiding this comment.
LGTM.
- nit: how about we update the commit title as
Update the format of costs in plan output to have unitsto be more specific about the change? - Is this blocked on airlift/units#8 as commented in trinodb/trino#68? Looks like it's not.
| return "?"; | ||
| } | ||
|
|
||
| return DataSize.succinctDataSize(value, BYTE).toString(); |
There was a problem hiding this comment.
- static import
succinctDataSize - you can also use
succinctBytes((long) value)
| output.append('\n'); | ||
| } | ||
|
|
||
| private static String formatDoubleAsCpuCost(double value) |
There was a problem hiding this comment.
I think we can simply rename these methods as formatCpuCost and formatDataSize, because the parameters are double so we don't need to repeat that we are formatting double values.
| if (!isFinite(value)) { | ||
| return Double.toString(value); | ||
| } | ||
| else if (isNaN(value)) { |
Print cost metrics as data size