Conversation
elshize
left a comment
This is not entirely what I had in mind. I think the aggregate function should either be applied to everything or we should just not aggregate at all. Otherwise, it's just too complex to keep track of what is what.
I was also thinking we either summarize or extract. But I think it's fine to just use different streams, only then I would remove the option to summarize only, and just always summarize.
Let's discuss this a little more.
@JMMackenzie do you think it makes sense to extract results after aggregation? Say, return min for each query instead of R results, where R is the number of runs?
If not, then maybe it's best to just always extract everything and always print out a summary to stderr, and maybe that summary will always be (a) no aggregate, (b) min aggregate, and (c) mean aggregate? I frankly see no use for max or median. What are your thoughts?
If it makes sense to aggregate the actual output data, then I think there should always be 1 aggregate applied to both data and summary, or no aggregate at all.
I think this could be reasonable, and I do think this was what I had initially envisaged. But I can see the benefit of extracting everything to stderr, and then also allowing a separate "aggregated" stream to either a file or to stdout? I agree that if there is an aggregated stream being output, then a summary should also use that same aggregation. Maybe we can make a new "results" format where we dump both.
I would actually try to avoid introducing new formats, I'd like to keep it as simple as possible, and as unsurprising as possible as well. I think the most important is to give the user the raw data, which they can process however they want. I think we all agree on this part. Then, I would lean towards simplicity:
We will not support all types of outputs from this tool, but that's OK; this is why we give the user the raw output. You have a few choices for implementing this. One is having results as "rows", i.e., a vector of structs describing everything you need, including the query ID. The other would be to have results as nested vectors; then, after aggregation, you still get nested vectors, only each inner vector has one element.
I think that, in this case, the median could provide more robustness than the mean; for example, by suppressing atypical cases or noise (mostly related to the maximum values across the runs). As for the max aggregation, I don't know if it is really useful (maybe to capture the worst cases?), but it may ultimately not be representative; I just included it because its implementation required no additional effort. If it has no real usefulness, I think it should be removed. Regarding the methodology for printing data or a summary, I think it is useful to show the summary when extracting. In this case, if some of the metrics satisfy the user's needs, there is no need to run an external script (for that reason, I think it is useful to print all defined metrics when no aggregation/transformation is specified). Also, given that the query times are printed to the output, they can simply be exported using redirection.
That's fair. My main concern is making it too complex. How about we always extract all queries (the user can process that data themselves) and always print all summaries? I really don't want to go the route of defining an aggregate function that will only apply to one or the other. Just note that if you use stderr for summaries, we can't pipe it to another tool for transformation, because the redirect will capture the rest of the logs, so it will be purely informative.
The behavior in which all runs (together with all summaries) are extracted occurs when aggregation is set to `none`. However, another experiment could be: "I just want to know what happens when all values are the minimum". Although I can obtain this value from the summary output, if I specify aggregation by min, it makes sense that the summary and the output adapt to that scenario, so I don't need to implement a specific script to reprocess the output data (even though I understand such a script would be simple). This can be useful for quick experiments: if I want to understand the causes behind this value, I can quickly analyze the file that already contains all the minimum values. In any case, I understand that this may introduce unnecessary complexity from an SRP perspective, and an intermediate option would be to remove that option. What do you think, @elshize, @JMMackenzie?
I personally think it's unnecessary, but if you really want to only aggregate the summary, this needs to be explicitly named in a way that leaves no doubt as to what it does.
Maybe we should actually run a bunch of experiments and see if it actually matters? 😁 By the way, if we want the summary to aggregate, why not show them all? I guess the output would be verbose... I am usually happy to report something sensible as the default though, so we could have a median or mean summary by default, and then an option to select the others.
My understanding is that this is what @gustingonzalez suggests to do -- by default. I would say that if this is the default, then I see very little value in an additional filter to only show one. I think we can probably come up with a succinct way of printing them out; per aggregate function, we only need the mean and quantiles. Though this is compounded by the fact that we can define multiple algorithms. I believe this was the reason initially to print JSON lines for the statistics, so that you can capture them and parse them later. I believe that anything going to stderr shouldn't really be for parsing. This is because we also print other unstructured logs, such as "warming up" or "performing queries", etc. These should be logs that tell the user what is happening, and the data should go to stdout. If we want to print summaries to stderr, I think we should either always print them, or have two options: (1) print them, (2) don't print them (with one of them being the default, and the other enabled with a flag). Having an additional filter for median, mean, minimum, etc., is just distracting. All it does is save a few lines of logs, but it adds significantly more complexity.
Hi guys, sorry, I've been a bit absent. I agree with the idea that the filters are unnecessary. We can simply output everything together and let users filter using an external tool. This keeps the script simple, as @elshize mentioned.
@gustingonzalez What do you suggest being extracted with `--extract`?
@elshize, the idea of the `--extract` flag is to write out the raw data. Take into account that this applies regardless of whether aggregation is specified.
Ideally, I would prefer to have the data go to stdout, but I'd be fine with that, especially because this is how it works now. I would rename it to `--output`, though. Then, we print the summaries (in JSON) to stdout, and logging to stderr. The full measurements use no aggregations: just full results, multiple lines per query. The JSON summaries are printed for all supported aggregates (including none?), and can easily be extracted. I think this would keep it reasonably simple and flexible at the same time. Does the above sound good?
I agree with the idea; I'll work on the changes. Just one additional comment about the option naming.
I personally think using `--output` works well here.
@elshize, got it!
Because we want to print multiple summaries (for different agg functions), we'll need to name it somehow to print in JSON:

{"agg_per_query": "none", ...}
{"agg_per_query": "min", ...}

Not sure if there's maybe a better name for that; I'm certainly open to suggestions. Summary is a different thing for me: the printed statistics is the summary, and if we always print it, then we don't need to label it, but we may need to use that term in code, docs, or CLI help.
Force-pushed from 592a782 to 3052d8f.
Hi guys, ready with the changes. One thing that hadn't been taken into account is that more than one algorithm (query type) can be specified. Therefore, the changes now support specifying more than one output file (one for each query type specified). Let me know if this is OK or if any changes are needed.
Why not just have an "algorithm" column in the output file? I would rather avoid multiple output files. First, I would say it's no more convenient, if not less convenient, than having one; it's so easy to filter with your dataframe framework of choice, or whatever one uses for crunching data. Furthermore, now you have to worry about ensuring that the number of algorithms is the same as the number of output files, which is just a headache. I would simply print a column header (are we printing it now or not?) and then values. We can keep it TSV. Regarding summaries, I think it's better to have something like this:

"times": [
{"query_aggregation": "none", "mean": 6499.02, "q50": 1630, "q90": 20539, "q95": 31282, "q99": 46491},
{"query_aggregation": "min", "mean": 4257.8, "q50": 1111, "q90": 14786, "q95": 18986, "q99": 28139},
{"query_aggregation": "mean", "mean": 6498.68, "q50": 1703, "q90": 21923, "q95": 30309, "q99": 40328},
{"query_aggregation": "median", "mean": 6898.12, "q50": 1768, "q90": 22582, "q95": 33420, "q99": 46509},
{"query_aggregation": "max", "mean": 8341.13, "q50": 2181, "q90": 27297, "q95": 39730, "q99": 51367}
]

I think the mapping works well this way.
tools/queries.cpp (outdated)

} else {
    std::sort(query_times.begin(), query_times.end());
    double avg =
    // Print JSON summary
We should avoid formatting JSON by hand; we already have a library for that in our deps (#include <nlohmann/json.hpp>), which allows you to define JSON similarly to a map, and then print it.
I left another comment on the code, but I'll need to come back to this later; just letting you know I have not gone through all of the code yet.
@elshize, ready with the changes. One observation is that the JSON is now printed in an unordered way. Although newer versions of nlohmann/json include an `ordered_json` type that preserves insertion order, the version in our deps does not.
It's a little unfortunate that we can't control the order, but on the other hand, I don't think it's that crucial, especially with pretty-printing. Also, not sure if the API guarantees it, but it looks like it's not so much unordered as lexicographically ordered. We might want to work on upgrading the dependency anyway, but I don't think it's necessary as part of this work. I think the JSON output you provided above is fine. One other thing we have to consider is that now we are printing potentially multiple JSON objects, each spanning multiple lines, so it's no longer in JSONL format. This may limit what out-of-the-box tools one can use to parse the output. Ultimately, I don't think this is a big issue. That said, we could address this as well. I see two options off the top of my head. One is to have a flag that switches to compact, one-object-per-line output. The other approach would be to print a single JSON object and put all summaries in an array:

{
"summaries": [
{
"algorithm": "or",
...
},
{
"algorithm": "and",
...
}
]
}

But to be clear, I'm OK with leaving the output as is now.
elshize
left a comment
Leaving some more comments, but I still haven't gone through the entire PR.
elshize
left a comment
Ok, finished going through it, left some comments.
A general note:
I would discourage the nesting doll style where you keep passing slightly modified parameters down and the next function slightly moves forward the entire logic. It's usually much clearer if we break down our programs into sub-programs and stick to separation of concerns.
For example, extracting times should typically have nothing to do with printing or summarizing them, so the extracting function should not get the output stream at all.
If we break things down, they are typically easier to reason about. There are many reasons for that, including: the functions take fewer parameters, we are forced to return some meaningful types, understanding a function in isolation is much easier than understanding it globally, etc. One particular thing we should strive for is to keep mutable state contained, and, as much as we can, try to have pure functions doing the complex logic, so we can deterministically predict what happens based on the parameters. Of course, benchmarks are not deterministic in what values they produce, but I'm talking about all the rest.
Note that this nested style of function is quite common in this code base, especially in legacy code, but we should fight it and break away from it as much as possible.
Thank you for your review @elshize, I'll work on that.
Force-pushed from f15e3ab to ab79582.
@elshize, done! However, Codacy is now reporting some issues regarding the autogenerated documentation.
Force-pushed from 7488b35 to 8f0dff6.
Thanks, we can ignore those. I'll merge it.
Key changes in this pull request:
- Replaced the `--extract` option with `--output`, which now requires an explicit output file. Since more than one algorithm (query type) can be specified, the algorithm is now also printed in the TSV.
- The original `op_perftest()` behavior remains available when `--output` is not specified.
- Added a `--runs` option to specify the number of runs used to measure the query set (by default: 3). Note that this parameter excludes warmup.
- Changed the `--algorithm` parameter to accept multiple `-a`/`--algorithm` flags instead of colon-separated algorithms.
- Added `none`, `min`, `mean`, `median` and `max` as aggregation types.
- Updated `op_perftest()`, so the set of queries is evaluated independently in each run.