Skip to content

Conversation

@juliangaal
Copy link

@juliangaal juliangaal commented Aug 8, 2024

Hi there,

after #2 I took the liberty to start working on csv export support

Current changes:

  • csv(stream) that matches print(stream)
  • get_summary (formerly get_summary_table) extended to support ResultFormat: TABLE, CSV, DB, JSON
  • result_csv implemented to print summary
    • file system operations later
    • exporting the "detailed" overview requires further discussion
    • currently includes info and summary table in the same csv string (default for printing currently), see comments below

Issues and Discussion

  • Please provide clang-format file. Formatting is tab-based and hard to adhere to, I did my best.
  • IMO placing units in the header makes exporting to different formats (CSV, JSON, DB) easier down the line. This way there is no need to write a function that does, e.g. table_percentage yet doesn't append "%". It also makes result file parsing much simpler.
    filename,function,line,calls,ae[1-99]%,ae[0-100]%,time ae[0-100],time a[0-100]
    ctrack.cpp,expensiveOperation,6,10,99.95%,99.95%,576.21 ms,576.21 ms
    
    would become
    filename,function,line,calls,ae[1-99]%,ae[0-100]%,time ae[0-100]ms,time a[0-100]ms
    ctrack.cpp,expensiveOperation,6,10,99.95,99.95,576.21,576.21
    
    Now, when parsing, there is no need to handle units explicitely
  • I think splitting into an 'info.csv' and 'results.csv' is necessary, because each table/csv has its own header.
  • How do you want to export the "details" table in the future? That part I haven't touched at all yet. By extending the headers to, e.g. min (fastest [0-1]%), min (center[1-99]%)
    +---------+---------+----------+----------+----------+-----------+-----------+----------+---------+---------+
    |   fastest[0-1]%   |                           center[1-99]%                           | slowest[99-100]%  |
    +---------+---------+----------+----------+----------+-----------+-----------+----------+---------+---------+
    |   min   |  mean   |   min    |   mean   |   med    |  time a   |  time ae  |   max    |  mean   |   max   |
    +---------+---------+----------+----------+----------+-----------+-----------+----------+---------+---------+
    | 0.00 ns | 0.00 ns | 51.12 ms | 57.14 ms | 58.00 ms | 571.36 ms | 571.36 ms | 59.01 ms | 0.00 ns | 0.00 ns |
    +---------+---------+----------+----------+----------+-----------+-----------+----------+---------+---------+
    

@juliangaal juliangaal mentioned this pull request Aug 8, 2024
@Compaile
Copy link
Owner

Compaile commented Aug 8, 2024

thanks, for the pr we will review it shortly.

one quick thing to add alredy.
Leaving the units only in the header wont be great for terminal ussage.

times can be ns can be us can be ms or seconds.
it will be hard to read if people have something like 10000000 ms in a slow code path.

but i aggree for csv it would be better so i'll think about it

@Compaile
Copy link
Owner

Hi @juliangaal ,
I like the general structure of the pr!. Here are some topics which you mentioned and my suggestions:
a) For the file-based export, the detailed overview can just be extra columns. At the end of the day, the summary overview is just a short version of the details table merged. So I would just add the extra columns for the missing entries (I can also take care of that if you want).
b) Regarding the units, I see your point. I still want to have the units in string (logfile) or std::cout export. So we could add that as a parameter overload. What unit should we use as default if we leave them out? ms? µs? Or is this a setting?
c) Can you explain why splitting into info and result.csv would be necessary?
d) I agree we should provide a clang-format file. Will add that to the todo.

@juliangaal
Copy link
Author

juliangaal commented Aug 13, 2024

a) sure, I would appreciate that
b) My latest commit implemented just that, for summary and detail. Regarding "leaving out the units", I would only leave them out of the data, but place the unit in the header? That way the user doesn't have to chose, and future self will not have to answer the question which unit was used during generation...
c) I guess this is partly personal preference, but I believe it makes parsing easier. Consider this (current csv output of get_summary):

Start,End,time total,time ctracked,time ctracked %  <-- info header
2024-08-13 11:53:16,2024-08-13 11:53:17,565.93,565.58,99.94  <-- info
filename,function,line,calls,ae[1-99]%,ae[0-100]%,time ae[0-100],time a[0-100] <-/ summary header
ctrack.cpp,expensiveOperation,6,10,99.94,99.94,565.58,565.58 <-- summary

Processing this with e.g. pandas means discarding the first two rows just to access the other (for benchmarking more useful) data. We could of course write the data "next to" each other in the csv file

@Compaile
Copy link
Owner

Here's the text with spelling corrections:
Perfect.
a) I will add it to your PR soon.
b)
Yes, we can put them in the header when we don't want them inside the rows. However, when you have a big project and tracked many functions, it will be hard to set a good unit value in the header.
As an example, let's say we have functions which need 50 seconds and some which need 100 μs.
If we choose μs as header, the 50 seconds entry would show 50000000000, which is hard to read. In CSV, I totally agree it makes sense. What default unit should we use for CSV (since it needs to be constant) for all entries? Milliseconds?
c) I prefer to just add some more rows and write it next to the other data. That's more common than having 2 CSV files.
So I would just add the info entries (Start, End, time total, time tracked, time tracked %) to the start or end of the CSV lines. Could you take care of this?

Then i will add a) and merge. Thanks for the nice idea and implementation

@juliangaal
Copy link
Author

regarding b) couldn't we just provide an option where the dev can pass, e.g. std::chrono::milliseconds?

In general, I am just a little confused about your goals with appending rows to the right of the csv file. E.g.

Start,End,time total,time ctracked,time ctracked %  <-- info header
2024-08-13 11:53:16,2024-08-13 11:53:17,565.93,565.58,99.94  <-- info
filename,function,line,calls,ae[1-99]%,ae[0-100]%,time ae[0-100],time a[0-100] <-/ summary header
ctrack.cpp,expensiveOperation,6,10,99.94,99.94,565.58,565.58 <-- summary

would have to be changed to

Start,End,time total,time ctracked,time ctracked, filename,function,line,calls,ae[1-99]%,ae[0-100]%,time ae[0-100],time a[0-100] 
2024-08-13 11:53:16,2024-08-13 11:53:17,565.93,565.58,99.94,,,,,,,,
,,,,,,,,ctrack.cpp,expensiveOperation,6,10,99.94,99.94,565.58,565.58 
,,,,,,,,ctrack.cpp,expensiveOperation2,6,10,99.94,99.94,565.58,565.58 
....

is this the format you are intending to provide?

@Compaile
Copy link
Owner

Sorry for the incredible long wait. We now allocated more dedicated ressources to ctrack. If you dont mind i would finish up this pr and implement the last missing things?

@Asperamanca
Copy link

Please consider an option to export without decimal separator. Different locales use different separators. Exporting numbers in such a unit that whole numbers are sufficient makes the output much easier to import. My guess is that CSV is targeted more for import into tools for further processing; therefore readability is not the major concern (if you want that, you have the nice ASCII tables).
If you insist on the decimal separator, please make it configurable.

@Compaile
Copy link
Owner

good point.
We will adjust that and finish the csv and json export very soon. Would you agree that ";" is the best default separator ? (+ make it configurable)

@Asperamanca
Copy link

The ";" sounds like a reasonable default. Cursory research on decimal and thousands separators (please don't use those on a CSV export, either!) reveals that commonly used characters are ,.'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants