Merge pull request #325 from harriscr/ch_wip_documentation

harriscr · web-flow · commit c33c8c8ca334 · 2025-02-06T09:33:34.000Z
Documentation for the post processing tools
diff --git a/post_processing/README.md b/post_processing/README.md
@@ -0,0 +1,39 @@
+# Post Processing of CBT results
+
+## Description
+A set of tools for post-processing the data from a CBT run. They produce a report in GitHub Markdown format, and
+optionally in PDF format, containing a set of hockey-stick curves generated from the CBT run.
+The tool set consists of three separate tools, each of which can be run stand-alone. The eventual aim is to integrate
+the post-processing into CBT once more benchmark types are supported.
+
+The post-processing consists of three components:
+
+* [formatter](formatter/README.md)
+* [plotter](plotter/README.md)
+* [reports](reports/README.md)
+
+
+## Supported benchmark tools
+This list will be extended as more benchmark tools are supported.
+* fio
+
+## Dependencies
+The post-processing tools have some new dependencies that must be installed for them to run correctly.
+
+### Python dependencies
+The following python modules are dependencies for this work:
+* matplotlib
+* mdutils
+
+Both have been added to the `requirements.txt` file in the CBT project.
+
+### Dependencies for PDF report generation
+Generating a report in PDF format has two additional requirements:
+
+A working TeX installation is required on the base operating system; it can be installed using the package manager.
+For Red Hat based OSes, run `yum install texlive`.
+
+[Pandoc](https://pandoc.org/) is also required; it can be installed on most Linux distributions using the included
+package manager. For Red Hat based OSes, run `yum install pandoc`.
+
+The minimum Pandoc version tested is `2.14.0.3`, which is available for RHEL 9.
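
The dependencies above can be checked before running the tools. The following is a minimal Python sketch, not part of CBT: the function name `missing_dependencies` is hypothetical, and it assumes pandoc uses its default PDF engine, `pdflatex`, which is expected to come from the TeX installation.

```python
import importlib.util
import shutil

# Python modules listed as dependencies in this README.
PYTHON_MODULES = ["matplotlib", "mdutils"]

# Executables needed only for PDF reports. "pdflatex" is an assumption:
# it is pandoc's default PDF engine; adjust if another engine is used.
PDF_EXECUTABLES = ["pandoc", "pdflatex"]


def missing_dependencies(create_pdf: bool = False) -> list[str]:
    """Return the names of dependencies that cannot be found."""
    missing = [m for m in PYTHON_MODULES if importlib.util.find_spec(m) is None]
    if create_pdf:
        # shutil.which returns None when an executable is not on PATH.
        missing += [exe for exe in PDF_EXECUTABLES if shutil.which(exe) is None]
    return missing


if __name__ == "__main__":
    problems = missing_dependencies(create_pdf=True)
    if problems:
        print("Missing dependencies: " + ", ".join(problems))
    else:
        print("All post-processing dependencies found")
```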
diff --git a/post_processing/formatter/README.md b/post_processing/formatter/README.md
@@ -0,0 +1,54 @@
+# Formatter
+
+The formatter converts the JSON files output by CBT into the format expected by the rest of the post-processing tools.
+Each output file is a JSON file with the following structure:
+
+```
+{
+    <queue_depth>: {
+                    bandwidth_bytes: <value>
+                    blocksize: <value>
+                    io_bytes: <value>
+                    iops: <value>
+                    latency: <value>
+                    number_of_jobs: <value>
+                    percentage_reads: <value>
+                    percentage_writes: <value>
+                    runtime_seconds: <value>
+                    std_deviation: <value>
+                    total_ios: <value>
+    }
+    ...
+    <queue_depth_n> {
+
+    }
+    maximum_bandwidth: <value>
+    latency_at_max_bandwidth: <value>
+    maximum_iops: <value>
+    latency_at_max_iops: <value>
+}
+```
+A single file will be produced per block size used for the benchmark run.
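
As a minimal sketch of how a formatter output file can be consumed, the following Python separates the per-queue-depth entries from the run-level summary values and returns the (IOPS, latency) points that make up a hockey-stick curve. It is not the plotter's implementation: the function names are hypothetical, and it assumes that queue-depth keys are integer strings and that values can be converted with `float()`.

```python
import json
from typing import Any

# Run-level summary keys listed in the structure above; every other
# top-level key is assumed to be a queue depth.
SUMMARY_KEYS = {
    "maximum_bandwidth",
    "latency_at_max_bandwidth",
    "maximum_iops",
    "latency_at_max_iops",
}


def split_formatter_output(data: dict[str, Any]) -> tuple[dict[str, dict], dict[str, Any]]:
    """Separate per-queue-depth results from the run-level summary values."""
    per_queue_depth = {k: v for k, v in data.items() if k not in SUMMARY_KEYS}
    summary = {k: v for k, v in data.items() if k in SUMMARY_KEYS}
    return per_queue_depth, summary


def hockey_stick_points(data: dict[str, Any]) -> list[tuple[float, float]]:
    """Return (iops, latency) pairs ordered by increasing queue depth."""
    per_queue_depth, _ = split_formatter_output(data)
    points = []
    # Sort numerically so that "16" comes after "4".
    for queue_depth in sorted(per_queue_depth, key=int):
        entry = per_queue_depth[queue_depth]
        points.append((float(entry["iops"]), float(entry["latency"])))
    return points


if __name__ == "__main__":
    # Hypothetical sample data; real files are produced by the formatter.
    sample = json.loads("""
    {
        "1": {"iops": "1000", "latency": "0.5"},
        "16": {"iops": "9000", "latency": "1.8"},
        "4": {"iops": "3800", "latency": "1.0"},
        "maximum_iops": "9000",
        "latency_at_max_iops": "1.8"
    }
    """)
    print(hockey_stick_points(sample))
    # → [(1000.0, 0.5), (3800.0, 1.0), (9000.0, 1.8)]
```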
+
+## Standalone script
+A wrapper script is provided for the formatter:
+```
+fio_common_output_wrapper.py --archive=<archive_directory>
+                             --results_file_root=<file_root>
+```
+where
+- `--archive` Required. The archive directory given to CBT for the benchmark run.
+- `--results_file_root` Optional. The name of the results file to process, without the extension. If not specified,
+this defaults to `json_output`, which is the default for CBT runs.
+
+Full help text is provided by running the script with `--help`.
+
+## Output
+A directory called `visualisation` is created in the directory specified by `--archive`. It contains all the processed
+files: one file for each block size used in the benchmark run.
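
A minimal Python sketch for loading every processed file from the `visualisation` directory is shown below. The function name is hypothetical, and it assumes the processed files use a `.json` extension.

```python
import json
from pathlib import Path
from typing import Any


def load_formatted_results(archive: str) -> dict[str, dict[str, Any]]:
    """Load every formatted JSON file from <archive>/visualisation.

    Returns a mapping of file name (without extension) to parsed content.
    Assumes the formatter output files use a ".json" extension.
    """
    visualisation_dir = Path(archive) / "visualisation"
    if not visualisation_dir.is_dir():
        raise FileNotFoundError(
            f"{visualisation_dir} not found; has the formatter been run on {archive}?"
        )
    results = {}
    # Sorted so the order is stable between runs.
    for path in sorted(visualisation_dir.glob("*.json")):
        with path.open() as handle:
            results[path.stem] = json.load(handle)
    return results
```

For example, `load_formatted_results("/tmp/ch_cbt_run")` would return one entry per block size for that archive.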
+
+## Example
+
+```bash
+PYTHONPATH=/cbt /cbt/tools/fio_common_output_wrapper.py --archive="/tmp/ch_cbt_run" --results_file_root="ch_json_result"
+```
diff --git a/post_processing/plotter/README.md b/post_processing/plotter/README.md
@@ -0,0 +1,33 @@
+# Plotter
+Draws the hockey-stick plots for a benchmark run from the data produced by the formatter. The plots are PNG files, with
+one plot produced for each block size used.
+
+There is also a Python class that produces comparison plots of two or more CBT runs for one or more block sizes.
+Because of the tools used, only six unique colours are available for the plot lines, so it is recommended to limit a
+comparison to six or fewer files or directories.
+
+## Standalone script
+A wrapper script is provided only for producing comparison plots:
+```
+plot_comparison.py  --files=<comma_separated_list_of_files_to_compare>
+                    --directories=<comma_separated_list_of_directories_to_compare>
+                    --output_directory=<full_path_to_directory_to_store_plot>
+                    --labels="<comma_separated_list_of_labels>"
+```
+where
+- `--output_directory` Required. The full path to the directory in which to store the plots. It is created if it
+doesn't exist.
+- `--files` Optional. A comma-separated list of files to plot on a single set of axes.
+- `--directories` Optional. A comma-separated list of directories to plot. One plot is produced for each block size.
+- `--labels` Optional. A comma-separated list of labels to use for the lines on the comparison plot, in the same order
+as `--files` or `--directories`.
+
+One of `--files` or `--directories` must be provided.
+
+Full help text is provided by using `--help` with the script
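
The argument rules above can be summarised in a short `argparse` sketch. This is a hypothetical illustration, not the code in `plot_comparison.py`: it assumes that `--labels` must have one entry per compared file or directory, and it only warns when more than six entries are given.

```python
import argparse
import sys

# Colour limit quoted in the plotter README.
MAX_UNIQUE_COLOURS = 6


def comma_list(value: str) -> list[str]:
    """Split a comma-separated argument, dropping empty entries."""
    return [item.strip() for item in value.split(",") if item.strip()]


def parse_comparison_args(argv: list[str]) -> argparse.Namespace:
    """Parse and validate comparison-plot arguments (hypothetical sketch)."""
    parser = argparse.ArgumentParser(description="Sketch of the comparison plot arguments")
    parser.add_argument("--output_directory", required=True)
    parser.add_argument("--files", type=comma_list)
    parser.add_argument("--directories", type=comma_list)
    parser.add_argument("--labels", type=comma_list)
    args = parser.parse_args(argv)

    entries = args.files or args.directories
    if not entries:
        # parser.error prints usage and exits with status 2.
        parser.error("one of --files or --directories must be provided")
    # Assumption: one label per compared file or directory.
    if args.labels and len(args.labels) != len(entries):
        parser.error("--labels must have one entry per file or directory")
    if len(entries) > MAX_UNIQUE_COLOURS:
        print(f"warning: {len(entries)} entries but only {MAX_UNIQUE_COLOURS} unique colours",
              file=sys.stderr)
    return args
```

For example, `parse_comparison_args(["--directories=/tmp/a,/tmp/b", "--output_directory=/tmp/out"])` returns a namespace whose `directories` attribute is `["/tmp/a", "/tmp/b"]`.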
+
+## Example
+
+```bash
+PYTHONPATH=/cbt /cbt/tools/plot_comparison.py --directories="/tmp/ch_cbt_main_run,/tmp/ch_cbt_sandbox_run" --output_directory="/tmp/main_sb_comparisons"
+```
diff --git a/post_processing/reports/README.md b/post_processing/reports/README.md
@@ -0,0 +1,62 @@
+# Reports
+
+Produces a report in GitHub Markdown format, and optionally in PDF format, that includes a summary table and the
+relevant plots from the CBT run.
+
+## Output
+A report in GitHub Markdown format, together with a plots directory containing the required plots. The report and plots
+directory can be uploaded to GitHub as-is, and the links between them will still work.
+
+Optionally, a report in PDF format can also be created.
+
+Because of the tools used, only six unique colours are available for the plot lines, so it is recommended to limit a
+comparison to six or fewer files or directories. During testing we found that comparing more than four directories can
+make the PDF report unreadable, so creating a PDF report that compares more than four benchmark runs is not
+recommended.
+
+## Standalone scripts
+Two scripts are provided as wrappers for report generation:
+* `generate_performance_report.py`
+* `generate_comparison_performance_report.py`
+
+### generate_performance_report.py
+Creates a performance report for a single benchmark run. The formatter must already have been run on the results.
+
+```
+generate_performance_report.py  --archive=<full_path_to_results_directory>
+                                --output_directory=<full_path_to_directory_to_store_report>
+                                --create_pdf
+```
+
+where:
+- `--archive` Required. The archive directory containing the files produced by the formatter.
+- `--output_directory` Required. The directory in which to store the markdown report file and relevant plots.
+- `--create_pdf` Optional. Create a PDF report as well as the markdown report.
+
+Full help text is provided by running either script with `--help`.
+
+#### Example
+```bash
+PYTHONPATH=/cbt /cbt/tools/generate_performance_report.py --archive="/tmp/ch_cbt_main_run" --output_directory="/tmp/reports/main" --create_pdf
+```
+
+### generate_comparison_performance_report.py
+Creates a report comparing two or more benchmark runs. The report includes plots and results only for formatted files
+that are common to all the directories.
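
The "common to all the directories" rule amounts to a set intersection of the formatted file names. A minimal sketch, not the report generator's code: the function name is hypothetical, and it assumes the formatted files are `.json` files in each archive's `visualisation` directory.

```python
from pathlib import Path


def common_formatted_files(archives: list[str]) -> list[str]:
    """Return formatted file names present in every archive's visualisation directory."""
    common = None
    for archive in archives:
        # A missing visualisation directory simply yields no files.
        names = {p.name for p in (Path(archive) / "visualisation").glob("*.json")}
        common = names if common is None else common & names
    return sorted(common or set())
```

A block size that was run in the baseline but not in one of the compared archives is therefore left out of the report.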
+
+```
+generate_comparison_performance_report.py --baseline=<full_path_to_archive_directory_to_use_as_baseline>
+                                          --archives=<full_path_to_results_directories_to_compare>
+                                          --output_directory=<full_path_to_directory_to_store_report>
+                                          --create_pdf
+```
+where:
+- `--baseline` Required. The full path to the baseline results for the comparison.
+- `--archives` Required. A comma-separated list of directories containing the results to compare with the baseline.
+- `--output_directory` Required. The directory in which to store the markdown report file and relevant plots.
+- `--create_pdf` Optional. Create a PDF report as well as the markdown report.
+
+#### Example
+```bash
+PYTHONPATH=/cbt /cbt/tools/generate_comparison_performance_report.py --baseline="/tmp/ch_cbt_main_run" --archives="/tmp/ch_sandbox/" --output_directory="/tmp/reports/main" --create_pdf
+```
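
The formatter and comparison report steps can be chained from Python. The sketch below builds the same command lines as the examples in these READMEs, using the default `--results_file_root`, and runs them with `PYTHONPATH` set. The function names are hypothetical, and `/cbt` is assumed to be the CBT checkout, as in the examples.

```python
import os
import subprocess

CBT_ROOT = "/cbt"  # Assumed CBT checkout location, as in the README examples.


def pipeline_commands(baseline: str, archives: list[str], output_directory: str,
                      create_pdf: bool = False) -> list[list[str]]:
    """Build the formatter and comparison-report command lines, in run order."""
    tools = os.path.join(CBT_ROOT, "tools")
    # Format the baseline and every compared archive first.
    commands = [
        [os.path.join(tools, "fio_common_output_wrapper.py"), f"--archive={archive}"]
        for archive in [baseline, *archives]
    ]
    report = [
        os.path.join(tools, "generate_comparison_performance_report.py"),
        f"--baseline={baseline}",
        f"--archives={','.join(archives)}",
        f"--output_directory={output_directory}",
    ]
    if create_pdf:
        report.append("--create_pdf")
    commands.append(report)
    return commands


def run_pipeline(commands: list[list[str]]) -> None:
    """Run each command with PYTHONPATH pointing at the CBT checkout."""
    env = dict(os.environ, PYTHONPATH=CBT_ROOT)
    for command in commands:
        # check=True stops the pipeline at the first failing step.
        subprocess.run(command, env=env, check=True)
```

For example, `run_pipeline(pipeline_commands("/tmp/ch_cbt_main_run", ["/tmp/ch_sandbox"], "/tmp/reports/main", create_pdf=True))` formats both archives before generating the comparison report.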