add SolverBenchmark tutorial

tmigot · abelsiqueira · commit 74d813f9b91c · 2023-02-17T19:05:35.000+01:00
diff --git a/tutorials/introduction-to-solverbenchmark/Project.toml b/tutorials/introduction-to-solverbenchmark/Project.toml
@@ -0,0 +1,14 @@
+[deps]
+DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
+Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
+Printf = "de0858da-6303-5e67-8744-51eddeeeb8d7"
+PyPlot = "d330b81b-6aea-500a-939a-2ce795aea3ee"
+Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
+SolverBenchmark = "581a75fa-a23a-52d0-a590-d6201de2218a"
+
+[compat]
+
+DataFrames = "1.3.4"
+Plots = "1.31.7"
+PyPlot = "2.10.0"
+SolverBenchmark = "0.5.3"
diff --git a/tutorials/introduction-to-solverbenchmark/index.jmd b/tutorials/introduction-to-solverbenchmark/index.jmd
@@ -0,0 +1,262 @@
+---
+title: "SolverBenchmark.jl tutorial"
+tags: ["solver", "benchmark", "profile", "latex"]
+author: "Abel S. Siqueira and Dominique Orban"
+---
+
+In this tutorial we illustrate the main uses of `SolverBenchmark`.
+
+First, let's create fake data. It is imperative that the data for each solver be stored
+in a `DataFrame`, and the collection of results for the different solvers must be stored in
+a dictionary mapping a `Symbol` (the solver name) to a `DataFrame`.
+
+In our examples we'll use the following data.
+
+```julia
+using DataFrames, Printf, Random
+
+Random.seed!(0)
+
+n = 10
+solver_names = [:alpha, :beta, :gamma]  # avoid shadowing Base.names
+stats = Dict(name => DataFrame(:id => 1:n,
+         :name => [@sprintf("prob%03d", i) for i = 1:n],
+         :status => map(x -> x < 0.75 ? :first_order : :failure, rand(n)),
+         :f => randn(n),
+         :t => 1e-3 .+ rand(n) * 1000,
+         :iter => rand(10:10:100, n),
+         :irrelevant => randn(n)) for name in solver_names)
+```
+
+The data consists of a (fake) run of three solvers `alpha`, `beta` and `gamma`.
+Each solver has a column `id`, which is necessary for joining the solvers' results (problem
+names can be repeated), and columns `name`, `status`, `f` (objective value), `t` (time) and
+`iter` (iterations) corresponding to problem results. There is also a column `irrelevant`
+with extra information that will not be used to produce our benchmarks.
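+
+Because joining relies on the `id` column, it can be worth checking that every solver's
+`DataFrame` has the expected columns and lists the same problems in the same order before
+producing tables or profiles. The helper `check_stats` below is hypothetical (it is not part
+of SolverBenchmark); it is a minimal sketch that uses only DataFrames and Base:
+
+```julia
+using DataFrames
+
+# Hypothetical helper, not part of SolverBenchmark: checks that every solver's
+# DataFrame has :id plus the required columns, and the same ids in the same order.
+function check_stats(stats::Dict{Symbol, DataFrame}, required::Vector{Symbol})
+  ref_ids = nothing
+  for (solver, df) in stats
+    absent = setdiff([:id; required], propertynames(df))
+    isempty(absent) || error("solver $solver lacks columns $absent")
+    if ref_ids === nothing
+      ref_ids = df.id              # first solver seen gives the reference ids
+    elseif df.id != ref_ids
+      error("solver $solver has different problem ids")
+    end
+  end
+  return true
+end
+
+demo = Dict(:a => DataFrame(id = 1:3, name = ["p1", "p2", "p3"]),
+            :b => DataFrame(id = 1:3, name = ["p1", "p2", "p3"]))
+check_stats(demo, [:name])  # true
+```
+
+Applied to the tutorial data, `check_stats(stats, [:name, :status, :f, :t, :iter])` should return `true`.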
+
+Here are the statistics of solver `alpha`:
+
+```julia
+stats[:alpha]
+```
+
+## Tables
+
+The first thing we may want to do is produce a table for each solver. Since each solver's
+results are already stored in a DataFrame, other packages offer a few options, and the
+DataFrame can also simply be printed.
+Our concern here is two-fold: producing publication-ready LaTeX tables, and web-ready
+markdown tables.
+
+The simplest use is `pretty_stats(io, dataframe)`.
+By default, `io` is `stdout`:
+
+```julia
+using SolverBenchmark
+
+pretty_stats(stats[:alpha])
+```
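+
+Since `pretty_stats` writes to any `IO`, the table can also be captured as a `String`,
+for instance to embed it in a report. A minimal sketch using Base's `sprint`, which calls
+the given function with an in-memory `IO` and returns what was written; the small
+`DataFrame` here is made up for illustration:
+
+```julia
+using DataFrames, SolverBenchmark
+
+demo = DataFrame(id = 1:2, name = ["prob001", "prob002"], f = [1.5, -2.25])
+# capture the table in a String instead of printing it to stdout
+table_str = sprint(io -> pretty_stats(io, demo))
+print(table_str)
+```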
+
+Printing in LaTeX format is achieved with `pretty_latex_stats`:
+
+```julia
+pretty_latex_stats(stats[:alpha])
+```
+
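+Alternatively, you can print to a file. Here the table is wrapped in a minimal `standalone`
+LaTeX document so that it can be compiled on its own: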
+
+```julia
+open("alpha.tex", "w") do io
+  println(io, "\\documentclass[varwidth=20cm,crop=true]{standalone}")
+  println(io, "\\usepackage{longtable}[=v4.13]")
+  println(io, "\\begin{document}")
+  pretty_latex_stats(io, stats[:alpha])
+  println(io, "\\end{document}")
+end
+```
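+
+The file can then be compiled with `latexmk` and converted to SVG with `pdf2svg`; both tools
+must be installed on the system.
+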
+```julia
+run(`latexmk -quiet -pdf alpha.tex`)
+run(`pdf2svg alpha.pdf alpha.svg`)
+```
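+
+In a paper, the table is usually included in an existing document rather than compiled on
+its own. One option, sketched below, is to write only the output of `pretty_latex_stats` to
+its own file and include it from the main document with `\input{demo_table}`; the file name
+and data are placeholders:
+
+```julia
+using DataFrames, SolverBenchmark
+
+demo = DataFrame(id = 1:2, name = ["prob001", "prob002"], f = [1.5, -2.25])
+# write only the table, without \documentclass or \begin{document}
+open("demo_table.tex", "w") do io
+  pretty_latex_stats(io, demo)
+end
+tex = read("demo_table.tex", String)
+```
+
+The main document must then load the packages the table relies on, such as `longtable` as in
+the standalone example above.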
+
+If only a subset of columns should be printed, the DataFrame should be indexed accordingly:
+
+```julia
+df = stats[:alpha]
+pretty_stats(df[!, [:name, :f, :t]])
+```
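+
+Another way to do this with DataFrames is to drop the unwanted columns with `select` and
+`Not`, which keeps all the other columns in their original order. A minimal sketch on a
+made-up `DataFrame` with an `irrelevant` column like the one above:
+
+```julia
+using DataFrames, SolverBenchmark
+
+demo = DataFrame(id = 1:2, name = ["prob001", "prob002"], f = [1.5, -2.25],
+                 irrelevant = [0.1, 0.2])
+# keep every column except :irrelevant
+shown = select(demo, Not(:irrelevant))
+pretty_stats(shown)
+```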
+
+Markdown tables may be generated by supplying the PrettyTables `tf` keyword argument to specify the table format:
+
+```julia
+pretty_stats(df[!, [:name, :f, :t]], tf=tf_markdown)
+```
+
+All values of `tf` accepted by PrettyTables may be used in SolverBenchmark.
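+
+Because the Markdown output is plain text, it can be written directly to a file, for example
+to include results in a README. A minimal sketch; the file name and data are placeholders:
+
+```julia
+using DataFrames, SolverBenchmark
+
+demo = DataFrame(name = ["prob001", "prob002"], f = [1.5, -2.25])
+open("results.md", "w") do io
+  println(io, "## Results")
+  println(io)
+  pretty_stats(io, demo, tf = tf_markdown)  # Markdown table format from PrettyTables
+end
+md = read("results.md", String)
+```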
+
+The `col_formatters` keyword argument overrides the formatting of specific columns.
+Its value should be a dictionary mapping a column name (`Symbol`) to a Printf-style format string, which will be applied to each element of that column.
+
+The `hdr_override` keyword argument changes the column headers, using a dictionary from column name to new header.
+
+```julia
+fmt_override = Dict(:f => "%+10.3e",
+                    :t => "%08.2f")
+hdr_override = Dict(:name => "Name", :f => "f(x)", :t => "Time")
+pretty_stats(stdout,
+             df[!, [:name, :f, :t]],
+             col_formatters = fmt_override,
+             hdr_override = hdr_override)
+```
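+
+The strings in `fmt_override` are Printf-style format specifications. To preview their
+effect on a single value before applying them to a whole column, one can call `@sprintf`
+from the Printf standard library directly:
+
+```julia
+using Printf
+
+a = @sprintf("%+10.3e", -2.25)   # always show the sign, 3 digits after the point, width 10
+b = @sprintf("%08.2f", 3.14159)  # 2 decimals, zero-padded to width 8
+println(a)  # -2.250e+00
+println(b)  # 00003.14
+```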
+
+While `col_formatters` is for simple format strings, the PrettyTables API lets us define more elaborate formatters in the form of functions:
+
+```julia
+fmt_override = Dict(:f => "%+10.3e",
+                    :t => "%08.2f")
+hdr_override = Dict(:name => "Name", :f => "f(x)", :t => "Time")
+pretty_stats(df[!, [:name, :f, :t]],
+             col_formatters = fmt_override,
+             hdr_override = hdr_override,
+             formatters = (v, i, j) -> begin
+               if j == 3  # t is the 3rd column
+                 vi = floor(Int, v)
+                 minutes = div(vi, 60)
+                 seconds = vi % 60
+                 micros = round(Int, 1e6 * (v - vi))
+                 @sprintf("%2dm %02ds %06dμs", minutes, seconds, micros)
+               else
+                 v
+               end
+             end)
+```
+
+See the [PrettyTables.jl documentation](https://ronisbr.github.io/PrettyTables.jl/stable/man/formatters/) for more information.
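+
+The formatter above can be pulled out into a named function, which makes it easier to test
+in isolation. The sketch below is a hypothetical variant, `format_time`, that rounds to whole
+microseconds once before splitting into minutes and seconds; unlike the inline version, a
+value such as `1.9999999` is then shown as `0m 02s 000000μs` rather than `0m 01s 1000000μs`:
+
+```julia
+using Printf
+
+# Hypothetical helper: formats a time in seconds as "Xm YYs ZZZZZZμs".
+function format_time(v::Real)
+  total_us = round(Int, 1e6 * v)              # round once, to whole microseconds
+  secs, micros = divrem(total_us, 1_000_000)  # whole seconds and leftover microseconds
+  minutes, seconds = divrem(secs, 60)
+  return @sprintf("%2dm %02ds %06dμs", minutes, seconds, micros)
+end
+
+format_time(125.25)  # " 2m 05s 250000μs"
+```
+
+It could then be passed to PrettyTables as `formatters = (v, i, j) -> j == 3 ? format_time(v) : v`.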
+
+When using LaTeX format, the output must be valid LaTeX.
+By default, numerical data in the table is wrapped in inline math environments,
+which would interfere with our formatting of the time.
+Thus we must first disable them for the `:t` column using `col_formatters`, and then apply the PrettyTables formatter as above (note that `:t` is now the 4th column):
+
+```julia
+open("alpha2.tex", "w") do io
+  println(io, "\\documentclass[varwidth=20cm,crop=true]{standalone}")
+  println(io, "\\usepackage{longtable}[=v4.13]")
+  println(io, "\\begin{document}")
+  pretty_latex_stats(io,
+                     df[!, [:name, :status, :f, :t, :iter]],
+                     col_formatters = Dict(:t => "%f"),  # disable default formatting of t
+                     formatters = (v, i, j) -> begin
+                       if j == 4  # t is the 4th column here
+                         xi = floor(Int, v)
+                         minutes = div(xi, 60)
+                         seconds = xi % 60
+                         micros = round(Int, 1e6 * (v - xi))
+                         @sprintf("\\(%2d\\)m \\(%02d\\)s \\(%06d \\mu\\)s", minutes, seconds, micros)
+                       else
+                         v
+                       end
+                     end)
+  println(io, "\\end{document}")
+end
+```
+
+```julia
+run(`latexmk -quiet -pdf alpha2.tex`)
+run(`pdf2svg alpha2.pdf alpha2.svg`)
+```
+
+### Joining tables
+
+On some occasions, instead of or in addition to showing individual results, we want to
+show a table with the results of multiple solvers.
+
+```julia
+df = join(stats, [:f, :t])
+pretty_stats(stdout, df)
+```
+
+The column `:id` is used as the key on which to join. In addition, some columns may be
+repeated across the solvers, such as the problem name. We convey that information with the
+argument `invariant_cols`, so that those columns appear only once.
+
+```julia
+df = join(stats, [:f, :t], invariant_cols=[:name])
+pretty_stats(stdout, df)
+```
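+
+Conceptually, joining on `:id` resembles a DataFrames `innerjoin` after renaming each
+solver-specific column with the solver name as a suffix, while invariant columns such as
+`:name` are kept only once. The sketch below reproduces that idea for two made-up solvers
+with plain DataFrames; it illustrates the concept and is not SolverBenchmark's implementation,
+so the column names `join` actually produces may differ:
+
+```julia
+using DataFrames
+
+alpha = DataFrame(id = 1:3, name = ["p1", "p2", "p3"], f = [1.0, 2.0, 3.0])
+beta  = DataFrame(id = 1:3, name = ["p1", "p2", "p3"], f = [1.5, 2.5, 3.5])
+
+# suffix the solver-specific column; keep :name from the first solver only
+a = rename(alpha[!, [:id, :name, :f]], :f => :f_alpha)
+b = rename(beta[!, [:id, :f]], :f => :f_beta)
+joined = innerjoin(a, b, on = :id)
+```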
+
+`join` also accepts `hdr_override` for changing the column names before the solver name is
+appended as a suffix (`_solver`).
+
+```julia
+hdr_override = Dict(:name => "Name", :f => "f(x)", :t => "Time")
+df = join(stats, [:f, :t], invariant_cols=[:name], hdr_override=hdr_override)
+pretty_stats(stdout, df)
+```
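+
+For LaTeX output, math in the column headers must itself be valid LaTeX, hence the
+`\\(f(x)\\)` header below:
+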
+```julia
+hdr_override = Dict(:name => "Name", :f => "\\(f(x)\\)", :t => "Time")
+df = join(stats, [:f, :t], invariant_cols=[:name], hdr_override=hdr_override)
+open("alpha3.tex", "w") do io
+  println(io, "\\documentclass[varwidth=20cm,crop=true]{standalone}")
+  println(io, "\\usepackage{longtable}[=v4.13]")
+  println(io, "\\begin{document}")
+  pretty_latex_stats(io, df)
+  println(io, "\\end{document}")
+end
+```
+
+```julia
+run(`latexmk -quiet -pdf alpha3.tex`)
+run(`pdf2svg alpha3.pdf alpha3.svg`)
+```
+
+## Profiles
+
+Performance profiles are a comparison tool developed by [Dolan and
+Moré, 2002](https://link.springer.com/article/10.1007/s101070100263/) that takes into
+account the relative performance of a solver and whether it has achieved convergence for each
+problem. `SolverBenchmark.jl` uses
+[BenchmarkProfiles.jl](https://github.com/JuliaSmoothOptimizers/BenchmarkProfiles.jl)
+for generating performance profiles from the dictionary of `DataFrame`s.
+
+The basic usage is `performance_profile(stats, cost)`, where `cost` is a function
+applied to a `DataFrame` and returning a vector with one value per problem, where smaller
+is better.
+
+```julia
+using Plots
+pyplot()
+
+p = performance_profile(stats, df -> df.t)
+```
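+
+Following Dolan and Moré's definition, the profile can be computed by hand for a small cost
+matrix: each solver's cost on a problem is divided by the best cost on that problem, and the
+value plotted for a solver at abscissa `τ` is the fraction of problems on which that ratio is
+at most `τ`. The hypothetical helper `profile_fraction` below is a minimal sketch of that
+computation, not the BenchmarkProfiles.jl code:
+
+```julia
+# cost_matrix[p, s] is the cost of solver s on problem p (smaller is better).
+function profile_fraction(cost_matrix::AbstractMatrix{<:Real}, τ::Real)
+  best = minimum(cost_matrix, dims = 2)  # best cost on each problem (one column)
+  ratios = cost_matrix ./ best           # performance ratios, all >= 1
+  # fraction of problems on which each solver is within a factor τ of the best
+  return vec(sum(ratios .<= τ, dims = 1)) ./ size(cost_matrix, 1)
+end
+
+cost_matrix = [1.0 2.0;
+               3.0 3.0;
+               4.0 8.0]
+profile_fraction(cost_matrix, 1.0)  # solver 1: 1.0, solver 2: 1/3
+profile_fraction(cost_matrix, 2.0)  # solver 1: 1.0, solver 2: 1.0
+```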
+
+Notice that we used `df -> df.t` which corresponds to the column `:t` of the
+`DataFrame`s.
+This does not take into account that the solvers have failed on a few problems
+(according to column `:status`). The next profile takes that into account by giving failed
+problems an infinite cost.
+
+```julia
+cost(df) = (df.status .!= :first_order) * Inf + df.t
+p = performance_profile(stats, cost)
+```
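+
+The expression `(df.status .!= :first_order) * Inf + df.t` relies on Julia's `Bool`
+arithmetic: `false * Inf` is `0.0` while `true * Inf` is `Inf`, so solved problems keep their
+time and failures get an infinite cost. A small check on made-up values, together with an
+equivalent formulation using `ifelse`:
+
+```julia
+status = [:first_order, :failure, :first_order]
+t = [1.5, 2.0, 0.5]
+
+# false * Inf == 0.0 and true * Inf == Inf, elementwise
+penalized_t = (status .!= :first_order) * Inf + t
+# same result, written as an explicit elementwise choice
+alt_t = ifelse.(status .== :first_order, t, Inf)
+```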
+
+### Profile wall
+
+Another profile function is `profile_solvers`, which creates a wall of performance
+profiles, accepting multiple costs and adding pairwise (one vs one) comparisons to the
+traditional performance profile.
+
+```julia
+solved(df) = (df.status .== :first_order)
+costs = [df -> .!solved(df) * Inf + df.t, df -> .!solved(df) * Inf + df.iter]
+costnames = ["Time", "Iterations"]
+p = profile_solvers(stats, costs, costnames)
+```
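+
+When several costs share the same failure rule, a small higher-order function avoids
+repeating the `Inf` penalty in each anonymous function. The helper `penalized` below is
+hypothetical, not part of SolverBenchmark; it builds cost functions that return the same
+values as the anonymous functions above:
+
+```julia
+using DataFrames
+
+solved(df) = (df.status .== :first_order)
+
+# Hypothetical helper: returns a cost function reading column `col`,
+# with an infinite cost for every problem that was not solved.
+penalized(col::Symbol) = df -> ifelse.(solved(df), float.(df[!, col]), Inf)
+
+demo = DataFrame(status = [:first_order, :failure], t = [1.5, 2.0], iter = [10, 20])
+penalized(:t)(demo)     # [1.5, Inf]
+penalized(:iter)(demo)  # [10.0, Inf]
+```
+
+In the tutorial setting, `profile_solvers(stats, [penalized(:t), penalized(:iter)], costnames)`
+would then produce the same wall of profiles.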
+
+### Running a benchmark
+
+For an example of running a benchmark with actual solvers, see the tutorial
+[Run a benchmark with OptimizationProblems](https://juliasmoothoptimizers.github.io/OptimizationProblems.jl/dev/benchmark/).
+It shows how to use the problems from `OptimizationProblems` to run a benchmark for unconstrained optimization.