Update benchmark submission instructions and criteria (#102)

siddharth-krishna · web-flow · commit 8acc585e26dd · 2025-02-18T21:11:43.000+02:00
diff --git a/docs/Criteria_and_instructions.md b/docs/Criteria_and_instructions.md
@@ -1,31 +1,40 @@
+## Goals for the benchmark set
+
+We encourage submission of benchmarks that help the project meet the following overall targets:
+
+1. A set of benchmarks that are diverse in terms of the modelling frameworks that generated them, problem structure, and model features. For instance, we would like models that consider innovative technologies (e.g., electrolyzers, CO2 capture) or policy-driven constraints (e.g., on CO2 emissions). By "features" we mean the different kinds of energy planning problems that can be modelled by the framework (e.g., capacity expansion, power system operations, resource adequacy).
+
+1. Benchmarks with model features implemented using MILP constraints, especially features other than unit commitment (e.g., transmission expansion).
+
+1. Benchmarks that help open-source solver developers improve their solvers: benchmarks that can be solved rapidly (< 5 minutes) by Gurobi but are slow (~1 hour or more) or fail when solved by an open-source solver.
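The third goal can be read as a simple runtime rule. The sketch below is a hypothetical helper (`is_solver_target` is not part of the project) that encodes it under stated assumptions: runtimes are in seconds, the Gurobi cutoff is 5 minutes, "~1 hour or more" is treated as at least 3600 s, and `None` means the open-source solver failed.

```python
from typing import Optional

# Thresholds read off the goal above (assumptions): Gurobi "< 5 minutes",
# open-source solver "~1 hour or more", both in seconds.
GUROBI_FAST_S = 5 * 60
OPEN_SOURCE_SLOW_S = 60 * 60


def is_solver_target(gurobi_runtime_s: float, open_source_runtime_s: Optional[float]) -> bool:
    """Return True if Gurobi solves quickly but the open-source solver is slow or fails.

    Pass None for `open_source_runtime_s` when the open-source solver failed
    (e.g., it crashed or hit its time limit without a solution).
    """
    if gurobi_runtime_s >= GUROBI_FAST_S:
        return False  # not "solved rapidly" by Gurobi
    if open_source_runtime_s is None:
        return True  # the open-source solver failed
    return open_source_runtime_s >= OPEN_SOURCE_SLOW_S


# Example: Gurobi takes 2 minutes, the open-source solver fails.
print(is_solver_target(120, None))  # prints True
```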
+
 ## Criteria for the selection of benchmarks
 
 The Solver Benchmark project is open and encourages the community to submit benchmark problems. Please ensure that submissions adhere to the following criteria:
 
-1. Benchmarks must be in the `.lp` file format, that it suitable for providing to the solver directly as input (i.e., no further pre-processing must be necessary).
+1. Benchmarks must be in the `.lp` or `.mps` file format, which is suitable for providing directly to the solver as input (i.e., no further pre-processing should be necessary). An advantage of these formats is that they preserve [confidentiality of the model's input data](https://www.gams.com/48/docs/S_CONVERT.html?search=confidential): they contain only mathematical equations, so it is nearly impossible to reconstruct the underlying energy system specification and technological data.
 
 1. Benchmarks must be Linear Programming (LP) or Mixed Integer Linear Programming (MILP) problems. We do not currently accept other kinds of problems such as non-linear, or multi-objective problems.
 
-1. Benchmarks must be solvable using Gurobi in 1 hour or less on a machine with [TBD] 4 CPUs and 16 GB memory (e.g. a an `e2-standard-4` VM on Google Cloud).
-
 1. Benchmarks must be problems generated by bottom-up energy system models (see *Target modelling frameworks* below).
 
-1. If possible, benchmarks that have a "size" parameter (e.g. number of nodes, number of clusters) that can be varied in order to obtain the same benchmarks in multiple sizes: small, medium, large.
-
-We also encourage benchmarks to help the project meet the following overall targets:
+1. Benchmarks must be solvable within the time limit for their size category:
+    - Small: under 10 minutes HiGHS solving time
+    - Medium: under 1 hour HiGHS solving time
+    - Large / Real: under 10 hours Gurobi solving time
 
-1. A set of benchmarks that are diverse in terms of modelling frameworks that generated them, problem structure, and model features.
+    where HiGHS runtimes are measured using the latest solver version on a machine with [TBD] 2 vCPUs and 8 GB of memory (e.g., an `e2-standard-2` VM on Google Cloud), and Gurobi runtimes are measured on a [TBD -- reasonable machine?].
 
-1. Benchmarks that help open-source solver developers improve their solvers: benchmarks that can be solved rapidly (< 5 minutes) by Gurobi but are slow (~1 hour or higher) when solved by an open-source solver.
+Whenever possible, we prefer benchmarks that can be generated in multiple "sizes" by varying the time scale (single-stage / multi-stage planning horizons), temporal resolution (hourly, daily, etc.), or spatial resolution (number of regions / nodes).
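As an illustration, the size-category limits above could be checked with a small lookup table. This is a sketch under assumptions: `TIME_LIMITS` and `meets_time_limit` are hypothetical names, runtimes are in seconds, the "Large / Real" category is keyed as `"large"`, and "under" is read as a strict inequality.

```python
from typing import Dict, Tuple

# (solver, limit in seconds) per size category, taken from the criteria above.
# "Large / Real" is keyed as "large" here (an assumption).
TIME_LIMITS: Dict[str, Tuple[str, int]] = {
    "small": ("highs", 10 * 60),
    "medium": ("highs", 60 * 60),
    "large": ("gurobi", 10 * 60 * 60),
}


def meets_time_limit(category: str, solver: str, runtime_s: float) -> bool:
    """Return True if `runtime_s` is strictly under the limit for `category`.

    Raises KeyError for an unknown category, and ValueError if the runtime
    was measured with a different solver than the one the limit refers to.
    """
    expected_solver, limit_s = TIME_LIMITS[category.lower()]
    if solver.lower() != expected_solver:
        raise ValueError(f"{category!r} limit refers to {expected_solver}, not {solver}")
    return runtime_s < limit_s


print(meets_time_limit("medium", "HiGHS", 1800))  # prints True
```

Keying the table by category keeps the solver and the limit together, so a runtime measured with the wrong solver is rejected rather than silently compared.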
 
 ## Instructions for submitting benchmarks
 
 The preferred and recommended approach for submission is to open a pull request to this repository that adds the following to the `benchmarks/<framework>/` folder:
 - Metadata (name, description, etc.; see below) in a YAML file `benchmarks/<framework>/metadata.yaml` (create this file if it doesn't already exist)
 - A configuration file that is used as an input to the modelling framework
-- A dockerfile that specifies the modelling framework version (preferably a commit hash), pinned versions of all dependencies, and a script to run the modelling framework and obtain the LP file given to the solver.
+- A Dockerfile that specifies the modelling framework version (preferably a commit hash), pinned versions of all dependencies, and a script that runs the modelling framework and produces the LP/MPS file given to the solver.
     - For example, see the benchmarks in the `benchmarks/pypsa/` folder.
-- For non fully open-source modelling frameworks, where LP files cannot be reproduced automatically as above, we will accept LP files hosted on a public immutable file storage service such as Zenodo. In such cases, the metadata file and a script to download the benchmark (prefereably via a permalink) is sufficient.
+- For modelling frameworks that are not fully open source, where LP/MPS files cannot be reproduced automatically as above, we will accept LP/MPS files hosted on a public, immutable file storage service such as Zenodo. In such cases, it is sufficient to submit the metadata file containing a URL to download the benchmark (preferably a permalink).
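For externally hosted benchmarks, a download step along these lines could fetch the file from the URL given in the metadata. `download_benchmark` is a hypothetical helper that uses only the Python standard library; it assumes the URL path ends in the benchmark's file name, and it does not handle compressed files or checksum verification.

```python
import shutil
import urllib.parse
import urllib.request
from pathlib import Path

ALLOWED_SUFFIXES = {".lp", ".mps"}


def download_benchmark(url: str, dest_dir: Path) -> Path:
    """Download an LP/MPS benchmark from `url` into `dest_dir` and return its path.

    Only the file extension is checked; compressed files (e.g., `.mps.gz`)
    are rejected by this sketch.
    """
    # Use the last path component of the URL (query strings are ignored).
    filename = Path(urllib.parse.urlparse(url).path).name
    if Path(filename).suffix.lower() not in ALLOWED_SUFFIXES:
        raise ValueError(f"expected an .lp or .mps file, got {filename!r}")
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / filename
    # Stream the response to disk instead of loading it into memory.
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest
```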
 
 ### Benchmark metadata
 
@@ -40,12 +49,10 @@ Please include along with each benchmark submission, the following metadata. Fur
 | **Technique** | LP | MILP |
 | **Kind of problem** | Infrastructure (capacity expansion) | Operational (dispatch only) | Other (please indicate) |
 | **Sectors** | Sector-coupled (power + heating, industry, transport) | Power sector |
-| **Time horizon** | Single-period | Multi-period (indicate n. of periods)) |
+| **Time horizon** | Single-period | Multi-period (indicate n. of periods) |
 | **Temporal resolution** | Hourly | 3 hourly | Daily | Yearly |
 | **Spatial resolution** | Single node / 2 nodes (indicate countries/regions) | Multi-nodal (10-20) (indicate countries/regions) |
 | **MILP features** | None | Unit commitment | Transmission expansion | Other (please indicate) |
-| **N. of constraints** | <100| 100-1'000| 1'000-10'000| 10'000-100'000| 100'000-1'000'000 | 1'000'000-10'000'000 |
-| **N. of variables** | <100| 100-1'000| 1'000-10'000| 10'000-100'000| 100'000-1'000'000 | 1'000'000-10'000'000 |
 
 For example, here is an entry in the `benchmarks/pypsa/metadata.yaml` file:
 
@@ -58,11 +65,11 @@ pypsa-eur-sec-2-lv1-3h:
   Kind of problem: Infrastructure
   Sectors: Sector-coupled (power + heating, biomass, industry, transport)
   Time horizon: Single period (1 year)
-  Temporal resolution: 3 hourly
-  Spatial resolution: 2 nodes (Italy)
   MILP features: None
-  N. of constraints: 393568
-  N. of variables': 390692
+  Sizes:
+  - URL: https://todo.todo/todo.lp
+    Temporal resolution: 3 hourly
+    Spatial resolution: 2 nodes
 ```
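Once `metadata.yaml` is loaded (for instance with PyYAML's `yaml.safe_load`, which returns nested dicts and lists), each entry can be sanity-checked. The sketch below is hypothetical: `validate_entry` is not part of the repository, and the required keys are only those visible in the example above, not the full set defined by the metadata table.

```python
from typing import Any, Dict, List

# Keys visible in the example entry above; the full required set is defined
# by the metadata table, so treat these lists as an illustrative assumption.
REQUIRED_ENTRY_KEYS = ["Kind of problem", "Sectors", "Time horizon", "MILP features", "Sizes"]
REQUIRED_SIZE_KEYS = ["URL", "Temporal resolution", "Spatial resolution"]


def validate_entry(name: str, entry: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one benchmark entry (empty if none)."""
    problems = [f"{name}: missing {key!r}" for key in REQUIRED_ENTRY_KEYS if key not in entry]
    sizes = entry.get("Sizes")
    if sizes is None:
        return problems  # already reported as missing above
    if not isinstance(sizes, list) or not sizes:
        problems.append(f"{name}: 'Sizes' must be a non-empty list")
        return problems
    for i, size in enumerate(sizes):
        if not isinstance(size, dict):
            problems.append(f"{name}: Sizes[{i}] must be a mapping")
            continue
        for key in REQUIRED_SIZE_KEYS:
            if key not in size:
                problems.append(f"{name}: Sizes[{i}] missing {key!r}")
    return problems


# The example entry above, as a parsed Python structure.
example = {
    "pypsa-eur-sec-2-lv1-3h": {
        "Kind of problem": "Infrastructure",
        "Sectors": "Sector-coupled (power + heating, biomass, industry, transport)",
        "Time horizon": "Single period (1 year)",
        "MILP features": "None",
        "Sizes": [
            {
                "URL": "https://todo.todo/todo.lp",
                "Temporal resolution": "3 hourly",
                "Spatial resolution": "2 nodes",
            },
        ],
    }
}
for name, entry in example.items():
    print(validate_entry(name, entry))  # prints []
```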
 
 ## Target modelling frameworks