From d7bc595a67a94015eb2dd30fbbbb60c15b5012ac Mon Sep 17 00:00:00 2001 From: Siddharth Krishna Date: Sat, 21 Dec 2024 16:51:24 +0530 Subject: [PATCH 1/6] Update benchmark criteria and instructions --- docs/Criteria_and_instructions.md | 42 ++++++++++++++++++------------- 1 file changed, 25 insertions(+), 17 deletions(-) diff --git a/docs/Criteria_and_instructions.md b/docs/Criteria_and_instructions.md index 06efd9cd..552182be 100644 --- a/docs/Criteria_and_instructions.md +++ b/docs/Criteria_and_instructions.md @@ -1,31 +1,37 @@ -## Criteria for the selection of benchmarks +## Goals for the benchmark set -The Solver Benchmark project is open and encourages the community to submit benchmark problems. Please ensure that submissions adhere to the following criteria: +We encourage submission of benchmarks that help the project meet the following overall targets: -1. Benchmarks must be in the `.lp` file format, that it suitable for providing to the solver directly as input (i.e., no further pre-processing must be necessary). +1. A set of benchmarks that are diverse in terms of modelling frameworks that generated them, problem structure, and model features. -1. Benchmarks must be Linear Programming (LP) or Mixed Integer Linear Programming (MILP) problems. We do not currently accept other kinds of problems such as non-linear, or multi-objective problems. +1. Benchmarks using model features that are implemented using MILP constraints. -1. Benchmarks must be solvable using Gurobi in 1 hour or less on a machine with [TBD] 4 CPUs and 16 GB memory (e.g. a an `e2-standard-4` VM on Google Cloud). +1. Benchmarks that help open-source solver developers improve their solvers: benchmarks that can be solved rapidly (< 5 minutes) by Gurobi but are slow (~1 hour or higher) or fail when solved by an open-source solver. -1. Benchmarks must be problems generated by bottom-up energy system models (see *Target modelling frameworks* below). +## Criteria for the selection of benchmarks -1. 
If possible, benchmarks that have a "size" parameter (e.g. number of nodes, number of clusters) that can be varied in order to obtain the same benchmarks in multiple sizes: small, medium, large. +The Solver Benchmark project is open and encourages the community to submit benchmark problems. Please ensure that submissions adhere to the following criteria: -We also encourage benchmarks to help the project meet the following overall targets: +1. Benchmarks must be in the `.lp` or `.mps` file formats, that are suitable for providing to the solver directly as input (i.e., no further pre-processing must be necessary). -1. A set of benchmarks that are diverse in terms of modelling frameworks that generated them, problem structure, and model features. +1. Benchmarks must be Linear Programming (LP) or Mixed Integer Linear Programming (MILP) problems. We do not currently accept other kinds of problems such as non-linear, or multi-objective problems. + +1. Benchmarks must be problems generated by bottom-up energy system models (see *Target modelling frameworks* below). -1. Benchmarks that help open-source solver developers improve their solvers: benchmarks that can be solved rapidly (< 5 minutes) by Gurobi but are slow (~1 hour or higher) when solved by an open-source solver. +1. Benchmarks must be solvable in one of the following time limits, depending on the size category: + - Small: under 10 minutes HiGHS solving time + - Medium: under 1 hour HiGHS solving time + - Large / Real: under 10 hours Gurobi solving time + where all runtimes are measured with the latest solver versions on a machine with [TBD] 2 vCPUs and 8 GB memory (e.g. an `e2-standard-2` VM on Google Cloud). If possible, we prefer benchmark generation scripts that have a "size" parameter (e.g. number of nodes, number of clusters) that can be varied in order to obtain the same benchmarks in multiple sizes. 
## Instructions for submitting benchmarks The prefered and recommended approach for submission is to open a pull request to this repository that adds to the `benchmarks//` folder: - Metadata (name, description, etc; see below) added to a YAML file `benchmarks//metadata.yaml`, create this if it doesn't exist already - A configuration file that is used as an input to the modelling framework -- A dockerfile that specifies the modelling framework version (preferably a commit hash), pinned versions of all dependencies, and a script to run the modelling framework and obtain the LP file given to the solver. +- A dockerfile that specifies the modelling framework version (preferably a commit hash), pinned versions of all dependencies, and a script to run the modelling framework and obtain the LP/MPS file given to the solver. - For example, see the benchmarks in the `benchmarks/pypsa/` folder. -- For non fully open-source modelling frameworks, where LP files cannot be reproduced automatically as above, we will accept LP files hosted on a public immutable file storage service such as Zenodo. In such cases, the metadata file and a script to download the benchmark (prefereably via a permalink) is sufficient. +- For non fully open-source modelling frameworks, where LP/MPS files cannot be reproduced automatically as above, we will accept LP/MPS files hosted on a public immutable file storage service such as Zenodo. In such cases, the metadata file containing a URL to download the benchmark (preferably via a permalink) is sufficient. ### Benchmark metadata @@ -40,7 +46,7 @@ Please include along with each benchmark submission, the following metadata. Fur | **Technique** | LP | MILP | | **Kind of problem** | Infrastructure (capacity expansion) | Operational (dispatch only) | Other (please indicate) | | **Sectors** | Sector-coupled (power + heating, industry, transport) | Power sector | -| **Time horizon** | Single-period | Multi-period (indicate n.
of periods)) | +| **Time horizon** | Single-period | Multi-period (indicate n. of periods) | | **Temporal resolution** | Hourly | 3 hourly | Daily | Yearly | | **Spatial resolution** | Single node / 2 nodes (indicate countries/regions) | Multi-nodal (10 $\div$ 20) (indicate countries/regions) | | **MILP features** | None | Unit commitment | Transmission expansion | Other (please indicate) | @@ -58,11 +64,13 @@ pypsa-eur-sec-2-lv1-3h: Kind of problem: Infrastructure Sectors: Sector-coupled (power + heating, biomass, industry, transport) Time horizon: Single period (1 year) - Temporal resolution: 3 hourly - Spatial resolution: 2 nodes (Italy) MILP features: None - N. of constraints: 393568 - N. of variables': 390692 + Sizes: + - URL: https://todo.todo/todo.lp + Temporal resolution: 3 + Spatial resolution: 2 + N. of constraints: 393568 + N. of variables': 390692 ``` ## Target modelling frameworks From 9808344f1aee75cc88c241021f25de5108a9bf58 Mon Sep 17 00:00:00 2001 From: Siddharth Krishna Date: Sat, 21 Dec 2024 17:00:32 +0530 Subject: [PATCH 2/6] Fix --- docs/Criteria_and_instructions.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/docs/Criteria_and_instructions.md b/docs/Criteria_and_instructions.md index 552182be..2d3513e0 100644 --- a/docs/Criteria_and_instructions.md +++ b/docs/Criteria_and_instructions.md @@ -4,7 +4,7 @@ We encourage submission of benchmarks that help the project meet the following o 1. A set of benchmarks that are diverse in terms of modelling frameworks that generated them, problem structure, and model features. -1. Benchmarks using model features that are implemented using MILP constraints. +1. Benchmarks using model features that are implemented using MILP constraints, preferably other than unit commitment. 1. 
Benchmarks that help open-source solver developers improve their solvers: benchmarks that can be solved rapidly (< 5 minutes) by Gurobi but are slow (~1 hour or higher) or fail when solved by an open-source solver. @@ -22,6 +22,7 @@ The Solver Benchmark project is open and encourages the community to submit benc - Small: under 10 minutes HiGHS solving time - Medium: under 1 hour HiGHS solving time - Large / Real: under 10 hours Gurobi solving time + where all runtimes are measured with the latest solver versions on a machine with [TBD] 2 vCPUs and 8 GB memory (e.g. an `e2-standard-2` VM on Google Cloud). If possible, we prefer benchmark generation scripts that have a "size" parameter (e.g. number of nodes, number of clusters) that can be varied in order to obtain the same benchmarks in multiple sizes. ## Instructions for submitting benchmarks From be90ad52a9a002d5addd5f7582db1fb141b7d24f Mon Sep 17 00:00:00 2001 From: Siddharth Krishna Date: Sat, 21 Dec 2024 17:01:01 +0530 Subject: [PATCH 3/6] Fix --- docs/Criteria_and_instructions.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/Criteria_and_instructions.md b/docs/Criteria_and_instructions.md index 2d3513e0..bc597350 100644 --- a/docs/Criteria_and_instructions.md +++ b/docs/Criteria_and_instructions.md @@ -19,11 +19,11 @@ The Solver Benchmark project is open and encourages the community to submit benc 1. Benchmarks must be problems generated by bottom-up energy system models (see *Target modelling frameworks* below). 1. 
Benchmarks must be solvable in one of the following time limits, depending on the size category: - - Small: under 10 minutes HiGHS solving time - - Medium: under 1 hour HiGHS solving time - - Large / Real: under 10 hours Gurobi solving time + - Small: under 10 minutes HiGHS solving time + - Medium: under 1 hour HiGHS solving time + - Large / Real: under 10 hours Gurobi solving time - where all runtimes are measured with the latest solver versions on a machine with [TBD] 2 vCPUs and 8 GB memory (e.g. an `e2-standard-2` VM on Google Cloud). If possible, we prefer benchmark generation scripts that have a "size" parameter (e.g. number of nodes, number of clusters) that can be varied in order to obtain the same benchmarks in multiple sizes. + where all runtimes are measured with the latest solver versions on a machine with [TBD] 2 vCPUs and 8 GB memory (e.g. an `e2-standard-2` VM on Google Cloud). If possible, we prefer benchmark generation scripts that have a "size" parameter (e.g. number of nodes, number of clusters) that can be varied in order to obtain the same benchmarks in multiple sizes. ## Instructions for submitting benchmarks From 656af9c8e638093dd77ab50c36b2acbdae63b4af Mon Sep 17 00:00:00 2001 From: Siddharth Krishna Date: Wed, 5 Feb 2025 19:05:44 +0530 Subject: [PATCH 4/6] Update goals and definition of sizes --- docs/Criteria_and_instructions.md | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/docs/Criteria_and_instructions.md b/docs/Criteria_and_instructions.md index bc597350..4cbaf6d8 100644 --- a/docs/Criteria_and_instructions.md +++ b/docs/Criteria_and_instructions.md @@ -2,9 +2,10 @@ We encourage submission of benchmarks that help the project meet the following overall targets: -1. A set of benchmarks that are diverse in terms of modelling frameworks that generated them, problem structure, and model features. +1. 
A set of benchmarks that are diverse in terms of modelling frameworks that generated them, problem structure, and model features. By features, we mean e.g., models that consider innovative technologies (e.g., electrolyzers, CO2 +capture) or policy-driven constraints (e.g., on CO2 emissions). -1. Benchmarks using model features that are implemented using MILP constraints, preferably other than unit commitment. +1. Benchmarks using model features that are implemented using MILP constraints, especially features other than unit commitment. 1. Benchmarks that help open-source solver developers improve their solvers: benchmarks that can be solved rapidly (< 5 minutes) by Gurobi but are slow (~1 hour or higher) or fail when solved by an open-source solver. @@ -23,7 +24,9 @@ The Solver Benchmark project is open and encourages the community to submit benc - Medium: under 1 hour HiGHS solving time - Large / Real: under 10 hours Gurobi solving time - where all runtimes are measured with the latest solver versions on a machine with [TBD] 2 vCPUs and 8 GB memory (e.g. an `e2-standard-2` VM on Google Cloud). If possible, we prefer benchmark generation scripts that have a "size" parameter (e.g. number of nodes, number of clusters) that can be varied in order to obtain the same benchmarks in multiple sizes. + where HiGHS runtimes are measured with the latest solver versions on a machine with [TBD] 2 vCPUs and 8 GB memory (e.g. an `e2-standard-2` VM on Google Cloud) and Gurobi solving time is on a [TBD -- reasonable machine?]. + +Whenever possible, we prefer benchmarks that can be generated in multiple "sizes" by varying the time scale (single-stage / multi-stage planning horizons), temporal resolution (hourly, daily, etc), or spatial resolution (number of regions / nodes). 
## Instructions for submitting benchmarks From a25733429cc0eae19f3b36727dfdf8aa9dec642f Mon Sep 17 00:00:00 2001 From: Siddharth Krishna Date: Thu, 6 Feb 2025 10:15:41 +0530 Subject: [PATCH 5/6] Remove number of variables/constraints as this can be calculated automatically --- docs/Criteria_and_instructions.md | 4 ---- 1 file changed, 4 deletions(-) diff --git a/docs/Criteria_and_instructions.md b/docs/Criteria_and_instructions.md index 4cbaf6d8..fdb5c2fa 100644 --- a/docs/Criteria_and_instructions.md +++ b/docs/Criteria_and_instructions.md @@ -54,8 +54,6 @@ Please include along with each benchmark submission, the following metadata. Fur | **Temporal resolution** | Hourly | 3 hourly | Daily | Yearly | | **Spatial resolution** | Single node / 2 nodes (indicate countries/regions) | Multi-nodal (10 $\div$ 20) (indicate countries/regions) | | **MILP features** | None | Unit commitment | Transmission expansion | Other (please indicate) | -| **N. of constraints** | <100| 100-1'000| 1'000-10'000| 10'000-100'000| 100'000-1'000'000 | 1'000'000-10'000'000 | -| **N. of variables** | <100| 100-1'000| 1'000-10'000| 10'000-100'000| 100'000-1'000'000 | 1'000'000-10'000'000 | For example, here is an entry in the `benchmarks/pypsa/metadata.yaml` file: @@ -73,8 +71,6 @@ pypsa-eur-sec-2-lv1-3h: - URL: https://todo.todo/todo.lp Temporal resolution: 3 Spatial resolution: 2 - N. of constraints: 393568 - N. 
of variables': 390692 ``` ## Target modelling frameworks From d173cf20c434bdf6d397df5471c0136abce5bd35 Mon Sep 17 00:00:00 2001 From: Siddharth Krishna Date: Tue, 18 Feb 2025 20:42:04 +0200 Subject: [PATCH 6/6] PR comments --- docs/Criteria_and_instructions.md | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/docs/Criteria_and_instructions.md b/docs/Criteria_and_instructions.md index fdb5c2fa..6852ac30 100644 --- a/docs/Criteria_and_instructions.md +++ b/docs/Criteria_and_instructions.md @@ -2,8 +2,7 @@ We encourage submission of benchmarks that help the project meet the following overall targets: -1. A set of benchmarks that are diverse in terms of modelling frameworks that generated them, problem structure, and model features. By features, we mean e.g., models that consider innovative technologies (e.g., electrolyzers, CO2 -capture) or policy-driven constraints (e.g., on CO2 emissions). +1. A set of benchmarks that are diverse in terms of modelling frameworks that generated them, problem structure, and model features. For instance, we would like models that consider innovative technologies (e.g., electrolyzers, CO2 capture) or policy-driven constraints (e.g., on CO2 emissions). By "features" we mean the different kinds of energy planning problems that can be modelled by the framework (e.g., capacity expansion, power system operations, resource adequacy). 1. Benchmarks using model features that are implemented using MILP constraints, especially features other than unit commitment. 1. Benchmarks that help open-source solver developers improve their solvers: benchmarks that can be solved rapidly (< 5 minutes) by Gurobi but are slow (~1 hour or higher) or fail when solved by an open-source solver. @@ -13,7 +12,7 @@ The Solver Benchmark project is open and encourages the community to submit benchmark problems. Please ensure that submissions adhere to the following criteria: -1.
Benchmarks must be in the `.lp` or `.mps` file formats, that are suitable for providing to the solver directly as input (i.e., no further pre-processing must be necessary). An advantage of using these formats is that they preserve [confidentiality of the model's input data](https://www.gams.com/48/docs/S_CONVERT.html?search=confidential) as they contain only mathematical equations and it is nearly impossible to reconstruct the underlying energy specification and technological data. 1. Benchmarks must be Linear Programming (LP) or Mixed Integer Linear Programming (MILP) problems. We do not currently accept other kinds of problems such as non-linear, or multi-objective problems. @@ -69,8 +68,8 @@ pypsa-eur-sec-2-lv1-3h: MILP features: None Sizes: - URL: https://todo.todo/todo.lp - Temporal resolution: 3 - Spatial resolution: 2 + Temporal resolution: 3 hourly + Spatial resolution: 2 nodes ``` ## Target modelling frameworks
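Patch 5 above removes the constraint/variable counts from the metadata on the grounds that they can be calculated automatically from the submitted file. As a rough illustration of what that calculation involves, here is a minimal sketch that naively scans a CPLEX-style `.lp` file as text; the sample model is invented, and a real pipeline would more likely load the file through a solver API (e.g. HiGHS), which handles the full format:

```python
import re

# Tiny invented LP file in CPLEX LP format, for illustration only.
SAMPLE_LP = """\
Minimize
 obj: 2 x1 + 3 x2
Subject To
 c1: x1 + x2 >= 4
 c2: x1 - x2 <= 2
Bounds
 0 <= x1 <= 10
 0 <= x2 <= 10
End
"""

# Section keywords that end the constraints section.
SECTION_KEYWORDS = {"minimize", "maximize", "bounds", "general", "binary", "end"}

def lp_counts(text: str):
    """Return (n_constraints, n_variables) from the text of an LP file.

    Constraints are the named rows in the 'Subject To' section; variables are
    the distinct identifier tokens appearing outside row labels.
    """
    in_constraints = False
    n_constraints = 0
    variables = set()
    for raw in text.splitlines():
        line = raw.strip()
        lower = line.lower()
        if lower in SECTION_KEYWORDS:
            in_constraints = False
            continue
        if lower in ("subject to", "st", "s.t.", "such that"):
            in_constraints = True
            continue
        if in_constraints and ":" in line:
            n_constraints += 1
        body = line.split(":", 1)[-1]  # drop the row label, keep the expression
        variables.update(re.findall(r"[A-Za-z_][A-Za-z0-9_.]*", body))
    return n_constraints, len(variables)

print(lp_counts(SAMPLE_LP))  # (2, 2)
```

This only sketches the idea for simple continuous problems; multi-line constraints and LP-format corner cases are ignored, which is why delegating to a solver's reader is the safer choice in practice.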
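Since the metadata table and the `metadata.yaml` example define the fields a submission should carry, a submission check could verify completeness after parsing the YAML. The following is a hypothetical sketch: the field sets are assembled from the table and example in the final revision, the helper names (`missing_fields`, `check_sizes`) and the example values are invented, and the exact schema may differ:

```python
# Hypothetical completeness check for one entry of a
# benchmarks/<framework>/metadata.yaml file, after parsing it into a dict
# (e.g. with a YAML library). Field names are assumed from the metadata table.
REQUIRED_FIELDS = {
    "Technique", "Kind of problem", "Sectors",
    "Time horizon", "MILP features", "Sizes",
}
SIZE_FIELDS = {"URL", "Temporal resolution", "Spatial resolution"}

def missing_fields(entry: dict) -> set:
    """Return the required top-level fields absent from a metadata entry."""
    return REQUIRED_FIELDS - entry.keys()

def check_sizes(entry: dict) -> list:
    """Return, for each size variant, the set of missing per-size fields."""
    return [SIZE_FIELDS - size.keys() for size in entry.get("Sizes", [])]

# Invented example entry mirroring the structure shown in the document.
example = {
    "Technique": "LP",
    "Kind of problem": "Infrastructure",
    "Sectors": "Sector-coupled",
    "Time horizon": "Single period (1 year)",
    "MILP features": "None",
    "Sizes": [{
        "URL": "https://todo.todo/todo.lp",
        "Temporal resolution": "3 hourly",
        "Spatial resolution": "2 nodes",
    }],
}

print(missing_fields(example), check_sizes(example))
```

A complete entry yields empty sets, so a CI step for submissions could fail the pull request whenever either helper reports a non-empty result.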