Categorize benchmarks by size in metadata and use in website filters #99

Merged
siddharth-krishna merged 13 commits into main from benchmark-size-update on Feb 19, 2025
Conversation

@KristijanFaust-OET (Contributor) commented Feb 14, 2025

Sources:

Runs already completed with a 10-minute timeout:
https://github.com/open-energy-transition/solver-benchmark/blob/main/results/benchmark_results.csv

Pypsa-eur-* runs with a timeout of approx. 18 h:
pypsa-eur-benchmark-runs.csv

All other benchmark runs with a timeout of 2 hours:
rest-of-benchmark-runs.csv

@vercel
vercel bot commented Feb 14, 2025

The latest updates on your projects:
solver-benchmark: ✅ Ready (updated Feb 19, 2025 1:47pm UTC)

@siddharth-krishna (Member) left a comment

Thanks a lot, Kristijan. If you get time today, could you please also update run_benchmarks.py and benchmarks/tests.yaml to use the new yaml schema of sizes/name? If not, no worries, I'll do it tomorrow.

@danielelerede-oet please can you double check the L and R categories proposed by this PR? Our current strategy is to only mark the problems that correspond to a realistic energy modelling exercise as R, and leave all other large problems as L. Thank you.
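The "new yaml schema of sizes/name" mentioned above might look roughly like the following sketch. Every field name here is an assumption based on this thread (a `sizes` list of named entries, each tagged with one of XS/S/M/L/R); the authoritative schema is whatever lands in `results/metadata.yaml`:

```yaml
# Hypothetical sketch only -- field names are assumptions, not the real schema.
benchmarks:
  pypsa-eur-elec-op:
    sizes:
      - name: 10-3h     # e.g. <nodes>-<temporal resolution>
        size: R         # one of XS, S, M, L, R
      - name: 10-1h
        size: R
```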

@danielelerede-oet (Member) commented Feb 14, 2025

Hi @siddharth-krishna @KristijanFaust-OET here are my observations:

  • In the PyPSA benchmarks I read that we're missing R-size benchmarks, but I remember they were excluded due to timeouts (TO)
  • genx-10_IEEE_9_bus_DC_OPF should be Real for coherence with the other genx problems, which have a lower number of nodes yet are classified as Real (I guess because of the no. of constraints/variables)
  • Surprised that the powermodels benchmarks are all Real, but there's probably a mistake in the metadata's no. of constraints/variables (the JuMP-HiGHS solution for automatic computation of sizes is definitely needed)
  • The tulipa benchmarks with size: 28-1h and size: 28-2.2h can be classified as Real
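The "automatic computation of sizes" mentioned above could be sketched as simple bucketing on problem dimensions. This is a minimal illustration, not the project's actual rule: the thresholds are invented, and R (Realistic) is assigned manually per the strategy discussed here, never computed:

```python
# Illustrative sketch of classifying benchmark size from problem dimensions.
# The thresholds below are invented for illustration; the project would derive
# its own cutoffs, and "R" (Realistic) is a manual designation, not computed.

def classify_size(num_variables: int, num_constraints: int) -> str:
    """Bucket a benchmark into XS/S/M/L by its largest dimension."""
    n = max(num_variables, num_constraints)
    if n < 10_000:
        return "XS"
    if n < 100_000:
        return "S"
    if n < 1_000_000:
        return "M"
    return "L"

print(classify_size(4_210, 3_000))          # -> XS
print(classify_size(2_500_000, 2_000_000))  # -> L
```

Whatever tool computes the dimensions (e.g. the JuMP-HiGHS route suggested above), keeping the classifier this small makes the cutoffs easy to revisit once the metadata counts are trusted.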

@siddharth-krishna (Member)

Thanks for the helpful input, Daniele!

  • The TODO on the pypsa benchmarks is, I think, because we didn't yet have time to try the other sizes to see which ones would run in <10h. If you have an estimate of the smallest number of nodes and time resolution that could reasonably be used in a "real" modelling study, that would help narrow down the search!
  • It looks like Kristijan used the number of variables/constraints to define the Real ones for genx/powermodels, right? If so, I think it's better to go with your estimation of "realistic", which I think was:
    • genx-10_IEEE_9_bus_DC_OPF
    • tulipa-1_EU_investment_simple (Temporal resolution: 2.2, Spatial resolution: 28)
    • tulipa-1_EU_investment_simple (Temporal resolution: 1, Spatial resolution: 28)

@KristijanFaust-OET (Contributor, Author) commented Feb 14, 2025

For the pypsa-eur model I went with what @danielelerede-oet wrote, namely that we can consider 10-3h and 10-1h realistic. But we literally don't have those model sizes in the list under those names.

I only managed to solve one powermodels benchmark in under 2 hours, and that is pglib_opf_case162_ieee_dtc-162-NA.mps with 4210. All the others time out after 2 hours, which is why I marked them all as L.

Addressed the rest in: 58cd40d

All the sources I used for the categorization are now in the PR description.

@siddharth-krishna (Member) commented Feb 18, 2025

@jacek-oet this PR categorizes the benchmark instances into XS, S, M, L, and R in the results/metadata.yaml file. Could we please update the Next.js website code to use the info from this metadata file for the filters, as opposed to computing it based on the HiGHS runtime?

Ideally, for tomorrow, it would be great if we could use the filters on the "Benchmark details" page to show all the R-size benchmarks:

[screenshot: size filters on the Benchmark details page]
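The requested change (filters driven by the metadata file rather than recomputed from HiGHS runtimes) amounts to a lookup like the following sketch. The `metadata` dict mirrors an assumed shape of results/metadata.yaml once parsed; the real schema and entries may differ:

```python
# Sketch of size-based filtering driven by metadata rather than solver runtime.
# The structure and entries of `metadata` are assumptions for illustration.

metadata = {
    "benchmarks": {
        "pypsa-eur-elec-op": {"sizes": [{"name": "10-3h", "size": "R"}]},
        "genx-10_IEEE_9_bus_DC_OPF": {"sizes": [{"name": "9-NA", "size": "R"}]},
        "pglib_opf_case162_ieee_dtc": {"sizes": [{"name": "162-NA", "size": "L"}]},
    }
}

def filter_by_size(metadata: dict, wanted: str) -> list[tuple[str, str]]:
    """Return (benchmark, size-name) pairs whose size category matches `wanted`."""
    return [
        (bench, s["name"])
        for bench, info in metadata["benchmarks"].items()
        for s in info["sizes"]
        if s["size"] == wanted
    ]

print(filter_by_size(metadata, "R"))
```

The same lookup translates directly to the Next.js filter code once the parsed metadata is available client-side.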

@siddharth-krishna (Member) left a comment

Thanks, Jacek! It works for me, I'll merge this in and review the code later!

@siddharth-krishna changed the title from "Categorize benchmarks by size" to "Categorize benchmarks by size in metadata and use in website filters" Feb 19, 2025
@siddharth-krishna siddharth-krishna merged commit 68c5f0d into main Feb 19, 2025
3 checks passed
@siddharth-krishna siddharth-krishna deleted the benchmark-size-update branch February 19, 2025 13:50