Website: Implement a check for differing benchmark results and metadata by jacek-oet · Pull Request #144 · open-energy-transition/solver-benchmark

jacek-oet · 2025-04-07T16:04:59Z

No description provided.

vercel · 2025-04-07T16:05:03Z

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name	Status	Preview	Comments	Updated (UTC)
solver-benchmark	✅ Ready (Inspect)	Visit Preview	💬 Add feedback	Apr 9, 2025 8:24am

jacek-oet · 2025-04-07T16:14:36Z

@siddharth-krishna CI is failing now because some benchmarks appear in results/benchmark_results.csv but not in results/metadata.yaml.

Validation failed for benchmark: pypsa-eur-sec with size: 3-24h
Validation failed for benchmark: pypsa-eur-sec with size: 4-24h
Validation failed for benchmark: pypsa-eur-sec with size: 2-12h
Validation failed for benchmark: pypsa-eur-sec with size: 5-24h
Validation failed for benchmark: pypsa-eur-sec with size: 7-24h
Validation failed for benchmark: pypsa-eur-sec with size: 8-24h
Validation failed for benchmark: pypsa-eur-sec with size: 3-12h
Validation failed for benchmark: pypsa-eur-sec with size: 9-24h
Validation failed for benchmark: pypsa-eur-sec with size: 10-24h
Validation failed for benchmark: pypsa-eur-sec with size: 4-12h
Validation failed for benchmark: pypsa-eur-elec-trex with size: 2-24h
Validation failed for benchmark: pypsa-eur-elec-trex with size: 3-24h
Validation failed for benchmark: pypsa-eur-elec-trex with size: 4-24h
Validation failed for benchmark: pypsa-eur-elec-trex with size: 2-12h
Validation failed for benchmark: pypsa-eur-elec-trex with size: 5-24h
Validation failed for benchmark: pypsa-eur-elec-trex with size: 6-24h
Validation failed for benchmark: pypsa-eur-elec-trex with size: 7-24h
Validation failed for benchmark: pypsa-eur-elec-trex with size: 8-24h
Validation failed for benchmark: pypsa-eur-elec-trex with size: 4-12h
Validation failed for benchmark: pypsa-eur-elec-trex with size: 10-24h
Validation failed for benchmark: pypsa-eur-elec-trex with size: 5-12h
Validation failed for benchmark: pypsa-eur-elec-trex with size: 9-24h
....

I think results/metadata.yaml should be updated, or we should remove these benchmarks from results/benchmark_results.csv.
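
For illustration, the kind of check that produces the log above can be sketched roughly as follows. This is a minimal Python sketch, not the PR's actual validateData code. The function name `find_unlisted` and the miniature inputs are hypothetical, and it assumes the first two CSV columns hold the benchmark name and size, and that the metadata follows a `benchmarks` / `Sizes` / `Name` layout.

```python
import csv
import io


def find_unlisted(results_csv_text, meta):
    """Return sorted (benchmark, size) pairs that have results but no metadata entry."""
    # Every (benchmark, size) pair that the metadata declares.
    listed = {
        (name, size["Name"])
        for name, entry in meta["benchmarks"].items()
        for size in entry["Sizes"]
    }
    rows = csv.reader(io.StringIO(results_csv_text))
    next(rows, None)  # skip the header row
    # Every (benchmark, size) pair that appears in the results.
    seen = {(row[0], row[1]) for row in rows if len(row) > 1}
    return sorted(seen - listed)


# Hypothetical miniature inputs, not taken from the repository.
meta = {"benchmarks": {"pypsa-eur-sec": {"Sizes": [{"Name": "2-24h"}]}}}
csv_text = (
    "Benchmark,Size,Solver\n"
    "pypsa-eur-sec,2-24h,highs\n"
    "pypsa-eur-sec,3-24h,highs\n"
)
for bench, size in find_unlisted(csv_text, meta):
    print(f"Validation failed for benchmark: {bench} with size: {size}")
```

Working on sets of pairs reports each missing benchmark/size combination once, regardless of how many solver rows it has in the CSV.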

siddharth-krishna · 2025-04-08T07:23:13Z

Ah yes, nice catch. These are old results, which we are using just for development. I will do a new benchmark run this week that only runs what is listed in metadata.yaml, so you're right: we should remove the extra ones from the results CSV. Could you do that please? Thanks

siddharth-krishna

Sorry Jacek, it looks like there are still some mistakes in which benchmarks were removed. If you don't mind, let me do this quickly and push to your branch.

siddharth-krishna · 2025-04-09T07:43:20Z

results/benchmark_results.csv

 Sienna_modified_RTS_GMLC_DA_sys_NetPTDF_Horizon12_Day29,1-1h,glpk,5.0,2020,TO,Timeout,600,132.948,,,
 Sienna_modified_RTS_GMLC_DA_sys_NetPTDF_Horizon12_Day314,1-1h,glpk,5.0,2020,TO,Timeout,600,132.74,,,
-Sienna_modified_RTS_GMLC_DA_sys_NetPTDF_Horizon12_Day332,1-1h,glpk,5.0,2020,TO,Timeout,600,135.172,,,
+Sienna_modified_RTS_GMLC_DA_sys_NetPTDF_Horizon12_Day332,1-1h,glpk,5.0,2020,unknown,unknown,1.9189174175262451,150.836,0.0,,


Hmm I didn't expect any results to be modified -- was this intentional? If not, could we revert the modifications and ensure that the diff only deletes rows corresponding to sizes that were removed from results/metadata.yaml? Thanks
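
One way to verify the "only deletes rows" property requested above is to compare the old and new files as multisets of lines. The following is a minimal sketch under that assumption. The helper name `non_deletion_changes` and the sample rows are hypothetical, and it ignores row order.

```python
from collections import Counter


def non_deletion_changes(old_lines, new_lines):
    """Return lines in new_lines that are not accounted for by old_lines.

    An empty result means the new file can be obtained from the old one
    purely by deleting rows (row order is ignored).
    """
    extra = Counter(new_lines) - Counter(old_lines)
    return sorted(extra.elements())


# Hypothetical rows, not taken from the repository.
old = [
    "Benchmark,Size,Solver,Status\n",
    "bench-a,1-1h,glpk,TO\n",
    "bench-b,1-1h,glpk,TO\n",
]
new = [
    "Benchmark,Size,Solver,Status\n",
    "bench-a,1-1h,glpk,unknown\n",  # modified row, not a pure deletion
]
print(non_deletion_changes(old, new))
```

Any line it returns was either added or modified, which is exactly what the diff above should not contain.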

siddharth-krishna · 2025-04-09T07:58:29Z

results/benchmark_results.csv

-Sienna_modified_RTS_GMLC_DA_sys_NetTransport_Horizon48_Day332,1-1h,highs,1.8.1,2024,TO,Timeout,600,625.368,,,
-Sienna_modified_RTS_GMLC_DA_sys_NetTransport_Horizon48_Day332,1-1h,scip,9.1.1,2024,TO,Timeout,600,961.792,,,
-genx-1_three_zones,3-1h,highs,1.8.1,2024,TO,Timeout,600,1784.696,,,
-genx-1_three_zones,3-1h,scip,9.1.1,2024,TO,Timeout,600,1320.512,,,


It feels like too many rows have been deleted...
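
A quick way to test that suspicion is to list the deleted rows whose (benchmark, size) pair is still present in the metadata. This is a rough sketch only. The name `wrongly_deleted`, the `listed` set, and the sample rows are hypothetical, and it assumes the first two CSV columns are the benchmark name and size.

```python
def wrongly_deleted(old_lines, new_lines, listed):
    """Return rows removed from old_lines whose (benchmark, size) is still listed."""
    kept = set(new_lines)
    removed_but_listed = []
    for line in old_lines:
        parts = line.split(",")
        # A row is suspicious if it vanished although the metadata still lists it.
        if line not in kept and len(parts) > 1 and (parts[0], parts[1]) in listed:
            removed_but_listed.append(line)
    return removed_but_listed


# Hypothetical data, not taken from the repository.
listed = {("bench-a", "1-1h")}
old = [
    "Benchmark,Size,Solver\n",
    "bench-a,1-1h,highs\n",
    "bench-b,2-1h,highs\n",
]
new = ["Benchmark,Size,Solver\n"]
print(wrongly_deleted(old, new, listed))
```

An empty list would mean every deleted row belongs to a size that was removed from the metadata.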

siddharth-krishna · 2025-04-09T08:27:22Z

For the record, I used this script to figure out which benchmark results to keep:

import yaml

# Collect the (benchmark, size) pairs listed in the metadata file.
with open("results/metadata.yaml") as f:
    meta = yaml.safe_load(f)
benchs = set()
for b, d in meta["benchmarks"].items():
    for s in d["Sizes"]:
        benchs.add((b, s["Name"]))

# Filter both result files in place: keep the header row and every row
# whose (benchmark, size) pair appears in the metadata.
for path in (
    "results/benchmark_results.csv",
    "results/benchmark_results_mean_stddev.csv",
):
    with open(path) as f:
        results = f.readlines()
    results1 = []
    for line in results:
        parts = line.split(",")
        if parts[0] == "Benchmark" or (len(parts) > 1 and (parts[0], parts[1]) in benchs):
            results1.append(line)
    print(path, len(results), len(results1))  # row counts before and after filtering
    with open(path, "w") as f:
        f.writelines(results1)

Implement a check for differing benchmark results and metadata

8086e02

siddharth-krishna approved these changes Apr 8, 2025

jacek-oet force-pushed the website-nextjs/implement-benchmark-metadata-check branch from 1f5440a to 6328c57 Compare April 8, 2025 17:08

vercel bot had a problem deploying to Preview April 8, 2025 17:08 Failure

jacek-oet requested a review from siddharth-krishna April 8, 2025 17:12

vercel bot deployed to Preview April 8, 2025 17:12 View deployment

Update benchmark_results.csv, Remove success log from validateData function

19277ab

jacek-oet force-pushed the website-nextjs/implement-benchmark-metadata-check branch from 4d3ac70 to 19277ab Compare April 8, 2025 19:44

vercel bot deployed to Preview April 8, 2025 19:45 View deployment

siddharth-krishna reviewed Apr 9, 2025

Fix which benchmarks results were removed

e493226

vercel bot deployed to Preview April 9, 2025 08:24 View deployment

siddharth-krishna merged commit 8c63ab1 into main Apr 9, 2025
4 checks passed

siddharth-krishna deleted the website-nextjs/implement-benchmark-metadata-check branch April 9, 2025 08:27

siddharth-krishna merged 3 commits into main from website-nextjs/implement-benchmark-metadata-check
