
Website: Implement a check for differing benchmark results and metadata#144

Merged
siddharth-krishna merged 3 commits into main from website-nextjs/implement-benchmark-metadata-check
Apr 9, 2025

@jacek-oet
Member

No description provided.

@vercel

vercel bot commented Apr 7, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
solver-benchmark ✅ Ready (Inspect) Visit Preview 💬 Add feedback Apr 9, 2025 8:24am

@jacek-oet
Member Author

@siddharth-krishna CI is failing now because there are some benchmarks that appear in benchmark-result.csv but not in result/meta-data.yaml.

Validation failed for benchmark: pypsa-eur-sec with size: 3-24h
Validation failed for benchmark: pypsa-eur-sec with size: 4-24h
Validation failed for benchmark: pypsa-eur-sec with size: 2-12h
Validation failed for benchmark: pypsa-eur-sec with size: 5-24h
Validation failed for benchmark: pypsa-eur-sec with size: 7-24h
Validation failed for benchmark: pypsa-eur-sec with size: 8-24h
Validation failed for benchmark: pypsa-eur-sec with size: 3-12h
Validation failed for benchmark: pypsa-eur-sec with size: 9-24h
Validation failed for benchmark: pypsa-eur-sec with size: 10-24h
Validation failed for benchmark: pypsa-eur-sec with size: 4-12h
Validation failed for benchmark: pypsa-eur-elec-trex with size: 2-24h
Validation failed for benchmark: pypsa-eur-elec-trex with size: 3-24h
Validation failed for benchmark: pypsa-eur-elec-trex with size: 4-24h
Validation failed for benchmark: pypsa-eur-elec-trex with size: 2-12h
Validation failed for benchmark: pypsa-eur-elec-trex with size: 5-24h
Validation failed for benchmark: pypsa-eur-elec-trex with size: 6-24h
Validation failed for benchmark: pypsa-eur-elec-trex with size: 7-24h
Validation failed for benchmark: pypsa-eur-elec-trex with size: 8-24h
Validation failed for benchmark: pypsa-eur-elec-trex with size: 4-12h
Validation failed for benchmark: pypsa-eur-elec-trex with size: 10-24h
Validation failed for benchmark: pypsa-eur-elec-trex with size: 5-12h
Validation failed for benchmark: pypsa-eur-elec-trex with size: 9-24h
....

I think meta-data.yaml should be updated, or we should remove these missing benchmarks from benchmark-result.csv.
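
A minimal sketch of the kind of check described above, written in Python for illustration only (the check in this PR lives in the website code and may look quite different). It assumes the metadata has the shape `benchmarks -> <name> -> Sizes -> [{Name: ...}]` and that the first two CSV columns are the benchmark name and the size; the function name, the `Size` and `Solver` header labels, and the sample rows are hypothetical.

```python
import csv
import io


def missing_from_metadata(csv_text, meta):
    """Return (benchmark, size) pairs found in the results CSV but not in the metadata."""
    # Assumed metadata shape: meta["benchmarks"][name]["Sizes"] is a list of {"Name": ...}.
    known = {
        (name, size["Name"])
        for name, entry in meta["benchmarks"].items()
        for size in entry["Sizes"]
    }
    rows = csv.reader(io.StringIO(csv_text))
    next(rows, None)  # skip the header row
    # Assumption: column 0 is the benchmark name, column 1 is the size.
    seen = {(row[0], row[1]) for row in rows if len(row) >= 2}
    return sorted(seen - known)


# Hypothetical sample data, not taken from the repository.
meta = {"benchmarks": {"genx-1_three_zones": {"Sizes": [{"Name": "3-1h"}]}}}
csv_text = (
    "Benchmark,Size,Solver\n"
    "genx-1_three_zones,3-1h,highs\n"
    "pypsa-eur-sec,3-24h,highs\n"
)

for bench, size in missing_from_metadata(csv_text, meta):
    print(f"Validation failed for benchmark: {bench} with size: {size}")
```

Working on sets of pairs reports each missing combination once, however many solver rows it has in the CSV.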

@siddharth-krishna
Member

Ah yes, nice catch. These are old results, which we are using just for development. I will do a new benchmark run this week that will only run things in the metadata.yaml file, so you're right, we should remove the extra ones from the results CSV. Could you do that please? Thanks

@jacek-oet jacek-oet force-pushed the website-nextjs/implement-benchmark-metadata-check branch from 4d3ac70 to 19277ab Compare April 8, 2025 19:44
Member

@siddharth-krishna siddharth-krishna left a comment

Sorry Jacek, it looks like there are still some mistakes in which benchmarks were removed. If you don't mind, let me do this quickly and push to your branch.

Sienna_modified_RTS_GMLC_DA_sys_NetPTDF_Horizon12_Day29,1-1h,glpk,5.0,2020,TO,Timeout,600,132.948,,,
Sienna_modified_RTS_GMLC_DA_sys_NetPTDF_Horizon12_Day314,1-1h,glpk,5.0,2020,TO,Timeout,600,132.74,,,
Sienna_modified_RTS_GMLC_DA_sys_NetPTDF_Horizon12_Day332,1-1h,glpk,5.0,2020,TO,Timeout,600,135.172,,,
Sienna_modified_RTS_GMLC_DA_sys_NetPTDF_Horizon12_Day332,1-1h,glpk,5.0,2020,unknown,unknown,1.9189174175262451,150.836,0.0,,
Member

Hmm I didn't expect any results to be modified -- was this intentional? If not, could we revert the modifications and ensure that the diff only deletes rows corresponding to sizes that were removed from results/metadata.yaml? Thanks
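
One way to verify the "deletions only" property requested above is sketched here, under the same assumption that the first two CSV columns are the benchmark name and the size. The helper names, the header labels, and the sample rows are hypothetical, and the comparison ignores row order.

```python
from collections import Counter


def pair(line):
    """Extract the (benchmark, size) pair from a raw CSV line, or None if too short."""
    parts = line.rstrip("\n").split(",")
    return (parts[0], parts[1]) if len(parts) >= 2 else None


def diff_problems(old_lines, new_lines, keep):
    """List rows that differ from 'old file minus rows whose pair is not in keep'."""
    header, rows = old_lines[0], old_lines[1:]
    expected = [header] + [r for r in rows if pair(r) in keep]
    # Multiset differences catch both dropped and altered rows.
    missing = Counter(expected) - Counter(new_lines)
    extra = Counter(new_lines) - Counter(expected)
    problems = [f"missing or modified: {l.strip()}" for l in missing.elements()]
    problems += [f"unexpected: {l.strip()}" for l in extra.elements()]
    return problems


# Hypothetical sample data, not taken from the repository.
old = [
    "Benchmark,Size,Solver,Runtime\n",
    "genx-1_three_zones,3-1h,highs,600\n",
    "pypsa-eur-sec,3-24h,highs,12.5\n",
]
keep = {("genx-1_three_zones", "3-1h")}

good_new = old[:2]                                        # only the unlisted row removed
bad_new = [old[0], "genx-1_three_zones,3-1h,highs,1.9\n"]  # a kept row was altered

print(diff_problems(old, good_new, keep))
print(diff_problems(old, bad_new, keep))
```

An empty list means the new file contains exactly the rows that should have survived; a modified row shows up twice, once as missing and once as unexpected.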

Sienna_modified_RTS_GMLC_DA_sys_NetTransport_Horizon48_Day332,1-1h,highs,1.8.1,2024,TO,Timeout,600,625.368,,,
Sienna_modified_RTS_GMLC_DA_sys_NetTransport_Horizon48_Day332,1-1h,scip,9.1.1,2024,TO,Timeout,600,961.792,,,
genx-1_three_zones,3-1h,highs,1.8.1,2024,TO,Timeout,600,1784.696,,,
genx-1_three_zones,3-1h,scip,9.1.1,2024,TO,Timeout,600,1320.512,,,
Member

It feels like too many rows have been deleted...

@siddharth-krishna
Member

For the record, I used this script to figure out which benchmark results to keep:

import yaml

# Collect the (benchmark, size) pairs listed in the metadata file.
with open("results/metadata.yaml") as f:
    meta = yaml.safe_load(f)
benchs = set()
for b, d in meta['benchmarks'].items():
    for s in d['Sizes']:
        benchs.add((b, s['Name']))


def filter_results(path):
    # Keep the header row and every row whose (benchmark, size) pair is in the metadata.
    with open(path) as f:
        results = f.readlines()
    results1 = []
    for line in results:
        parts = line.split(',')
        if parts[0] == 'Benchmark' or (parts[0], parts[1]) in benchs:
            results1.append(line)
    print(path, len(results), len(results1))  # row counts before and after filtering
    with open(path, "w") as f:
        f.writelines(results1)


filter_results('results/benchmark_results.csv')
filter_results('results/benchmark_results_mean_stddev.csv')

@siddharth-krishna siddharth-krishna merged commit 8c63ab1 into main Apr 9, 2025
4 checks passed
@siddharth-krishna siddharth-krishna deleted the website-nextjs/implement-benchmark-metadata-check branch April 9, 2025 08:27

2 participants