Commit cea56d4

Authored by aristizabal95, mhmdk0, and hasan7n
Execution model & Model/Evaluation status tracking (#631)
* Rename results to executions
* Create report fields
* Include executions endpoints
* Add note regarding existing results endpoints
* Fix execution endpoint name
* Implement query parameters on main entities
* Revert django update
* WIP: turn result entity into execution entity
* Implement list query filtering in the CLI
* Add list filters to main entities
* Move results code to executions; WIP: integrate exec reporting
* Make results field optional
* Don't send reports on tests; fix evaluation report issue
* Allow updating executions
* Remove owner query for /me/results
* Fix benchmark execution flow
* Rename results tests to executions
* Fix rest tests
* Fix tests that called results
* Fix bugs related to the result -> execution change
* Fix existing tests
* Fix remaining existing tests
* Allow for testing executions
* Fix style issues
* Fix unit tests
* Fix server tests
* Make test False as default
* Allow passing None as execution
* Fix function name in Rest -> upload_execution
* Fix associate dataset test after raising CleanExit on cancellation
* Fix Rest and tests issues due to wrong merging
* Fix some typos
* Update configuration migration
* Add filtering on client side only for all comms
* Update executions logic (mainly about when a user can rerun or create a new execution, etc.)
* Make finalized True for existing instances
* Add new flags to commands, filter latest executions
* Type hints for executions util
* Fix bug in sending model report logic
* Fix bug in migrations
* Update medperf run command
* Fix some bugs
* Preserve predictions using timestamps
* Fix integration tests
* Add a local outputs folder for metrics container
* Update cli tests
* Fix server tests
* Add new server tests
* Fix postgresql dev utility
* Rename execution back to result (it turns out that renaming a db model is complicated)
* Rename remaining execution changes for consistency
* Modify migrations to have existing results finalized
* Update unit test

--------

Co-authored-by: mhmdk0 <mohammadkassim54.mk@gmail.com>
Co-authored-by: hasan7n <hasankassim7@hotmail.com>
1 parent 8a1f142 commit cea56d4


46 files changed: +1229, -624 lines

cli/cli_tests.sh

Lines changed: 23 additions & 16 deletions
@@ -439,33 +439,41 @@ checkFailed "run all outstanding models failed"
 echo "\n"
 
 ##########################################################
-echo "======================================================================================"
-echo "Run failing container with ignore errors (This SHOULD fail since predictions folder exists)"
-echo "======================================================================================"
-print_eval medperf run -b $BMK_UID -d $DSET_A_UID -m $FAILING_MODEL_UID -y --ignore-model-errors
-checkSucceeded "Container ran successfuly but should fail since predictions folder exists"
+echo "====================================================================="
+echo "Run failing container with ignore errors"
+echo "====================================================================="
+print_eval medperf result create -b $BMK_UID -d $DSET_A_UID -m $FAILING_MODEL_UID --ignore-model-errors
+checkFailed "Failing container run with ignore errors failed"
 ##########################################################
 
 echo "\n"
 
 ##########################################################
 echo "====================================================================="
-echo "Run failing container with ignore errors after deleting predictions folder"
+echo "Submit failing container's result"
 echo "====================================================================="
-print_eval rm -rf $MEDPERF_STORAGE/predictions/$SERVER_STORAGE_ID/model-fail/$DSET_A_GENUID
-print_eval medperf run -b $BMK_UID -d $DSET_A_UID -m $FAILING_MODEL_UID -y --ignore-model-errors
+print_eval medperf result submit -b $BMK_UID -d $DSET_A_UID -m $FAILING_MODEL_UID -y
 checkFailed "Failing container run with ignore errors failed"
 ##########################################################
 
 echo "\n"
 
 ##########################################################
-echo "====================================="
-echo "Running logging model without logging env"
-echo "====================================="
-print_eval rm -rf $MEDPERF_STORAGE/predictions/$SERVER_STORAGE_ID/model-log-none/$DSET_A_GENUID
-print_eval medperf run -b $BMK_UID -d $DSET_A_UID -m $MODEL_LOG_NONE_UID -y
-checkFailed "run logging model without logging env failed"
+echo "====================================================================="
+echo "Rerun (execute+submit). This will error out"
+echo "====================================================================="
+print_eval medperf run -b $BMK_UID -d $DSET_A_UID -m $FAILING_MODEL_UID --ignore-model-errors -y
+checkSucceeded "Rerunning should fail, but it succeeded"
+##########################################################
+
+echo "\n"
+
+##########################################################
+echo "====================================================================="
+echo "Rerun (execute+submit) with --new-result flag. This should work."
+echo "====================================================================="
+print_eval medperf run -b $BMK_UID -d $DSET_A_UID -m $FAILING_MODEL_UID --ignore-model-errors --new-result -y
+checkFailed "Rerunning with --new-result failed"
 ##########################################################
 
 echo "\n"
@@ -474,8 +482,7 @@ echo "\n"
 echo "====================================="
 echo "Running logging model with debug logging env"
 echo "====================================="
-print_eval rm -rf $MEDPERF_STORAGE/predictions/$SERVER_STORAGE_ID/model-log-debug/$DSET_A_GENUID
-print_eval medperf --container-loglevel debug run -b $BMK_UID -d $DSET_A_UID -m $MODEL_LOG_DEBUG_UID -y
+print_eval medperf --container-loglevel debug run -b $BMK_UID -d $DSET_A_UID -m $MODEL_LOG_DEBUG_UID --new-result -y
 checkFailed "run logging model with debug logging env failed"
 ##########################################################
 
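The checkFailed/checkSucceeded helpers used throughout this script assert on the exit status of the command that just ran (failing the suite when the status is unexpected, as with the rerun-without---new-result case above). A rough Python analogue of that pattern, with a hypothetical helper name chosen here for illustration:

```python
import subprocess
import sys

# Hypothetical analogue of the shell script's checkFailed/checkSucceeded:
# run a command and fail loudly if its exit code does not match expectations.

def check(cmd, expect_success: bool, msg: str) -> None:
    code = subprocess.run(cmd).returncode
    if (code == 0) != expect_success:
        raise AssertionError(msg)

# A command expected to fail, like the rerun without --new-result above:
check([sys.executable, "-c", "raise SystemExit(1)"], expect_success=False,
      msg="Rerunning should fail, but it succeeded")
```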

cli/medperf/cli.py

Lines changed: 16 additions & 14 deletions
@@ -6,9 +6,9 @@
 from medperf import __version__
 import medperf.config as config
 from medperf.decorators import clean_except, add_inline_parameters
-import medperf.commands.result.result as result
-from medperf.commands.result.create import BenchmarkExecution
-from medperf.commands.result.submit import ResultSubmission
+from medperf.commands.execution import execution
+from medperf.commands.execution.create import BenchmarkExecution
+from medperf.commands.execution.submit import ResultSubmission
 import medperf.commands.mlcube.mlcube as mlcube
 import medperf.commands.dataset.dataset as dataset
 import medperf.commands.auth.auth as auth
@@ -28,7 +28,7 @@
 app = typer.Typer()
 app.add_typer(mlcube.app, name="mlcube", help="Manage mlcubes")
 app.add_typer(mlcube.app, name="container", help="Manage containers")
-app.add_typer(result.app, name="result", help="Manage results")
+app.add_typer(execution.app, name="result", help="Manage results")
 app.add_typer(dataset.app, name="dataset", help="Manage datasets")
 app.add_typer(benchmark.app, name="benchmark", help="Manage benchmarks")
 app.add_typer(association.app, name="association", help="Manage associations")
@@ -65,23 +65,25 @@ def execute(
         "--no-cache",
         help="Ignore existing results. The experiment then will be rerun",
     ),
+    new_result: bool = typer.Option(
+        False,
+        "--new-result",
+        help=(
+            "Works if the result of the execution was already uploaded."
+            "This will rerun and create a new record."
+        ),
+    ),
 ):
     """Runs the benchmark execution step for a given benchmark, prepared dataset and model"""
-    result = BenchmarkExecution.run(
+    BenchmarkExecution.run(
         benchmark_uid,
         data_uid,
         [model_uid],
         ignore_model_errors=ignore_model_errors,
         no_cache=no_cache,
-    )[0]
-    if result.id:  # TODO: use result.is_registered once PR #338 is merged
-        config.ui.print(  # TODO: msg should be colored yellow
-            """An existing registered result for the requested execution has been\n
-            found. If you wish to submit a new result for the same execution,\n
-            please run the command again with the --no-cache option.\n"""
-        )
-    else:
-        ResultSubmission.run(result.local_id, approved=approval)
+        rerun_finalized_executions=new_result,
+    )
+    ResultSubmission.run(benchmark_uid, data_uid, model_uid, approved=approval)
     config.ui.print("✅ Done!")
 
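The new --new-result flag feeds the rerun_finalized_executions argument above. As a minimal sketch of the decision this enables (hypothetical helper, not MedPerf's actual implementation): a cached execution is reused unless --no-cache is given, and a finalized (already-uploaded) execution is only rerun when the new flag is set.

```python
# Hypothetical sketch of the rerun decision behind --no-cache and --new-result.
# Names and logic are illustrative only, not MedPerf's actual code.

def should_rerun(cached: bool, finalized: bool,
                 no_cache: bool, rerun_finalized: bool) -> bool:
    """Decide whether an execution must be run again."""
    if not cached:
        return True  # nothing stored yet: always execute
    if finalized:
        # Result already uploaded: only rerun when the user explicitly
        # asks for a new record (--new-result / --rerun-finalized).
        return rerun_finalized
    # Local, unsubmitted execution: rerun only when caching is disabled.
    return no_cache
```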

cli/medperf/commands/benchmark/benchmark.py

Lines changed: 44 additions & 3 deletions
@@ -8,7 +8,7 @@
 from medperf.commands.view import EntityView
 from medperf.commands.benchmark.submit import SubmitBenchmark
 from medperf.commands.benchmark.associate import AssociateBenchmark
-from medperf.commands.result.create import BenchmarkExecution
+from medperf.commands.execution.create import BenchmarkExecution
 
 app = typer.Typer()
 
@@ -20,13 +20,48 @@ def list(
         False, "--unregistered", help="Get unregistered benchmarks"
     ),
     mine: bool = typer.Option(False, "--mine", help="Get current-user benchmarks"),
+    name: str = typer.Option(None, "--name", help="Filter by name"),
+    owner: int = typer.Option(None, "--owner", help="Filter by owner"),
+    state: str = typer.Option(
+        None, "--state", help="Filter by state (DEVELOPMENT/OPERATION)"
+    ),
+    is_valid: bool = typer.Option(
+        None, "--valid/--invalid", help="Filter by valid status"
+    ),
+    is_active: bool = typer.Option(
+        None, "--active/--inactive", help="Filter by active status"
+    ),
+    data_prep: int = typer.Option(
+        None,
+        "-d",
+        "--data-preparation-container",
+        help="Filter by Data Preparation Container",
+    ),
 ):
     """List benchmarks"""
+    filters = {
+        "name": name,
+        "owner": owner,
+        "state": state,
+        "is_valid": is_valid,
+        "is_active": is_active,
+        "data_preparation_mlcube": data_prep,
+    }
+
     EntityList.run(
         Benchmark,
-        fields=["UID", "Name", "Description", "State", "Approval Status", "Registered"],
+        fields=[
+            "UID",
+            "Name",
+            "Description",
+            "Data Preparation Container",
+            "State",
+            "Approval Status",
+            "Registered",
+        ],
         unregistered=unregistered,
         mine_only=mine,
+        **filters,
     )
 
 
@@ -139,17 +174,23 @@ def run(
         "--no-cache",
         help="Execute even if results already exist",
     ),
+    rerun_finalized: bool = typer.Option(
+        False,
+        "--rerun-finalized",
+        help="Execute even if results have been already uploaded (this will create new records)",
+    ),
 ):
     """Runs the benchmark execution step for a given benchmark, prepared dataset and model"""
     BenchmarkExecution.run(
         benchmark_uid,
         data_uid,
         models_uids=None,
-        no_cache=no_cache,
         models_input_file=file,
         ignore_model_errors=ignore_model_errors,
+        no_cache=no_cache,
         show_summary=True,
         ignore_failed_experiments=True,
+        rerun_finalized_executions=rerun_finalized,
     )
     config.ui.print("✅ Done!")
 
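The filters dict built above is forwarded to EntityList.run, and the commit message notes that filtering is applied on the client side for all comms. A minimal sketch of that pattern (hypothetical function, not MedPerf's code): drop None entries so unset CLI options impose no constraint, then keep entities matching every remaining key.

```python
# Hypothetical sketch of client-side filtering with optional CLI filters.
# None means "option not given", so those keys are dropped before matching.

def filter_entities(entities, **filters):
    active = {k: v for k, v in filters.items() if v is not None}
    return [e for e in entities if all(e.get(k) == v for k, v in active.items())]
```

This keeps the CLI signature simple: every filter option defaults to None and can be passed through unconditionally.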

cli/medperf/commands/compatibility_test/run.py

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
 import logging
 
-from medperf.commands.execution import Execution
+from medperf.commands.execution.execution_flow import ExecutionFlow
 from medperf.entities.dataset import Dataset
 from medperf.entities.benchmark import Benchmark
 from medperf.entities.report import TestReport
@@ -265,7 +265,7 @@ def execute(self):
         Returns:
             dict: returns the results of the test execution.
         """
-        execution_summary = Execution.run(
+        execution_summary = ExecutionFlow.run(
             dataset=self.dataset,
             model=self.model_cube,
             evaluator=self.evaluator_cube,

cli/medperf/commands/dataset/associate_benchmark.py

Lines changed: 7 additions & 6 deletions
@@ -2,8 +2,8 @@
 from medperf.entities.dataset import Dataset
 from medperf.entities.benchmark import Benchmark
 from medperf.utils import dict_pretty_print, approval_prompt
-from medperf.commands.result.create import BenchmarkExecution
-from medperf.exceptions import InvalidArgumentError
+from medperf.commands.execution.create import BenchmarkExecution
+from medperf.exceptions import InvalidArgumentError, CleanExit
 
 
 class AssociateBenchmarkDataset:
@@ -29,24 +29,25 @@ def run(data_uid: int, benchmark_uid: int, approved=False, no_cache=False):
                 "The specified dataset wasn't prepared for this benchmark"
             )
 
-        result = BenchmarkExecution.run(
+        execution = BenchmarkExecution.run(
             benchmark_uid,
             data_uid,
             [benchmark.reference_model_mlcube],
             no_cache=no_cache,
         )[0]
+        results = execution.read_results()
         ui.print("These are the results generated by the compatibility test. ")
         ui.print("This will be sent along the association request.")
         ui.print("They will not be part of the benchmark.")
-        dict_pretty_print(result.results)
+        dict_pretty_print(results)
 
         msg = "Please confirm that you would like to associate"
         msg += f" the dataset {dset.name} with the benchmark {benchmark.name}."
        msg += " [Y/n]"
         approved = approved or approval_prompt(msg)
         if approved:
             ui.print("Generating dataset benchmark association")
-            metadata = {"test_result": result.results}
+            metadata = {"test_result": results}
             comms.associate_benchmark_dataset(dset.id, benchmark_uid, metadata)
         else:
-            ui.print("Dataset association operation cancelled.")
+            raise CleanExit("Dataset association operation cancelled.")

cli/medperf/commands/dataset/dataset.py

Lines changed: 8 additions & 0 deletions
@@ -31,6 +31,10 @@ def list(
         "-m",
         help="Get datasets for a given data preparation container",
     ),
+    name: str = typer.Option(None, "--name", help="Filter by name"),
+    owner: int = typer.Option(None, "--owner", help="Filter by owner"),
+    state: str = typer.Option(None, "--state", help="Filter by state (DEVELOPMENT/OPERATION)"),
+    is_valid: bool = typer.Option(None, "--valid/--invalid", help="Filter by valid status"),
 ):
     """List datasets"""
     EntityList.run(
@@ -46,6 +50,10 @@ def list(
         unregistered=unregistered,
         mine_only=mine,
         mlcube=mlcube,
+        name=name,
+        owner=owner,
+        state=state,
+        is_valid=is_valid,
     )
 