Commit de91260

Feature/cli validation (#92)

* fix: remove unused flake8 in pre-commit
* refactor: cli validation in configure commands
* feat: move validation to pydantic
* feat: add pydantic alternatives validation
* refactor: drop unused
* fix: criteria check
* feat: add error handling
* fix: power report
* fix: alternative command
* feat: add validation for cv data dependency
* add tests
* feat: Add 3-step experiment workflow example
* fix: test
* fix: cli
1 parent 0a09828 commit de91260

46 files changed: 2126 additions, 419 deletions

.pre-commit-config.yaml

Lines changed: 0 additions & 6 deletions

```diff
@@ -1,12 +1,6 @@
 # See https://pre-commit.com for more information
 # See https://pre-commit.com/hooks.html for more hooks
 repos:
-  - repo: https://github.com/pycqa/flake8
-    rev: "7.1.1"
-    hooks:
-      - id: flake8
-        additional_dependencies: [Flake8-pyproject]
-        # stages: [push]

   - repo: https://github.com/pre-commit/mirrors-mypy
     rev: "v1.13.0"
```

experiment_example/README.md

Lines changed: 136 additions & 0 deletions
# Quick-Start Template: A 3-Step Experimental Workflow

This directory provides a set of template configuration files for the `pysatl-experiment` framework. It is designed to serve as a starting point for conducting a robust, three-step comparison of statistical goodness-of-fit tests.

The goal of this workflow is to help you make a data-driven decision about which statistical test is best suited for your specific hypothesis and potential alternatives.

## The 3-Step Workflow

This template follows a complete research cycle. Each step answers a different question, and they are designed to be run in sequence.

---

### Step 1: Calibration

File: `1_calculate_critical_values.json`

**The Question:** What are the correct decision thresholds for my chosen tests?

**The Purpose:** Before you can use any statistical test, you must establish its baseline critical values. These values depend on your null hypothesis (the distribution you are testing for), the sample sizes, and the desired significance levels (alpha). This step calibrates your "measurement tools."

**The Command:**

```bash
poetry run experiment build-and-run 1_calculate_critical_values
```
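
Conceptually, calibration is a Monte Carlo estimate of the test statistic's null distribution. Here is a standalone sketch of the idea using `numpy`/`scipy` directly, not the framework's internals:

```python
# Illustrative only: what a critical value for the KS criterion means.
# This is not how pysatl-experiment is implemented internally.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
monte_carlo_count, sample_size, alpha = 1000, 200, 0.05

# Simulate the KS statistic under H0: samples truly are standard normal.
null_statistics = [
    stats.kstest(rng.standard_normal(sample_size), "norm").statistic
    for _ in range(monte_carlo_count)
]

# The critical value is the (1 - alpha) quantile of this null distribution.
critical_value = float(np.quantile(null_statistics, 1 - alpha))
print(f"KS critical value for n={sample_size}, alpha={alpha}: {critical_value:.4f}")
```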

---

### Step 2: Evaluation

File: `2_evaluate_power.json`

**The Question:** How effective are these tests at detecting the specific deviations I care about?

**The Purpose:** This is the core scientific evaluation. You measure the **statistical power** of each test against one or more alternative hypotheses. A test with high power is very good at correctly rejecting the null hypothesis when it is indeed false. This step tells you which test is most effective for your specific problem.

*This experiment requires the critical values generated in Step 1.*

**The Command:**

```bash
poetry run experiment build-and-run 2_evaluate_power
```
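
Power is simply the rejection rate when the data actually come from the alternative. A standalone sketch of the idea (again `scipy` rather than the framework; the critical value below is a placeholder standing in for a Step 1 result):

```python
# Illustrative only: estimating KS power against a Normal(0.5, 1) alternative
# when testing the standard-normal null. Not the framework's implementation.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
monte_carlo_count, sample_size = 1000, 200
critical_value = 0.096  # placeholder: would come from the Step 1 calibration

# Count how often the statistic exceeds the threshold under the alternative.
rejections = sum(
    stats.kstest(rng.normal(0.5, 1.0, sample_size), "norm").statistic > critical_value
    for _ in range(monte_carlo_count)
)
print(f"Estimated power: {rejections / monte_carlo_count:.3f}")
```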

---

### Step 3: Performance Analysis

File: `3_measure_time_complexity.json`

**The Question:** What is the computational cost of using each test?

**The Purpose:** This is the practical, engineering evaluation. If two tests show similar power, the faster one may be preferable, especially in data-intensive applications. This experiment measures the execution time of each test, helping you understand the trade-offs between effectiveness and performance.

**The Command:**

```bash
poetry run experiment build-and-run 3_measure_time_complexity
```
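
In spirit, this step wall-clock-times each criterion across the configured sample sizes. A rough standalone analogue:

```python
# Illustrative only: timing a single criterion over growing sample sizes.
# A real experiment would average over many Monte Carlo repetitions per size.
import time

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
for sample_size in [200, 400, 600, 800, 1000]:
    sample = rng.standard_normal(sample_size)
    start = time.perf_counter()
    stats.kstest(sample, "norm")
    elapsed_ms = (time.perf_counter() - start) * 1e3
    print(f"n={sample_size}: {elapsed_ms:.3f} ms")
```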

---

## How to Adapt This Template for Your Own Research

To use these files for your own experiment, copy this directory and modify the JSON configurations. The key is to ensure your parameters are consistent across the files.

#### 1. Define Your Null Hypothesis

In all three files (`1_...`, `2_...`, `3_...`), change the `hypothesis` field to match the distribution you want to test for.

**Example: Changing from `Normal` to `Weibull`**

```json
"hypothesis": "normal"
```

to

```json
"hypothesis": "weibull"
```

#### 2. Choose the Criteria to Compare

In all three files, update the `criteria` list to include the tests you want to evaluate.

**Example: Comparing `Lilliefors` and `Chi-Squared`**

```json
"criteria": [
    {
        "criterion_code": "LILLIE",
        "parameters": []
    },
    {
        "criterion_code": "CHI2",
        "parameters": []
    }
],
```

#### 3. Define the Alternative Hypothesis

In `2_evaluate_power.json`, modify the `alternatives` array. This is the specific deviation you want your tests to be able to detect.

**Example: Checking whether tests for `Weibull` can detect an `Exponential` distribution**

```json
"alternatives": [
    {
        "generator_name": "EXPONENTIALGENERATOR",
        "parameters": [
            0.5
        ]
    }
]
```

#### 4. Adjust Experiment Parameters

You can also change `sample_sizes`, `monte_carlo_count`, and `significance_levels` to fit the scope of your research. Just ensure that `sample_sizes` and `significance_levels` are consistent between the `critical_value` and `power` experiments; a quick consistency check is sketched below.
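
For instance, a tiny hypothetical helper (paths assume you run it from this directory; it is not part of the CLI) that fails fast on a mismatch:

```python
# Hypothetical consistency check for the template configs; not part of the CLI.
import json

with open("1_calculate_critical_values.json") as f:
    cv_config = json.load(f)["config"]
with open("2_evaluate_power.json") as f:
    power_config = json.load(f)["config"]

for key in ("sample_sizes", "significance_levels"):
    if cv_config[key] != power_config[key]:
        raise ValueError(f"{key} differs: {cv_config[key]} vs {power_config[key]}")
print("critical_value and power configs are consistent.")
```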

By following this structured, three-step approach, you can systematically evaluate and select the best statistical test for any given problem.

---

## How to Run This Example Workflow

The `pysatl-experiment` command-line tool is designed to look for experiment configurations in a dedicated `.experiment` directory at the root of the project.

Therefore, to run these examples, you must first copy the JSON template files into that location.

**1. Create the configuration files:**

First, create an experiment file for each template:

```bash
poetry run experiment create 1_calculate_critical_values
```

Repeat for `2_evaluate_power` and `3_measure_time_complexity`, then copy the template contents into the generated files.

**2. Run the experiments in sequence:**

Once the files are in the `.experiment` directory, you can execute the workflow steps in order. Make sure you are in the project's root directory.

### First, run the calibration step

```bash
poetry run experiment build-and-run 1_calculate_critical_values
```

### Second, run the power evaluation

```bash
poetry run experiment build-and-run 2_evaluate_power
```

### Finally, run the performance analysis

```bash
poetry run experiment build-and-run 3_measure_time_complexity
```
experiment_example/1_calculate_critical_values.json

Lines changed: 36 additions & 0 deletions
```json
{
    "config": {
        "name": "1_calculate_critical_values",
        "experiment_type": "critical_value",
        "hypothesis": "normal",
        "criteria": [
            {
                "criterion_code": "KS",
                "parameters": []
            },
            {
                "criterion_code": "AD",
                "parameters": []
            }
        ],
        "executor_type": "standard",
        "generator_type": "standard",
        "report_builder_type": "standard",
        "report_mode": "with-chart",
        "run_mode": "reuse",
        "monte_carlo_count": 1000,
        "sample_sizes": [200, 400, 600, 800, 1000],
        "significance_levels": [0.01, 0.05, 0.1],
        "storage_connection": "/home/moveton/projects/prog/pysatl-experiment/.storage/result.sqlite"
    }
}
```
experiment_example/2_evaluate_power.json

Lines changed: 45 additions & 0 deletions
```json
{
    "config": {
        "name": "2_evaluate_power",
        "experiment_type": "power",
        "hypothesis": "normal",
        "alternatives": [
            {
                "generator_name": "NORMALGENERATOR",
                "parameters": [0.5, 1]
            }
        ],
        "criteria": [
            {
                "criterion_code": "KS",
                "parameters": []
            },
            {
                "criterion_code": "AD",
                "parameters": []
            }
        ],
        "executor_type": "standard",
        "generator_type": "standard",
        "report_builder_type": "standard",
        "report_mode": "with-chart",
        "run_mode": "reuse",
        "monte_carlo_count": 1000,
        "sample_sizes": [200, 400, 600, 800, 1000],
        "significance_levels": [0.01, 0.05, 0.1],
        "storage_connection": "/home/moveton/projects/prog/pysatl-experiment/.storage/result.sqlite"
    }
}
```
experiment_example/3_measure_time_complexity.json

Lines changed: 36 additions & 0 deletions
```json
{
    "config": {
        "name": "3_measure_time_complexity",
        "experiment_type": "time_complexity",
        "hypothesis": "normal",
        "criteria": [
            {
                "criterion_code": "KS",
                "parameters": []
            },
            {
                "criterion_code": "AD",
                "parameters": []
            }
        ],
        "executor_type": "standard",
        "generator_type": "standard",
        "report_builder_type": "standard",
        "report_mode": "with-chart",
        "run_mode": "reuse",
        "monte_carlo_count": 1000,
        "sample_sizes": [200, 400, 600, 800, 1000],
        "significance_levels": [0.01, 0.05, 0.1],
        "storage_connection": "/home/moveton/projects/prog/pysatl-experiment/.storage/result.sqlite"
    }
}
```
Three binary files changed (150 KB, 122 KB, 114 KB); contents not shown.

pyproject.toml

Lines changed: 1 addition & 0 deletions
```diff
@@ -25,6 +25,7 @@ rich = "==13.9.4"
 click = ">=8.2.1"
 dacite = "==1.9.2"
 line_profiler = "5.0.0"
+pydantic = "^2.11.9"
 pysatl-criterion = {path = "./pysatl_criterion"}

 [tool.poetry.group.dev.dependencies]
```

pysatl_experiment/cli/commands/build_and_run/build_and_run.py

Lines changed: 8 additions & 17 deletions
```diff
@@ -1,7 +1,6 @@
 from click import BadParameter, argument, command

-from pysatl_experiment.cli.commands.common.common import get_experiment_config, read_experiment_data
-from pysatl_experiment.configuration.experiment_config.experiment_config import ExperimentConfig
+from pysatl_experiment.cli.commands.common.common import read_experiment_data
 from pysatl_experiment.configuration.experiment_data.experiment_data import ExperimentData
 from pysatl_experiment.experiment_new.experiment.experiment import Experiment
 from pysatl_experiment.experiment_new.experiment_steps.experiment_steps import ExperimentSteps
@@ -20,42 +19,34 @@ def build_and_run(name: str) -> None:

     :param name: name of the experiment.
     """
-
     experiment_exists = if_experiment_exists(name)
     if not experiment_exists:
         raise BadParameter(f"Experiment with name {name} does not exists.")

     experiment_data_dict = read_experiment_data(name)
-    experiment_config = get_experiment_config(experiment_data_dict)
-
     experiment_data = validate_build_and_run(experiment_data_dict)
-
-    experiment_type = experiment_config["experiment_type"]
-
-    experiment_steps = _build_experiment(experiment_data, experiment_type)
+    experiment_steps = _build_experiment(experiment_data)

     experiment = Experiment(experiment_steps)
-
     experiment.run_experiment()


-def _build_experiment(experiment_data: ExperimentData[ExperimentConfig], experiment_type: str) -> ExperimentSteps:
+def _build_experiment(experiment_data: ExperimentData) -> ExperimentSteps:
     """
-    Build experiment.
-
-    :param experiment_data: experiment data.
-    :param experiment_type: experiment type.
+    Build experiment from validated ExperimentData object.

+    :param experiment_data: validated experiment data.
     :return: experiment steps.
     """
-
     experiment_type_to_factory = {
         "power": PowerExperimentFactory,
         "critical_value": CriticalValueExperimentFactory,
         "time_complexity": TimeComplexityExperimentFactory,
     }

-    experiment_factory = experiment_type_to_factory[experiment_type]
+    experiment_type_str = experiment_data.config.experiment_type.value
+
+    experiment_factory = experiment_type_to_factory[experiment_type_str]

     experiment_steps = experiment_factory(experiment_data).create_experiment_steps()

```
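The refactor leans on the new `pydantic` dependency: `validate_build_and_run` now returns a typed `ExperimentData` whose `experiment_type` is an enum, hence the `.value` lookup into the factory map. A minimal sketch of that validation pattern, with a hypothetical field set modeled on the JSON templates above rather than the project's actual classes:

```python
# Hypothetical sketch of pydantic-based config validation (pydantic v2).
# Field names mirror the JSON templates above; this is NOT the project's
# actual ExperimentConfig/ExperimentData model.
from enum import Enum

from pydantic import BaseModel, ValidationError


class ExperimentType(str, Enum):
    CRITICAL_VALUE = "critical_value"
    POWER = "power"
    TIME_COMPLEXITY = "time_complexity"


class Criterion(BaseModel):
    criterion_code: str
    parameters: list[float] = []


class ExperimentConfig(BaseModel):
    name: str
    experiment_type: ExperimentType
    hypothesis: str
    criteria: list[Criterion]
    monte_carlo_count: int
    sample_sizes: list[int]
    significance_levels: list[float]


try:
    # An invalid dict reports every missing/bad field at once.
    ExperimentConfig.model_validate({"name": "demo", "experiment_type": "power"})
except ValidationError as exc:
    print(exc)

# A valid config yields a typed object whose enum drives the factory lookup.
config = ExperimentConfig(
    name="2_evaluate_power",
    experiment_type="power",
    hypothesis="normal",
    criteria=[{"criterion_code": "KS", "parameters": []}],
    monte_carlo_count=1000,
    sample_sizes=[200, 400],
    significance_levels=[0.05],
)
print(config.experiment_type.value)  # -> "power"
```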