
Commit 1f4508f

Merge pull request #33 from gridfm/swap_c0_c2
swapped c0 and c2
2 parents 61ae3e5 + 04c8160 commit 1f4508f

16 files changed: +93 -85 lines changed

README.md

Lines changed: 2 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -182,9 +182,9 @@ settings:
182182
large_chunk_size: 1000 # Number of load scenarios processed before saving
183183
overwrite: true # If true, overwrites existing files, if false, appends to files
184184
mode: "pf" # Mode of the script; options: pf, opf. pf: power flow data where one or more operating limits – the inequality constraints defined in OPF, e.g., voltage magnitude or branch limits – may be violated. opf: generates datapoints for training OPF solvers, with cost-optimal dispatches that satisfy all operating limits (OPF-feasible)
185-
include_dc_res: true # If true, also stores the results of dc power flow (in addition to the results AC power flow). does not work with mode "opf"
185+
include_dc_res: true # If true, also stores the results of dc power flow or dc optimal power flow
186186
enable_solver_logs: true # If true, write OPF/PF logs to {data_dir}/solver_log; PF fast and DCPF fast do not log.
187-
pf_fast: true # Whether to use fast PF solver by default (compute_ac_pf from powermodels.jl); if false, uses Ipopt-based PF. Some networks e.g. case10000_goc do not work with pf_fast: true. pf_fast is faster and more accurate than the Ipopt-based PF.
187+
pf_fast: true # Whether to use fast PF solver by default (compute_ac_pf from powermodels.jl); if false, uses Ipopt-based PF. Some networks (typically large ones e.g. case10000_goc) do not work with pf_fast: true. pf_fast is faster and more accurate than the Ipopt-based PF.
188188
dcpf_fast: true # Whether to use fast DCPF solver by default (compute_dc_pf from PowerModels.jl)
189189
max_iter: 200 # Max iterations for Ipopt-based solvers
190190
```

docs/components/cli.md

Lines changed: 28 additions & 33 deletions
@@ -30,21 +30,47 @@ gridfm-datakit validate path/to/data/directory [--n-partitions N] [--sn-mva 100]
3030

3131
**Arguments:**
3232
- `data_path`: Path to directory containing generated CSV files
33-
- `--n-partitions N`: Number of partitions to sample for validation (default: 100). Use 0 to validate all partitions.
33+
- `--n-partitions N`: Number of partitions (200 scenarios each) to sample for validation (default: 100). Use 0 to validate all partitions.
3434
- `--sn-mva`: Base MVA used to scale power quantities (default: 100).
3535

3636
**Examples:**
3737
```bash
3838
# Validate with default sampling (100 partitions)
3939
gridfm-datakit validate ./data_out/case24_ieee_rts/raw
4040

41-
# Validate with custom partition sampling
41+
# Validate custom number of partitions
4242
gridfm-datakit validate ./data_out/case24_ieee_rts/raw --n-partitions 50
4343

4444
# Validate all partitions (slower but complete)
4545
gridfm-datakit validate ./data_out/case24_ieee_rts/raw --n-partitions 0
4646
```
4747

48+
The validation command performs the following checks:
49+
50+
#### Y-Bus Consistency
51+
- Consistency of bus admittance matrix with branch admittance data
52+
- Y-bus matrix structure validation
53+
54+
#### Branch Constraints
55+
- Deactivated lines have zero power flows and admittances
56+
- Computed vs stored power flow consistency
57+
- Branch loading limits (OPF mode only)
58+
59+
#### Generator Constraints
60+
- Deactivated generators have zero power output
61+
- Generator power limits validation
62+
- Reactive power limits (OPF mode only)
63+
64+
#### Power Balance
65+
- Bus generation consistency between bus_data and gen_data
66+
- Power Balance
67+
68+
#### Data Integrity
69+
- Scenario indexing consistency across all files
70+
- Bus indexing consistency
71+
- Data completeness and missing value checks
72+
73+
4874
### Stats
4975

5076
Compute and display statistics from generated power flow data:
@@ -90,34 +116,3 @@ gridfm-datakit plots ./data_out/case24_ieee_rts/raw --sn-mva 100
90116
```
91117

92118
This command reads `bus_data.parquet`, normalizes power columns by `sn_mva`, and writes violin plots named `distribution_{feature_name}.png` to the output directory for quick visualization of feature distributions.
93-
94-
## Validation Checks
95-
96-
The validation command performs the following checks:
97-
98-
### Y-Bus Consistency
99-
- Consistency of bus admittance matrix with branch admittance data
100-
- Y-bus matrix structure validation
101-
102-
### Branch Constraints
103-
- Deactivated lines have zero power flows and admittances
104-
- Computed vs stored power flow consistency
105-
- Branch loading limits (OPF mode only)
106-
107-
### Generator Constraints
108-
- Deactivated generators have zero power output
109-
- Generator power limits validation
110-
- Reactive power limits (OPF mode only)
111-
112-
### Power Balance
113-
- Bus generation consistency between bus_data and gen_data
114-
- Power Balance
115-
116-
### Data Integrity
117-
- Scenario indexing consistency across all files
118-
- Bus indexing consistency
119-
- Data completeness and missing value checks
120-
121-
### `main`
122-
123-
::: gridfm_datakit.cli.main
Lines changed: 3 additions & 3 deletions
@@ -1,10 +1,10 @@
11
# Admittance Perturbations
22

33
## Overview
4-
Admittance perturbations introduce changes to line admittance values by applying random scaling factors to the resistance ($R$) and reactance ($X$) parameters of grid lines. Admittance ($Y$) is related to impedance ($Z$) through $Y=1/Z$, and the impedance, in turn, is related to resistance and reactance through $Z=R+jX$. This results in more variance and diversity in power flow solutions which is beneficial for training ML models to improve generalization. Admittance perturbations are applied to the existing topology and generation perturbations.
4+
Admittance perturbations change branch admittance values by applying random scaling factors to the resistance ($R$) and reactance ($X$) parameters of grid branches. This increases the variance and diversity of the power flow solutions, which helps ML models trained on the data generalize better.
55

66
The module provides two options for admittance perturbation strategies:
77

8-
- `NoAdmittancePerturbationGenerator` yields the original example produced by the generation perturbation generator without any additional changes in line admittances.
8+
- `NoAdmittancePerturbationGenerator` yields the original example without any additional changes in branch admittances.
99

10-
- `PerturbAdmittanceGenerator` applies a scaling factor to all resistance and reactance values of network lines. The scaling factor is sampled from a uniform distribution with a range given by `[max(0, 1-sigma), 1+sigma)`, where `sigma` is a user-defined adjustable parameter.
10+
- `PerturbAdmittanceGenerator` applies a scaling factor to all resistance and reactance values of network branches. The scaling factor is sampled from a uniform distribution with a range given by `[max(0, 1-sigma), 1+sigma)`, where `sigma` is a user-defined adjustable parameter.
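
The sampling rule described for `PerturbAdmittanceGenerator` can be illustrated with a minimal sketch. This is not the library's implementation: the helper names `sample_scaling_factor` and `perturb_branch` are hypothetical, and drawing independent factors for $R$ and $X$ is an assumption of this sketch, since the text does not say whether the factors are shared.

```python
import random


def sample_scaling_factor(sigma: float, rng: random.Random) -> float:
    """Draw one factor from the half-open range [max(0, 1 - sigma), 1 + sigma)."""
    low = max(0.0, 1.0 - sigma)
    high = 1.0 + sigma
    # rng.random() lies in [0, 1), so the result stays below `high`
    # (up to floating-point rounding).
    return low + (high - low) * rng.random()


def perturb_branch(r: float, x: float, sigma: float, rng: random.Random) -> tuple[float, float]:
    """Scale R and X with independently drawn factors (an assumption of this sketch)."""
    return r * sample_scaling_factor(sigma, rng), x * sample_scaling_factor(sigma, rng)


rng = random.Random(0)  # fixed seed so the example is reproducible
r_new, x_new = perturb_branch(r=0.01, x=0.1, sigma=0.2, rng=rng)
```

With `sigma=0.2` the factors fall in `[0.8, 1.2)`. For `sigma >= 1` the lower bound is clipped to 0, so a perturbed resistance or reactance never becomes negative.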

docs/manual/generation_perturbations.md

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@ Generation perturbations introduce random changes to the cost functions of gener
55

66
The module provides three options for generation perturbation strategies:
77

8-
- `NoGenPerturbationGenerator` yields the original example produced by the topology perturbation generator without any additional changes in generation cost.
8+
- `NoGenPerturbationGenerator` yields the original example without any additional changes in generation cost.
99

1010
- `PermuteGenCostGenerator` randomly permutes the generator cost coefficients across and among generator elements.
1111

docs/manual/getting_started.md

Lines changed: 3 additions & 3 deletions
@@ -91,17 +91,17 @@ The `mode` parameter controls how the power flow scenarios are generated and val
9191
- **Constraints**: Since the topology perturbations are performed after solving OPF, the inequality constraints of OPF (e.g. branch loading, voltage magnitude at PQ buses, generator bounds on reactive power, etc) might be violated.
9292
- **Use Case**: Training data for power flow, contingency analysis, etc
9393
- **Performance**: Faster as it avoids re-solving OPF for each perturbed scenario
94-
- **PF Solver Choice**: Controlled by `settings.pf_fast`. If `true`, uses the fast `compute_ac_pf` path. If `false`, uses the Ipopt-based AC PF for higher fidelity at the cost of speed.
94+
- **PF Solver Choice**: Controlled by `settings.pf_fast`. If `true`, uses the fast `compute_ac_pf` path. If `false`, uses the Ipopt-based AC PF, which is slower for smaller grids but has better convergence properties for large grids.
9595

9696
## Data Validation
9797

9898
The generated data can be validated using the CLI validation command:
9999

100100
```bash
101-
# Validate with default sampling (100 partitions)
101+
# Validate with default sampling (100 partitions of 200 scenarios)
102102
gridfm-datakit validate ./data_out/case24_ieee_rts/raw
103103

104-
# Validate with custom partition sampling
104+
# Validate with custom number of partitions
105105
gridfm-datakit validate ./data_out/case24_ieee_rts/raw --n-partitions 50
106106

107107
# Validate all partitions (slower but complete)

docs/manual/outputs.md

Lines changed: 24 additions & 11 deletions
@@ -31,10 +31,13 @@ Metadata file containing the total number of scenarios (used for efficient parti
3131

3232
### Network Data Files
3333

34+
**Note**: All network data files are saved as partitioned parquet directories. Each file includes a `scenario_partition` column used for partitioning, which groups scenarios into partitions (default: 200 scenarios per partition).
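
As a rough illustration of the partitioning arithmetic in the note above, the sketch below relates scenario indices to partitions. The floor-division mapping in `partition_of` is an assumption, not something the note confirms, and both helper names are hypothetical. The only facts taken from the documentation are the default of 200 scenarios per partition and that `--n-partitions 0` means all partitions are validated.

```python
SCENARIOS_PER_PARTITION = 200  # default stated in the note above


def partition_of(scenario: int, size: int = SCENARIOS_PER_PARTITION) -> int:
    """Hypothetical mapping from a global scenario index to its partition."""
    return scenario // size


def scenarios_covered(n_partitions: int, total_scenarios: int,
                      size: int = SCENARIOS_PER_PARTITION) -> int:
    """Upper bound on the scenarios read when sampling `n_partitions` partitions."""
    if n_partitions == 0:  # 0 means "validate all partitions" in the CLI
        return total_scenarios
    return min(n_partitions * size, total_scenarios)


# Default validation settings: 100 partitions of 200 scenarios each.
covered_default = scenarios_covered(100, total_scenarios=1_000_000)
```

Under these assumptions, the default validation run touches at most 100 × 200 = 20,000 scenarios, however large the dataset is.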
35+
3436
#### `bus_data.parquet`
35-
Bus-level features for each processed scenario. Columns (BUS_COLUMNS):
37+
Bus-level features for each processed scenario. Columns:
3638

37-
- **scenario**: Index of the scenario (unique identifier of the power flow case)
39+
- **scenario**: Global scenario index (unique identifier)
40+
- **load_scenario_idx**: Index of the load scenario
3841
- **bus**: Index of the bus
3942
- **Pd**: Active power demand at the bus (MW)
4043
- **Qd**: Reactive power demand at the bus (MVAr)
@@ -56,9 +59,10 @@ If `settings.include_dc_res=True`, also includes DC power flow columns (DC_BUS_C
5659
- **Pg_dc**: DC active power generation at the bus (MW)
5760

5861
#### `gen_data.parquet`
59-
Generator features per scenario. Columns (GEN_COLUMNS):
62+
Generator features per scenario. Columns:
6063

61-
- **scenario**: Index of the scenario
64+
- **scenario**: Global scenario index (unique identifier)
65+
- **load_scenario_idx**: Index of the load scenario
6266
- **idx**: Generator row index (0-based)
6367
- **bus**: Bus index where the generator is connected
6468
- **p_mw**: Active power output (MW)
@@ -77,9 +81,10 @@ If `settings.include_dc_res=True`, also includes DC generator column (DC_GEN_COL
7781
- **p_mw_dc**: Active power from DC solution (MW)
7882

7983
#### `branch_data.parquet`
80-
Branch features per scenario. Columns (BRANCH_COLUMNS):
84+
Branch features per scenario. Columns:
8185

82-
- **scenario**: Index of the scenario
86+
- **scenario**: Global scenario index (unique identifier)
87+
- **load_scenario_idx**: Index of the load scenario
8388
- **idx**: Branch row index (0-based)
8489
- **from_bus**: Index of the source bus
8590
- **to_bus**: Index of the destination bus
@@ -110,9 +115,10 @@ If `settings.include_dc_res=True`, also includes DC branch columns (DC_BRANCH_CO
110115
- **pt_dc**: DC active power flow from destination to source (MW)
111116

112117
#### `y_bus_data.parquet`
113-
Nonzero Y-bus entries per scenario with columns:
118+
Nonzero Y-bus entries per scenario. Columns:
114119

115-
- **scenario**: Index of the scenario
120+
- **scenario**: Global scenario index (unique identifier)
121+
- **load_scenario_idx**: Index of the load scenario
116122
- **index1**: Row index in the Y-bus matrix
117123
- **index2**: Column index in the Y-bus matrix
118124
- **G**: Conductance value (p.u.)
@@ -121,7 +127,14 @@ Nonzero Y-bus entries per scenario with columns:
121127
### Runtime Data Files
122128

123129
#### `runtime_data.parquet`
124-
Runtime data for each scenario (AC and DC solver execution times).
130+
Runtime data for each scenario. Columns:
131+
132+
- **scenario**: Global scenario index (unique identifier)
133+
- **load_scenario_idx**: Index of the load scenario
134+
- **ac**: AC solver execution time (seconds)
135+
136+
If `settings.include_dc_res=True`, also includes DC runtime column (DC_RUNTIME_COLUMNS):
137+
- **dc**: DC solver execution time (seconds)
125138

126139
### Statistics Files
127140

@@ -134,8 +147,8 @@ Aggregated statistics collected during generation (if `settings.no_stats=False`)
134147
- Maximum loading values
135148
- Other network performance metrics
136149

137-
#### `stats_plot.html`
138-
HTML dashboard of the aggregated statistics (if `settings.no_stats=False`).
150+
#### `stats_plot.png`
151+
Visualization of the aggregated statistics (if `settings.no_stats=False`).
139152

140153
### Feature Visualization
141154

docs/manual/topology_perturbations.md

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
22

33
## Overview
44

5-
Topology perturbations generate variations of the original network by altering its structure. These variations simulate contingencies and component failures, and are useful for robustness testing, contingency analysis, and training ML models on diverse grid conditions.
5+
Topology perturbations generate variations of the original network by altering its topology. These variations simulate contingencies and component failures, and are useful for robustness testing, contingency analysis, and training ML models on diverse grid conditions.
66

77
The module provides three topology perturbation strategies:
88

gridfm_datakit/process/process_network.py

Lines changed: 2 additions & 2 deletions
@@ -741,9 +741,9 @@ def pf_post_processing(
741741
X_gen[:, 6] = net.gens[:, PMAX]
742742
X_gen[:, 7] = net.gens[:, QMIN]
743743
X_gen[:, 8] = net.gens[:, QMAX]
744-
X_gen[:, 9] = net.gencosts[:, COST]
744+
X_gen[:, 9] = net.gencosts[:, COST + 2]
745745
X_gen[:, 10] = net.gencosts[:, COST + 1]
746-
X_gen[:, 11] = net.gencosts[:, COST + 2]
746+
X_gen[:, 11] = net.gencosts[:, COST]
747747
X_gen[net.idx_gens_in_service, 12] = 1
748748

749749
# slack gen (can be any generator connected to the ref node)
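
The swap in this hunk is easier to follow with the MATPOWER polynomial cost convention in mind, where coefficients are stored from highest order to lowest. The sketch below assumes `net.gencosts` follows that layout and that `COST` is the 0-based offset 4, as in MATPOWER. Neither is confirmed by the diff itself, and the numeric values are made up for illustration.

```python
# MATPOWER-style gencost column offset (0-based); assumed to match the
# COST constant used in process_network.py.
COST = 4

# One generator row: MODEL, STARTUP, SHUTDOWN, NCOST, then the coefficients.
# Illustrative cost curve: 0.02 * p**2 + 15 * p + 100 (values are made up).
gencost_row = [2, 0.0, 0.0, 3, 0.02, 15.0, 100.0]

c2 = gencost_row[COST]      # highest-order coefficient comes first
c1 = gencost_row[COST + 1]
c0 = gencost_row[COST + 2]  # constant term comes last

# Column layout after this commit: 9 -> c0, 10 -> c1, 11 -> c2.
x_gen_row = [0.0] * 13
x_gen_row[9] = gencost_row[COST + 2]
x_gen_row[10] = gencost_row[COST + 1]
x_gen_row[11] = gencost_row[COST]

# Evaluating the cost from the stored columns at p = 50 MW.
p = 50.0
cost = x_gen_row[9] + x_gen_row[10] * p + x_gen_row[11] * p**2
```

Before the change, column 9 held `gencosts[:, COST]`, which under this convention is the quadratic coefficient rather than the constant term.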

scripts/compare_parquet_files.py

Lines changed: 4 additions & 4 deletions
@@ -37,7 +37,7 @@
3737
generation_perturbation:
3838
type: "none" # Type of generation perturbation; options: cost_permutation, cost_perturbation, none
3939
# WARNING: the following parameter is only used if type is "cost_permutation"
40-
sigma: 1.0 # Size of range use for sampling scaling factor
40+
sigma: 1.0 # Size of range used for sampling scaling factor
4141
4242
admittance_perturbation:
4343
type: "none" # Type of admittance perturbation; options: random_perturbation, none
@@ -49,10 +49,10 @@
4949
data_dir: "./testdelll" # Directory to save generated data relative to the project root
5050
large_chunk_size: 1000 # Number of load scenarios processed before saving
5151
overwrite: true # If true, overwrites existing files, if false, appends to files
52-
mode: "pf" # Mode of the script; options: pf, opf. pf: power flow data where one or more operating limits – the inequality constraints defined in OPF, e.g., voltage magnitude or branch limits – may be violated. opf: datapoints for training OPF solvers, with cost-optimal dispatches that satisfy all operating limits (OPF-feasible)
53-
include_dc_res: true # If true, also stores the results of dc power flow (in addition to the results AC power flow). does not work with mode "opf"
52+
mode: "pf" # Mode of the script; options: pf, opf. pf: power flow data where one or more operating limits – the inequality constraints defined in OPF, e.g., voltage magnitude or branch limits – may be violated. opf: generates datapoints for training OPF solvers, with cost-optimal dispatches that satisfy all operating limits (OPF-feasible)
53+
include_dc_res: true # If true, also stores the results of dc power flow or dc optimal power flow
5454
enable_solver_logs: true # If true, write OPF/PF logs to {data_dir}/solver_log; PF fast and DCPF fast do not log.
55-
pf_fast: true # Whether to use fast PF solver by default (compute_ac_pf from powermodels.jl); if false, uses Ipopt-based PF. Some networks e.g. case10000_goc do not work with pf_fast: true. pf_fast is faster and more accurate than the Ipopt-based PF.
55+
pf_fast: true # Whether to use fast PF solver by default (compute_ac_pf from powermodels.jl); if false, uses Ipopt-based PF. Some networks (typically large ones e.g. case10000_goc) do not work with pf_fast: true. pf_fast is faster and more accurate than the Ipopt-based PF.
5656
dcpf_fast: true # Whether to use fast DCPF solver by default (compute_dc_pf from PowerModels.jl)
5757
max_iter: 200 # Max iterations for Ipopt-based solvers
5858

scripts/config/Texas2k_case1_2016summerpeak.yaml

Lines changed: 4 additions & 4 deletions
@@ -27,7 +27,7 @@ topology_perturbation:
2727
generation_perturbation:
2828
type: "cost_permutation" # Type of generation perturbation; options: cost_permutation, cost_perturbation, none
2929
# WARNING: the following parameter is only used if type is "cost_permutation"
30-
sigma: 1.0 # Size of range use for sampling scaling factor
30+
sigma: 1.0 # Size of range used for sampling scaling factor
3131

3232
admittance_perturbation:
3333
type: "random_perturbation" # Type of admittance perturbation; options: random_perturbation, none
@@ -39,9 +39,9 @@ settings:
3939
data_dir: "./baseline_perturbations" # Directory to save generated data relative to the project root
4040
large_chunk_size: 10000 # Number of load scenarios processed before saving
4141
overwrite: true # If true, overwrites existing files, if false, appends to files
42-
mode: "pf" # Mode of the script; options: pf, opf. pf: power flow data where one or more operating limits – the inequality constraints defined in OPF, e.g., voltage magnitude or branch limits – may be violated. opf: datapoints for training OPF solvers, with cost-optimal dispatches that satisfy all operating limits (OPF-feasible)
43-
include_dc_res: true # If true, also stores the results of dc power flow (in addition to the results AC power flow). does not work with mode "opf"
42+
mode: "pf" # Mode of the script; options: pf, opf. pf: power flow data where one or more operating limits – the inequality constraints defined in OPF, e.g., voltage magnitude or branch limits – may be violated. opf: generates datapoints for training OPF solvers, with cost-optimal dispatches that satisfy all operating limits (OPF-feasible)
43+
include_dc_res: true # If true, also stores the results of dc power flow or dc optimal power flow
4444
enable_solver_logs: false # If true, write OPF/PF logs to {data_dir}/solver_log; PF fast and DCPF fast do not log.
45-
pf_fast: true # Whether to use fast PF solver by default (compute_ac_pf from powermodels.jl); if false, uses Ipopt-based PF. Some networks e.g. case10000_goc do not work with pf_fast: true. pf_fast is faster and more accurate than the Ipopt-based PF.
45+
pf_fast: true # Whether to use fast PF solver by default (compute_ac_pf from powermodels.jl); if false, uses Ipopt-based PF. Some networks (typically large ones e.g. case10000_goc) do not work with pf_fast: true. pf_fast is faster and more accurate than the Ipopt-based PF.
4646
dcpf_fast: true # Whether to use fast DCPF solver by default (compute_dc_pf from PowerModels.jl)
4747
max_iter: 200 # Max iterations for Ipopt-based solvers
