Commit 4de5c0d

add dag
1 parent 8971455 commit 4de5c0d

18 files changed, +1499 -19 lines changed

DAG_README.md

Lines changed: 172 additions & 0 deletions
@@ -0,0 +1,172 @@
# Directed Acyclic Graphs (DAGs) for struct Models

This document describes the DAG functionality added to the struct package, which lets you build and execute directed acyclic graphs of struct models.

## Overview

The DAG functionality consists of three main classes:

1. **`model_dag`** - Represents a directed acyclic graph whose edges define the workflow
2. **`model_node`** - Represents a node containing a struct model and a mode function
3. **`data_node`** - Represents a node containing a DatasetExperiment object

## Classes

### model_dag

The `model_dag` class extends `struct_class` and contains an `edges` slot that defines the connections between nodes. Each edge is a list with a `from` and a `to` entry naming the two nodes it connects.

```r
# Create a DAG
dag = model_dag(
    name = 'My Workflow',
    description = 'A simple workflow',
    edges = list(
        list(from = 'Data', to = 'Preprocessing'),
        list(from = 'Preprocessing', to = 'Analysis')
    )
)
```
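
Because the edges above are a plain list of `from`/`to` pairs, it can help to tabulate them before building a DAG. The following is a minimal base R sketch that does not use the struct package; the helper name `edges_to_df` is hypothetical and only illustrates the shape of the edge list.

```r
# Hypothetical helper (not part of struct): tabulate a list of from/to pairs.
edges_to_df = function(edges) {
    data.frame(
        from = vapply(edges, function(e) e$from, character(1)),
        to   = vapply(edges, function(e) e$to, character(1)),
        stringsAsFactors = FALSE
    )
}

edge_list = list(
    list(from = 'Data', to = 'Preprocessing'),
    list(from = 'Preprocessing', to = 'Analysis')
)

edge_df = edges_to_df(edge_list)

# Every node name that appears in at least one edge
node_names = unique(c(edge_df$from, edge_df$to))

print(edge_df)
print(node_names)
```

The node names collected this way are the names that the list of nodes passed to `dag_execute` would need to provide.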

### model_node

The `model_node` class contains a struct model object and a mode function that operates on the model.

```r
# Create a model node
# PCA() is provided by the structToolbox package
pca_model = PCA()
node = model_node(
    name = 'PCA Analysis',
    description = 'Principal Component Analysis',
    model = pca_model,
    mode = model_apply # or model_train, model_predict, model_reverse
)
```

### data_node

The `data_node` class contains a DatasetExperiment object that serves as input to other nodes.

```r
# Create a data node
# (stored as iris_node so the variable does not share a name with the
# data_node() constructor)
D = iris_DatasetExperiment()
iris_node = data_node(
    name = 'My Data',
    description = 'Iris dataset',
    data = D
)

# Access the data
data_value(iris_node)
```

## Execution

The `dag_execute` function executes a DAG by:

1. Validating the DAG structure
2. Performing topological sorting to determine execution order
3. Executing nodes in the correct order
4. Passing outputs between nodes according to the edges

The nodes are supplied as a named list whose names match the node names used in the edges.

```r
# Execute a DAG
# preprocessing_node and analysis_node are model_node objects created
# as shown in the model_node section
nodes = list(
    'Data' = iris_node,
    'Preprocessing' = preprocessing_node,
    'Analysis' = analysis_node
)
results = dag_execute(dag, nodes, verbose = TRUE)
```
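
The topological sorting step can be illustrated with Kahn's algorithm. This is a base R sketch under the assumption that edges are `from`/`to` pairs as in the examples above; `topo_sort` is a hypothetical helper and is not claimed to be the routine that `dag_execute` uses internally.

```r
# Sketch of Kahn's algorithm; `topo_sort` is a hypothetical helper,
# not the function used inside dag_execute().
topo_sort = function(edges) {
    from = vapply(edges, function(e) e$from, character(1))
    to   = vapply(edges, function(e) e$to, character(1))
    nodes = unique(c(from, to))

    # Number of incoming edges for each node
    indegree = setNames(integer(length(nodes)), nodes)
    for (target in to) indegree[[target]] = indegree[[target]] + 1L

    ready = nodes[indegree == 0L]   # nodes whose inputs are all available
    sorted = character(0)
    while (length(ready) > 0) {
        current = ready[1]
        ready = ready[-1]
        sorted = c(sorted, current)
        # Release the successors of the node just scheduled
        for (s in to[from == current]) {
            indegree[[s]] = indegree[[s]] - 1L
            if (indegree[[s]] == 0L) ready = c(ready, s)
        }
    }
    # Nodes that were never released sit on or after a cycle
    if (length(sorted) < length(nodes)) {
        stop('the graph contains a cycle')
    }
    sorted
}

edge_list = list(
    list(from = 'Data', to = 'Preprocessing'),
    list(from = 'Preprocessing', to = 'Analysis')
)
execution_order = topo_sort(edge_list)
print(execution_order)
```

A node is only scheduled once all of its predecessors have been scheduled, which is the property needed for outputs to be passed along the edges.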

## Example Workflows

### Simple Preprocessing Workflow

```r
library(struct)
library(structToolbox) # provides mean_centre() and PCA()

# Load data
D = iris_DatasetExperiment()

# Create nodes
iris_node = data_node(name = 'Data', data = D)
mean_center_node = model_node(
    name = 'Mean Centering',
    model = mean_centre(),
    mode = model_apply
)
pca_node = model_node(
    name = 'PCA',
    model = PCA(),
    mode = model_apply
)

# Create DAG
dag = model_dag(
    name = 'Preprocessing Workflow',
    edges = list(
        list(from = 'Data', to = 'Mean Centering'),
        list(from = 'Mean Centering', to = 'PCA')
    )
)

# Execute: the list names must match the names used in the edges
nodes = list(
    'Data' = iris_node,
    'Mean Centering' = mean_center_node,
    'PCA' = pca_node
)
results = dag_execute(dag, nodes)
```

### Complex Workflow with Parallel Paths

```r
# Create a workflow with parallel PCA and PLS analysis.
# Only the edges are defined here; a node for every name used below
# must still be supplied to dag_execute().
dag = model_dag(
    name = 'Complex Analysis',
    edges = list(
        list(from = 'Data', to = 'Preprocessing'),
        list(from = 'Preprocessing', to = 'PCA Train'),
        list(from = 'Preprocessing', to = 'PLS Train'),
        list(from = 'PCA Train', to = 'PCA Predict'),
        list(from = 'PLS Train', to = 'PLS Predict')
    )
)
```
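
To see which steps of the parallel workflow are independent of each other, one option is to group the nodes by depth, meaning the length of the longest path from a root. The base R sketch below does this for the edge list above. It is an illustration only and makes no claim about whether `dag_execute` runs branches concurrently; the variable names are hypothetical.

```r
edge_list = list(
    list(from = 'Data', to = 'Preprocessing'),
    list(from = 'Preprocessing', to = 'PCA Train'),
    list(from = 'Preprocessing', to = 'PLS Train'),
    list(from = 'PCA Train', to = 'PCA Predict'),
    list(from = 'PLS Train', to = 'PLS Predict')
)
from = vapply(edge_list, function(e) e$from, character(1))
to   = vapply(edge_list, function(e) e$to, character(1))
nodes = unique(c(from, to))

# Depth = length of the longest path from a root node.
depth = setNames(integer(length(nodes)), nodes)

# Relax every edge repeatedly; for an acyclic graph the depths settle
# after at most length(nodes) - 1 passes.
for (pass in seq_len(length(nodes) - 1L)) {
    for (i in seq_along(from)) {
        depth[[to[i]]] = max(depth[[to[i]]], depth[[from[i]]] + 1L)
    }
}

# Nodes that share a depth cannot depend on one another
levels_list = split(names(depth), depth)
print(levels_list)
```

Here 'PCA Train' and 'PLS Train' fall into the same group, as do 'PCA Predict' and 'PLS Predict', which reflects the two parallel paths in the edge list.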

## Available Modes

The following modes can be used with model nodes:

- `model_apply` - Train and apply the model in one step
- `model_train` - Train the model only
- `model_predict` - Apply a trained model
- `model_reverse` - Apply the reverse transformation
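
To make the difference between the four modes concrete, the sketch below mimics them for mean centring in plain base R. It is only an analogue of the train, predict, apply and reverse semantics listed above; the `mc_*` functions are hypothetical and are not struct's methods.

```r
# Base R analogue of the four modes, using mean centring as the "model".
# The mc_* names are hypothetical and do not exist in struct.
mc_train = function(X) list(centre = colMeans(X))            # learn parameters only
mc_predict = function(fit, X) sweep(X, 2, fit$centre, '-')   # use learned parameters
mc_reverse = function(fit, X) sweep(X, 2, fit$centre, '+')   # undo the transformation
mc_apply = function(X) mc_predict(mc_train(X), X)            # train and predict in one step

X = as.matrix(iris[, 1:4])

fit = mc_train(X)                    # "train" mode
centred = mc_predict(fit, X)         # "predict" mode
restored = mc_reverse(fit, centred)  # "reverse" mode
applied = mc_apply(X)                # "apply" mode

print(round(colMeans(centred), 10))
```

Separating training from prediction is what allows a 'Train' node and a 'Predict' node to appear as distinct steps in a workflow.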

## Validation

The DAG execution includes several validation checks:

1. Ensures all nodes referenced in edges exist
2. Validates that all nodes are of the correct type
3. Checks for cycles in the graph
4. Ensures all model nodes have input data
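
Three of the checks above (missing nodes, node types and missing inputs) can be sketched in base R as follows; the cycle check is left out of this sketch. Nodes are represented by plain lists with a hypothetical `kind` field as stand-ins for the real node classes, and `validate_graph` is an illustrative helper, not the validation code in `dag_execute`.

```r
# Hypothetical validation helper; nodes are stand-in lists with a `kind` field.
validate_graph = function(edges, nodes) {
    from = vapply(edges, function(e) e$from, character(1))
    to   = vapply(edges, function(e) e$to, character(1))
    problems = character(0)

    # Every name used in an edge must have a matching node
    missing_nodes = setdiff(unique(c(from, to)), names(nodes))
    if (length(missing_nodes) > 0) {
        problems = c(problems, paste0('missing node: ', missing_nodes))
    }

    # Every node must be of a recognised kind
    kinds = vapply(nodes, function(n) n$kind, character(1))
    bad_kind = names(nodes)[!(kinds %in% c('data', 'model'))]
    if (length(bad_kind) > 0) {
        problems = c(problems, paste0('invalid node type: ', bad_kind))
    }

    # Every model node needs at least one incoming edge
    no_input = names(nodes)[kinds == 'model' & !(names(nodes) %in% to)]
    if (length(no_input) > 0) {
        problems = c(problems, paste0('model node has no input: ', no_input))
    }

    problems
}

nodes = list(
    'Data' = list(kind = 'data'),
    'Preprocessing' = list(kind = 'model'),
    'Orphan' = list(kind = 'model')
)
edge_list = list(
    list(from = 'Data', to = 'Preprocessing'),
    list(from = 'Preprocessing', to = 'Analysis')
)

problems = validate_graph(edge_list, nodes)
print(problems)
```

Collecting all problems before reporting them, rather than stopping at the first, is a design choice of this sketch and lets a user fix several issues in one pass.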

## Error Handling

The DAG execution provides informative error messages for common issues:

- Missing nodes referenced in edges
- Invalid node types
- Cycles in the graph
- Missing input data for model nodes
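
Errors of this kind can be caught with `tryCatch`, so that a script can report the problem instead of halting. The base R sketch below raises and captures a cycle error. The helpers `has_cycle` and `check_acyclic` and the message wording are hypothetical, since the source does not show the messages that `dag_execute` produces.

```r
# Hypothetical cycle check: repeatedly drop edges that start at a node
# with no incoming edge; any edges left over lie on a cycle or
# downstream of one.
has_cycle = function(from, to) {
    repeat {
        removable = !(from %in% to)
        if (!any(removable)) break
        from = from[!removable]
        to = to[!removable]
    }
    length(from) > 0
}

# Hypothetical wrapper that stops with a descriptive message
check_acyclic = function(edges) {
    from = vapply(edges, function(e) e$from, character(1))
    to   = vapply(edges, function(e) e$to, character(1))
    if (has_cycle(from, to)) {
        stop('DAG validation failed: the edges contain a cycle', call. = FALSE)
    }
    invisible(TRUE)
}

bad_edges = list(
    list(from = 'A', to = 'B'),
    list(from = 'B', to = 'A')
)

# Capture the error message instead of halting the script
msg = tryCatch(
    {
        check_acyclic(bad_edges)
        'ok'
    },
    error = function(e) conditionMessage(e)
)
print(msg)
```

The same pattern could be wrapped around a call to `dag_execute` to log a failed workflow and continue with the next one.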

## Benefits

The DAG functionality provides several benefits:

1. **Modularity** - Each step is encapsulated in its own node
2. **Reusability** - Nodes can be reused in different workflows
3. **Clarity** - The workflow structure is explicitly defined
4. **Validation** - Automatic validation of workflow structure
5. **Flexibility** - Support for complex workflows with parallel paths

DESCRIPTION

Lines changed: 7 additions & 1 deletion
@@ -34,6 +34,9 @@ Collate:
     'chart_class.R'
     'stato_class.R'
     'DatasetExperiment_class.R'
+    'dag_execute.R'
+    'struct_node_class.R'
+    'data_node_class.R'
     'entity_class.R'
     'entity_stato_class.R'
     'enum_class.R'
@@ -44,13 +47,16 @@ Collate:
     'model_list_class.R'
     'metric_class.R'
     'iterator_class.R'
+    'model_dag_class.R'
+    'model_node_class.R'
     'optimiser_class.R'
+    'prediction_node_class.R'
     'preprocess_class.R'
     'resampler_class.R'
     'struct-package.R'
     'struct_templates.R'
     'zzz.R'
-RoxygenNote: 7.3.1
+RoxygenNote: 7.3.2
 Depends: R (>= 4.0)
 Suggests:
     testthat,

NAMESPACE

Lines changed: 23 additions & 0 deletions
@@ -2,13 +2,21 @@
 
 S3method(.DollarNames,DatasetExperiment)
 S3method(.DollarNames,chart)
+S3method(.DollarNames,data_node)
 S3method(.DollarNames,iterator)
 S3method(.DollarNames,metric)
 S3method(.DollarNames,model)
+S3method(.DollarNames,model_dag)
+S3method(.DollarNames,model_node)
 S3method(.DollarNames,optimiser)
+S3method(.DollarNames,prediction_node)
 S3method(.DollarNames,preprocess)
 S3method(.DollarNames,resampler)
 S3method(.DollarNames,struct_class)
+S3method(.DollarNames,struct_node)
+export("data_value<-")
+export("edges<-")
+export("model<-")
 export("output_list<-")
 export("output_value<-")
 export("param_list<-")
@@ -25,6 +33,10 @@ export(chart)
 export(chart_names)
 export(chart_plot)
 export(citations)
+export(dag_execute)
+export(data_node)
+export(data_value)
+export(edges)
 export(entity)
 export(entity_stato)
 export(enum)
@@ -42,6 +54,8 @@ export(max_length)
 export(metric)
 export(model)
 export(model_apply)
+export(model_dag)
+export(model_node)
 export(model_predict)
 export(model_reverse)
 export(model_seq)
@@ -64,6 +78,7 @@ export(param_obj)
 export(param_value)
 export(predicted)
 export(predicted_name)
+export(prediction_node)
 export(preprocess)
 export(resampler)
 export(result)
@@ -79,6 +94,7 @@ export(stato_id)
 export(stato_name)
 export(stato_summary)
 export(struct_class)
+export(struct_node)
 export(struct_template)
 export(test_metric)
 export(value)
@@ -89,7 +105,10 @@ exportMethods("*")
 exportMethods("+")
 exportMethods("[")
 exportMethods("[<-")
+exportMethods("data_value<-")
+exportMethods("edges<-")
 exportMethods("max_length<-")
+exportMethods("model<-")
 exportMethods("models<-")
 exportMethods("output_list<-")
 exportMethods("output_obj<-")
@@ -109,13 +128,17 @@ exportMethods(calculate)
 exportMethods(chart_names)
 exportMethods(chart_plot)
 exportMethods(citations)
+exportMethods(dag_execute)
+exportMethods(data_value)
+exportMethods(edges)
 exportMethods(evaluate)
 exportMethods(export_xlsx)
 exportMethods(is_output)
 exportMethods(is_param)
 exportMethods(length)
 exportMethods(libraries)
 exportMethods(max_length)
+exportMethods(model)
 exportMethods(model_apply)
 exportMethods(model_predict)
 exportMethods(model_reverse)
