docs/ae.md
- [ ] Environment set up (Python, dependencies, 2 CUDA GPUs with ≥ 12 GiB memory each)
- [ ] (*Optional*) Downloaded pre-collected / pre-computed data
- [ ] Ran **[Silent Issue Detection](#eval-silent-issue-detection)** experiment
- [ ] Ran **[Invariant Transferability](#eval-transferability)** evaluation
- [ ] Ran **[False Positive Rate](#false-positive-rate)** evaluation
- [ ] Ran **[Performance Overhead](#eval-performance-overhead)** measurement
- [ ] Verified outputs match expected results (tolerances noted per experiment)
This artifact allows you to reproduce the 4 major evaluation results presented in the paper.

- [ ] Ran **[Silent Issue Detection (Section 5.1 and 5.2)](#eval-silent-issue-detection)** experiment
- [ ] Ran **[Invariant Transferability (Section 5.3)](#eval-transferability)** evaluation
- [ ] Ran **[False Positive Rate (Section 5.4)](#false-positive-rate)** evaluation
- [ ] Ran **[Performance Overhead (Section 5.5)](#eval-performance-overhead)** measurement

### ⏱️ Recommended Evaluation Order
## Eval: False Positive Rate

⏳ Estimated Completion Time: TBD hour.

- Trace Collection: x hours
- Invariant Inference: x hours
- Invariant Checking: x hours

### 🎯 Goal

This evaluation measures the false positive rate of alarms from TrainCheck's invariants.
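As a rough illustration of the metric (a hypothetical sketch, not TrainCheck's actual tooling or API): on known-correct runs every alarm is a false positive, so the rate can be summarized as the fraction of checked invariants that raise at least one alarm.

```python
# Hypothetical illustration of a false-positive-rate summary
# (invented helper, not part of TrainCheck): on known-correct
# workloads, any invariant that fires is a false positive.

def false_positive_rate(alarms_per_invariant: dict[str, int]) -> float:
    """Map each checked invariant to its alarm count; return the
    fraction of invariants that fired at least once."""
    if not alarms_per_invariant:
        return 0.0
    fired = sum(1 for count in alarms_per_invariant.values() if count > 0)
    return fired / len(alarms_per_invariant)

# 1 of 4 invariants fired on a correct run -> 25% false positive rate.
print(false_positive_rate({"inv_a": 0, "inv_b": 2, "inv_c": 0, "inv_d": 0}))  # 0.25
```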
### 📂 Resources & Scripts

- Automation Scripts:
  1. TBD
  2. TBD
  3. TBD
- Workloads: PyTorch official pipelines, accessible at TBD FP WORKLOAD

### 🛠 How to Run

xxx
## Eval: Transferability

⏳ Estimated Completion Time: TBD hour.

- Trace Collection: x hours
- Invariant Inference: x hours
- Invariant Checking: x hours

### 🎯 Goal

This evaluation measures the transferability of invariants inferred by TrainCheck.
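To make the notion concrete, here is a hypothetical sketch (an invented representation, not TrainCheck's actual data model): treat each invariant as a predicate over trace events, and count how many invariants inferred on one pipeline still hold on a trace collected from a different pipeline.

```python
# Hypothetical sketch of a transferability measure (simplified,
# invented representation): an invariant "transfers" if it holds
# on every event of a trace from a different pipeline.

def transfer_rate(invariants, target_trace) -> float:
    """invariants: predicates over single trace events;
    target_trace: list of event dicts from the target pipeline."""
    if not invariants:
        return 0.0
    held = sum(1 for inv in invariants if all(inv(e) for e in target_trace))
    return held / len(invariants)

# Toy target trace and two invariants inferred elsewhere:
trace = [{"loss_finite": True, "lr": 0.1}, {"loss_finite": True, "lr": 0.01}]
invs = [lambda e: e["loss_finite"], lambda e: e["lr"] > 0.05]
print(transfer_rate(invs, trace))  # 0.5 (the lr bound does not transfer)
```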
## Eval: Performance Overhead

⏳ Estimated Completion Time: 1.5 hours.

### 🎯 Goal

This evaluation measures the runtime overhead introduced by TrainCheck’s instrumentation compared to un-instrumented runs across a set of representative ML workloads, during the invariant checking stage. The results correspond to Section 5.5 of the paper.

### 📂 Resources & Scripts

- Automation Scripts:
  - `eval_scripts/perf_benchmark/run_all.xsh`: run the experiments and collect data.
  - `eval_scripts/perf_benchmark/analysis.xsh`: analyze raw data and produce input for the plot scripts.
  - `eval_scripts/perf_benchmark/plot_e2e.py` and `eval_scripts/perf_benchmark/plot_micro.py`: plot the figures in Section 5.5.
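The overhead metric itself reduces to a simple ratio. The sketch below is an assumption about how such numbers are typically reported (percent slowdown over the un-instrumented baseline), not code taken from `analysis.xsh`:

```python
# Hypothetical overhead computation (assumed convention, not taken
# from analysis.xsh): relative slowdown of an instrumented run
# versus its un-instrumented baseline.

def relative_overhead(instrumented_s: float, baseline_s: float) -> float:
    """Return overhead as a fraction, e.g. 0.05 for a 5% slowdown."""
    return (instrumented_s - baseline_s) / baseline_s

# A 63 s instrumented run against a 60 s baseline:
print(f"{relative_overhead(63.0, 60.0):.1%}")  # 5.0%
```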