Welcome to the artifact evaluation guide for **TrainCheck** (OSDI'25).

- [ ] Ran **[Performance Overhead](#eval-performance-overhead)** measurement
- [ ] Verified outputs match expected results (tolerances noted per experiment)

## 📎 Resources You Need
In addition to this guide, you will need the following resources throughout the evaluation process:

1. [**5-Minute Tutorial**](./5-min-tutorial.md) — A quick walkthrough that introduces TrainCheck’s workflow using a real-world bug.
2. [**TrainCheck Installation Guide**](./installation-guide.md) — Step-by-step instructions for setting up TrainCheck.
3. [**Technical Usage Guide**](./technical-doc.md) — Detailed documentation on how to use TrainCheck, configure instrumentation, and interpret outputs.
4. [**Evaluation Workloads Repository**](https://github.com/OrderLab/TrainCheck-Evaluation-Workloads) — Contains all evaluation workloads and automation scripts used in the experiments.

## 1. Overview

We suggest running the evaluations in the following order, based on automation level.

## 2. Environment Requirements

Many of our experiment scripts are written in xonsh, a shell that combines Python and Bash. Please install it with:

```bash
conda activate traincheck
pip3 install 'xonsh[full]'
```
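
To confirm the installation succeeded, a quick optional check:

```bash
# should print the installed xonsh version
xonsh --version
```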

For a full and efficient AE experience, we recommend the following setup:

- 🖥 1 machine with 2× CUDA-enabled GPUs
  - Each GPU should have at least 12 GiB of memory.
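
One way to verify the GPU setup (a convenience sketch; any equivalent check works):

```bash
# expect two GPUs, each with at least 12 GiB (~12288 MiB) of total memory
nvidia-smi --query-gpu=name,memory.total --format=csv
```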

The target results are discussed in the main text of **Section 5.4** of the paper.

### 📂 Resources & Scripts

- **Automation Scripts**:
  - [`TrainCheck-Evaluation-Workloads/fp_rate/ae_fp.py`](https://github.com/OrderLab/TrainCheck-Evaluation-Workloads/blob/main/fp_rate/ae_fp.py): The script to collect traces, perform invariant inference, and check invariants on supposedly-correct programs to see if there are any false alarms.
  - [`TrainCheck-Evaluation-Workloads/fp_rate/compute_fp_rate.py`](https://github.com/OrderLab/TrainCheck-Evaluation-Workloads/blob/main/fp_rate/compute_fp_rate.py): The script to compute false positive rates from the invariant checking results.
- **Workloads**:
  - The evaluation uses official PyTorch training pipelines located at [`TrainCheck-Evaluation-Workloads/fp_rate/workloads`](https://github.com/OrderLab/TrainCheck-Evaluation-Workloads/tree/main/fp_rate/workloads). We have shortened the training runs for faster execution.

For AE purposes, you do not need to modify or understand the workload code; `ae_fp.py` will automatically handle the entire process.
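
For orientation, a typical invocation is sketched below (we assume here that `ae_fp.py` needs no extra arguments; see the steps in the next section for the full procedure):

```bash
# from the TrainCheck-Evaluation-Workloads repo root
cd fp_rate
python3 ae_fp.py   # assumed to run with default settings; check the script for flags
```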
### 🛠 How to Run

1. Make sure you have a working TrainCheck installation by following the [TrainCheck Installation Guide](./installation-guide.md).

> All steps described below assume you are already in the `TrainCheck-Evaluation-Workloads` repo. If not, clone the repository and `cd` into it.
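
A minimal way to do that (the URL is the Evaluation Workloads Repository listed above):

```bash
# clone the evaluation workloads and enter the repo
git clone https://github.com/OrderLab/TrainCheck-Evaluation-Workloads.git
cd TrainCheck-Evaluation-Workloads
```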
4. Execute `compute_fp_rate.py` to compute the false positive rates.

   ```bash
   python3 compute_fp_rate.py
   ```

### What to Expect During Execution

If the issue persists, please contact us for assistance.

## Eval: Performance Overhead

⏳ Estimated Completion Time: 10 minutes.

### 🎯 Goal

This evaluation measures the runtime overhead that TrainCheck’s instrumentation introduces during the invariant checking stage, compared to un-instrumented runs, across a set of representative ML workloads. The results correspond to Section 5.5 of the paper.
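
To make the metric concrete, overhead here is the relative slowdown in per-iteration duration; the numbers below are purely illustrative, not results from the paper:

```bash
# toy example: baseline vs. instrumented per-iteration duration (seconds)
python3 -c 'base, instr = 0.120, 0.126; print(f"{(instr - base) / base * 100:.1f}% overhead")'
```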
### 📂 Resources & Scripts

> Files described below are all in the [TrainCheck-Evaluation-Workloads](https://github.com/OrderLab/TrainCheck-Evaluation-Workloads/) repo.

- Automation Scripts:
  - [`performance_overhead/ae_perf.sh`](https://github.com/OrderLab/TrainCheck-Evaluation-Workloads/blob/main/performance_overhead/ae_perf.sh): End-to-end script for running the performance overhead benchmarks (Section 5.5) and generating Figure 7 (see the example run after this list). It internally calls:
    - `run_all.xsh`: Runs the experiments and collects raw data (per-iteration duration).
    - `analysis.xsh`: Analyzes the raw data and prepares input for plotting.
    - `plot_e2e.py`: Plots the final results.
- Workloads (You won't need to touch this):
  - Located in [overhead-e2e](../eval_scripts/perf_benchmark/overhead-e2e)
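
An example run of the end-to-end script referenced above (assuming it takes no arguments and is launched from the repo root):

```bash
# runs the benchmarks, analyzes the raw data, and generates Figure 7
bash performance_overhead/ae_perf.sh
```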