
Commit 0e5721b

AE: update FP and Performance Overhead Instructions

1 parent 25af3a0

File tree

2 files changed: +76 -71 lines


docs/ae.md

Lines changed: 73 additions & 64 deletions
```diff
@@ -14,14 +14,14 @@ Welcome to the artifact evaluation guide for **TrainCheck** (OSDI'25). This docu
 - [ ] Ran **[Performance Overhead](#eval-performance-overhead)** measurement
 - [ ] Verified outputs match expected results (tolerances noted per experiment)

-## 📎 Additional Resources
+## 📎 Resources You Need

 In addition to this guide, you will need the following resources throughout the evaluation process:

 1. [**5-Minute Tutorial**](./5-min-tutorial.md) — A quick walkthrough that introduces TrainCheck’s workflow using a real-world bug.
 2. [**TrainCheck Installation Guide**](./installation-guide.md) — Step-by-step instructions for setting up TrainCheck.
 3. [**Technical Usage Guide**](./technical-doc.md) — Detailed documentation on how to use TrainCheck, configure instrumentation, and interpret outputs.
-4. [**Evaluation Workloads Repository**](https://github.com/OrderLab/TrainCheck-Evaluation-Workloads) — Contains all evaluation workloads used in the experiments.
+4. [**Evaluation Workloads Repository**](https://github.com/OrderLab/TrainCheck-Evaluation-Workloads) — Contains all evaluation workloads and automation scripts used in the experiments.

 ## 1. Overview

```
````diff
@@ -48,6 +48,14 @@ We suggest running the evaluations in the following order, based on automation l

 ## 2. Environment Requirements

+Many of our experiment scripts are written in xonsh, a shell that combines Python and Bash.
+Please install it with:
+
+```bash
+conda activate traincheck
+pip3 install 'xonsh[full]'
+```
+
 For a full and efficient AE experience, we recommend the following setup:
 - 🖥 1 machine with 2× CUDA-enabled GPUs
   - Each GPU should have at least 12 GiB memory.
````
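Since the experiments assume both xonsh and CUDA GPUs are available, a quick preflight check can save a failed run. The helper below is a hypothetical convenience sketch, not part of the evaluation repo:

```python
import importlib.util
import shutil

def preflight() -> dict:
    """Report whether the AE prerequisites appear to be available.

    Illustrative only: it does not verify GPU memory sizes or that the
    `traincheck` conda environment is currently active.
    """
    return {
        "xonsh installed": importlib.util.find_spec("xonsh") is not None,
        "nvidia-smi on PATH": shutil.which("nvidia-smi") is not None,
    }

for name, ok in preflight().items():
    print(f"{name}: {'OK' if ok else 'MISSING'}")
```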
````diff
@@ -101,44 +109,51 @@ The target results are discussed in the main text of **Section 5.4** of the pape

 ### 📂 Resources & Scripts

-- **Automation Script**:
-  - `traincheck-ae-resources/fp_rate/ae_fp.py`
+- **Automation Scripts**:
+  - [`TrainCheck-Evaluation-Workloads/fp_rate/ae_fp.py`](https://github.com/OrderLab/TrainCheck-Evaluation-Workloads/blob/main/fp_rate/ae_fp.py): The script to collect traces, perform invariant inference, and check invariants on supposedly-correct programs to see if there are any false alarms.
+  - [`TrainCheck-Evaluation-Workloads/fp_rate/compute_fp_rate.py`](https://github.com/OrderLab/TrainCheck-Evaluation-Workloads/blob/main/fp_rate/compute_fp_rate.py): The script to compute false positive rates from the invariant checking results.

 - **Workloads**:
-  - The evaluation uses official PyTorch training pipelines located at `traincheck-ae-resources/fp_rate/workloads`.
-    We have shortened the training runs for faster execution.
+  - The evaluation uses official PyTorch training pipelines located at [`TrainCheck-Evaluation-Workloads/fp_rate/workloads`](https://github.com/OrderLab/TrainCheck-Evaluation-Workloads/tree/main/fp_rate/workloads).
+    We have shortened the training runs for faster execution.
   - For AE purposes, you do not need to modify or understand the workload code—`ae_fp.py` will automatically handle the entire process.

 ### 🛠 How to Run

-0. Make sure you have a working TrainCheck installation by following [TrainCheck Installation Guide](./installation-guide.md).
+1. Make sure you have a working TrainCheck installation by following the [TrainCheck Installation Guide](./installation-guide.md).

-1. Install necessary dependencies for the false positive evaluation workloads.
-   ```bash
-   conda activate traincheck # change this if you installed TrainCheck in a different environment.
-   cd fp_rate
-   pip3 install -r requirements.txt
-   ```
+   > All steps described below assume you are already in the `TrainCheck-Evaluation-Workloads` repo. If not, clone the repository and go into it:
+   > ```bash
+   > git clone https://github.com/OrderLab/TrainCheck-Evaluation-Workloads.git
+   > cd TrainCheck-Evaluation-Workloads
+   > ```

-2. Execute `ae_fp.py` to collect traces, perform invariant inference, and check the invariants on validation programs.
+2. Install necessary dependencies for the false positive evaluation workloads.
+   ```bash
+   conda activate traincheck # change this if you installed TrainCheck in a different environment.
+   cd fp_rate
+   pip3 install -r requirements.txt
+   ```

-   The workload `ddp-multigpu` will need 2 GPUs. We have provided the trace for `ddp-multigpu` in case you do not have two GPUs.
+3. Execute `ae_fp.py` to collect traces, perform invariant inference, and check the invariants on validation programs.

-   If you need to use our pre-computed trace for `ddp-multigpu`, remove the `--overwrite-existing-results` argument.
-   ```bash
-   python3 ae_fp.py --bench workloads
-   ```
+   The workload `ddp-multigpu` needs 2 GPUs. We have provided the trace for `ddp-multigpu` in case you do not have two GPUs.

-   Or, if you have a machine with 2 GPUs, execute the below command, such that the original results will be re-computed.
-   ```bash
-   python3 ae_fp.py --bench workloads --overwrite-existing-results
-   ```
+   If you need to use our pre-computed trace for `ddp-multigpu`, remove the `--overwrite-existing-results` argument:
+   ```bash
+   python3 ae_fp.py --bench workloads
+   ```

-3. Execute `compute_fp_rates.py` to compute the false positive rates.
+   Or, if you have a machine with 2 GPUs, run the command below so that the original results are re-computed:
+   ```bash
+   python3 ae_fp.py --bench workloads --overwrite-existing-results
+   ```

-   ```bash
-   python3 compute_fp_rates.py
-   ```
+4. Execute `compute_fp_rates.py` to compute the false positive rates.
+
+   ```bash
+   python3 compute_fp_rates.py
+   ```

 ### What to Expect During Execution

````
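For intuition about what the false-positive computation reports: the false positive rate is the fraction of checked invariants that raise an alarm on a supposedly-correct validation run. A minimal sketch of that arithmetic (the function name and inputs are hypothetical; the real script parses TrainCheck's checker output):

```python
def fp_rate(num_invariants_checked: int, num_false_alarms: int) -> float:
    """Fraction of checked invariants that fired on a correct run."""
    if num_invariants_checked == 0:
        return 0.0
    return num_false_alarms / num_invariants_checked

# For example, 3 alarms out of 200 checked invariants:
print(f"{fp_rate(200, 3):.1%}")  # 1.5%
```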
````diff
@@ -249,65 +264,59 @@ If the issue persists, please contact us for assistance.

 ## Eval: Performance Overhead

-⏳ Estimated Completion Time: 30 minutes.
+⏳ Estimated Completion Time: 10 minutes.

 ### 🎯 Goal

 This evaluation measures the runtime overhead introduced by TrainCheck’s instrumentation compared to un-instrumented runs across a set of representative ML workloads, during the invariant checking stage. The results correspond to Section 5.5 of the paper.

-
 ### 📂 Resources & Scripts

-- Automation Scripts:
-  - `eval_scripts/perf_benchmark/run_all.xsh`: run the experiments and collect data.
-  - `eval_scripts/perf_benchmark/analysis.xsh`: analyze raw data and produce input for the plot script.
-  - `eval_scripts/perf_benchmark/plot_e2e.py` and `eval_scripts/perf_benchmark/plot_micro.py`: plot the figures in Section 5.5.
+> Files described below are all in the [TrainCheck-Evaluation-Workloads](https://github.com/OrderLab/TrainCheck-Evaluation-Workloads/) repo.
+
+- Automation Scripts:
+  - [`performance_overhead/ae_perf.sh`](https://github.com/OrderLab/TrainCheck-Evaluation-Workloads/blob/main/performance_overhead/ae_perf.sh): End-to-end script for running the performance overhead benchmarks (Section 5.5) and generating Figure 7. It internally calls:
+    - `run_all.xsh`: Runs the experiments and collects raw data (per-iteration duration).
+    - `analysis.xsh`: Analyzes the raw data and prepares input for plotting.
+    - `plot_e2e.py`: Plots the final results.

-- Workloads (You probably won't need to touch this):
-  - Located in [overhead-e2e](../eval_scripts/perf_benchmark/overhead-e2e) and [overhead-micro](../eval_scripts/perf_benchmark/overhead-micro)
-  - No pre-collected data is required—this evaluation runs end-to-end automatically and is pretty light weight
+- Workloads (You won't need to touch this):
+  - Located in [overhead-e2e](../eval_scripts/perf_benchmark/overhead-e2e)

-- Deployed 100 invariants:
+- The deployed 100 invariants:
   [eval_scripts/perf_benchmark/overhead-e2e/sampled_100_invariants.json](../eval_scripts/perf_benchmark/overhead-e2e/sampled_100_invariants.json)


 ### 🛠 How to Run

-1. Navigate to the performance benchmark directory:
-   ```bash
-   cd eval_scripts/perf_benchmark/
-   ```
+1. Make sure you have a working TrainCheck installation by following the [TrainCheck Installation Guide](./installation-guide.md).

-2. Run the full benchmark suite using:
-   ```bash
-   xonsh eval_scripts/perf_benchmark/run_all.xsh
-   ```
-   This script will:
-   - Execute each workload in three modes:
-     - No instrumentation
-     - TrainCheck selective instrumentation with 100 invariants deployed
-     - Python settrace baseline (a lightweight instrumentation baseline)
-   - Measure per-iteration training time.
-   - Save raw results in a folder named: `perf_eval_res_<commit_hash>`
-
-   You should then execute the below commands that analyze the data and produce plots.
-   ```bash
-   xonsh analysis.xsh --res_folder perf_eval_res_<commit_hash>
+   > All steps described below assume you are already in the `TrainCheck-Evaluation-Workloads` repo. If not, clone the repository and go into it:
+   > ```bash
+   > git clone https://github.com/OrderLab/TrainCheck-Evaluation-Workloads.git
+   > cd TrainCheck-Evaluation-Workloads
+   > ```

-   python3 plot_e2e.py -o perf_eval_res_<commit_hash>/macro.pdf -i perf_eval_res_<commit_hash>/overhead_e2e.csv -t <commit_hash>
+2. Execute `ae_perf.sh`.

-   python3 plot_micro.py -o perf_eval_res_<commit_hash>/micro.pdf -i perf_eval_res_<commit_hash>/wrapper_overhead_micro.csv -t <commit_hash>
-   ```
+   ```bash
+   conda activate traincheck
+   cd performance_overhead
+
+   bash ae_perf.sh
+   ```

 ### Expected Output
-Key files in `perf_eval_res_<commit_hash>`:
-- `overhead_e2e.csv` and `marco.pdf`: data and plot for benchmarks presented in Section 5.5.
-- `wrapper_overhead_micro.csv` and `micro.pdf`: data and plot for the pure wrapper overhead on individual APIs.
+
+After execution completes, a plot will be generated at `performance_ae.pdf`. All raw data is stored in a folder named `perf_res_ae`.

 ### ✅ How to Verify
-• Check that the overhead percentages in overhead_results.csv are consistent with those reported in Section 5.5.
-• Variations (within ±15% TODO confirm) are expected due to runtime and hardware differences.

+- Open the generated file `performance_ae.pdf` and compare it against Figure 7 in the paper.
+- Small differences in the overhead numbers (within ±20%) are expected.
+  TrainCheck’s overhead is sensitive to CPU performance, since trace serialization is blocking and CPU-bound.
+- Despite minor variations, the key takeaway should remain clear:
+  TrainCheck’s selective instrumentation incurs significantly lower overhead compared to other methods.

 ### ⚠️ Notes & Troubleshooting
 1. **Do Not Run Other GPU Tasks in Parallel**
````
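For intuition about what the benchmark pipeline measures: the overhead figures compare mean per-iteration training time with and without instrumentation. A sketch of that arithmetic (hypothetical numbers; not the actual `analysis.xsh` logic):

```python
from statistics import mean

def overhead_pct(baseline_times: list, instrumented_times: list) -> float:
    """Percentage slowdown of instrumented iterations relative to baseline."""
    base, inst = mean(baseline_times), mean(instrumented_times)
    return (inst - base) / base * 100.0

# For example, 100 ms/iteration uninstrumented vs. 105 ms/iteration instrumented:
print(round(overhead_pct([0.100, 0.100], [0.105, 0.105]), 1))  # 5.0
```

With this framing, the ±20% tolerance above applies to these percentages, not to the raw iteration times.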

eval_scripts/perf_benchmark/run_all.xsh

Lines changed: 3 additions & 7 deletions
```diff
@@ -75,7 +75,7 @@ def run_exp(kill_sec: int = 100, workload: str = "mnist", use_proxy: bool = Fals
     SETTRACE_PY = "main_settrace.py"
     RUN_SH = "run.sh"
     MD_CONFIG_YML = "md-config.yml" if not use_proxy else "md-config-var.yml"
-    CMD_TRAINCHECK = f"python -m traincheck.collect_trace --use-config --config {MD_CONFIG_YML} --output-dir traincheck"
+    CMD_TRAINCHECK = f"python -m traincheck.collect_trace --use-config --config {MD_CONFIG_YML} --output-dir traincheck-all"
     CMD_TRAINCHECK_SELECTIVE = f"python -m traincheck.collect_trace --use-config --config {MD_CONFIG_YML} --output-dir traincheck-selective -i ../{SELC_INV_FILE}"

     if not os.path.exists(f"{E2E_FOLDER}/{workload}/{RUN_SH}"):
```
```diff
@@ -111,12 +111,8 @@ def run_exp(kill_sec: int = 100, workload: str = "mnist", use_proxy: bool = Fals
     # 3. traincheck proxy instrumentation
     print("Running traincheck instrumentation")
     run_cmd(CMD_TRAINCHECK, kill_sec)
-    print("Trying to copy")
-    print(os.listdir("traincheck"))
-    # shutil.copy("traincheck/iteration_times.txt", f"../../{RES_FOLDER}/e2e_{workload}_monkey-patch.txt")
-    cp traincheck/iteration_times.txt @(f"../../{RES_FOLDER}/e2e_{workload}_monkey-patch.txt")
-    print("Copied")
-    rm -rf traincheck
+    cp traincheck-all/iteration_times.txt @(f"../../{RES_FOLDER}/e2e_{workload}_monkey-patch.txt")
+    rm -rf traincheck-all

     # 4. traincheck selective instrumentation
     print("Running traincheck selective instrumentation")
```
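The `iteration_times.txt` file copied above holds the raw per-iteration durations that feed the overhead analysis. A self-contained sketch of how such timings can be collected (the one-duration-per-line format is an assumption for illustration, not taken from the repo):

```python
import time

def timed_iterations(n: int, work, out_path: str = "iteration_times.txt") -> list:
    """Run `work` n times, recording each iteration's wall-clock duration."""
    durations = []
    for _ in range(n):
        start = time.perf_counter()
        work()  # one training iteration stand-in
        durations.append(time.perf_counter() - start)
    with open(out_path, "w") as f:
        f.writelines(f"{d:.6f}\n" for d in durations)
    return durations

durations = timed_iterations(3, lambda: sum(range(10_000)))
print(len(durations))  # 3
```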
