Commit 2b72777

doc: traincheck-collect (basic usage instructions)
1 parent 6330572 commit 2b72777

10 files changed, +1279 −19 lines changed


docs/ae.md

Lines changed: 34 additions & 5 deletions
```diff
@@ -9,8 +9,8 @@ We provide pre-collected traces and pre-inferred invariants to simplify and spee
 - [ ] Environment set up (Python, dependencies, 2 CUDA GPUs with ≥ 12GiB memory each)
 - [ ] (*Optional*) Downloaded pre-collected / pre-computed data
 - [ ] Ran **[Silent Issue Detection](#eval-silent-issue-detection)** experiment
+- [ ] Ran **[Invariant Transferability](#eval-transferability)** evaluation
 - [ ] Ran **[False Positive Rate](#false-positive-rate)** evaluation
-- [ ] Ran **[Transferability](#eval-transferability)** evaluation
 - [ ] Ran **[Performance Overhead](#eval-performance-overhead)** measurement
 - [ ] Verified outputs match expected results (tolerances noted per experiment)
 
@@ -21,8 +21,8 @@ We provide pre-collected traces and pre-inferred invariants to simplify and spee
 This artifact allows you to reproduce the 4 major evaluation results presented in the paper.
 
 - [ ] Ran **[Silent Issue Detection (Section 5.1 and 5.2)](#eval-silent-issue-detection)** experiment
-- [ ] Ran **[False Positive Rate (Section 5.3)](#false-positive-rate)** evaluation
-- [ ] Ran **[Transferability (Section 5.4)](#eval-transferability)** evaluation
+- [ ] Ran **[Invariant Transferability (Section 5.3)](#eval-transferability)** evaluation
+- [ ] Ran **[False Positive Rate (Section 5.4)](#false-positive-rate)** evaluation
 - [ ] Ran **[Performance Overhead (Section 5.5)](#eval-performance-overhead)** measurement
 
 ### ⏱️ Recommended Evaluation Order
@@ -96,20 +96,49 @@ It will help you get familiar with the workflow and also verify that your instal
 
 ## Eval: False Positive Rate
 
+⏳ Estimated Completion Time: TBD hours.
+- Trace Collection: x hours
+- Invariant Inference: x hours
+- Invariant Checking: x hours
+
+### 🎯 Goal
+
+This evaluation measures the false positive rate of alarms from TrainCheck's invariants.
+
+### 📂 Resources & Scripts
+
+- Automation Scripts:
+  1. TBD
+  2. TBD
+  3. TBD
+- Workloads: PyTorch official pipelines, accessible at TBD FP WORKLOAD
+
+### 🛠 How to Run
+xxx
+
 ## Eval: Transferability
 
+⏳ Estimated Completion Time: TBD hours.
+- Trace Collection: x hours
+- Invariant Inference: x hours
+- Invariant Checking: x hours
+
+### 🎯 Goal
+
+This evaluation measures the transferability of invariants inferred by TrainCheck.
+
 ## Eval: Performance Overhead
 
 ⏳ Estimated Completion Time: 1.5 hours.
 
 ### 🎯 Goal
 
-This evaluation measures the runtime overhead introduced by TrainCheck’s instrumentation compared to uninstrumented runs across a set of representative ML workloads, during the invariant checking stage. The results correspond to Section 5.5 of the paper.
+This evaluation measures the runtime overhead introduced by TrainCheck’s instrumentation compared to un-instrumented runs across a set of representative ML workloads, during the invariant checking stage. The results correspond to Section 5.5 of the paper.
 
 
 ### 📂 Resources & Scripts
 
-- Automation Script:
+- Automation Scripts:
   - `eval_scripts/perf_benchmark/run_all.xsh`: run the experiments and collect data.
   - `eval_scripts/perf_benchmark/analysis.xsh`: analyze raw data and produce input for the plot script.
   - `eval_scripts/perf_benchmark/plot_e2e.py` and `eval_scripts/perf_benchmark/plot_micro.py`: plot the figures in Section 5.5.
```
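The perf-benchmark scripts in the diff above report instrumentation overhead against un-instrumented baselines. The actual computation lives in `eval_scripts/perf_benchmark/analysis.xsh` (not shown in this commit); as a hedged illustration only, the relative-overhead metric can be sketched as follows, where the function name and inputs are illustrative assumptions rather than TrainCheck code:

```python
# Hypothetical sketch of the overhead metric from Section 5.5: fractional
# slowdown of an instrumented run relative to its un-instrumented baseline.
def relative_overhead(baseline_s: float, instrumented_s: float) -> float:
    """Return fractional slowdown, e.g. 0.15 for a 15% overhead."""
    return (instrumented_s - baseline_s) / baseline_s

if __name__ == "__main__":
    # e.g. a 10.0 s baseline iteration that takes 11.5 s when instrumented
    print(f"{relative_overhead(10.0, 11.5):.0%}")
```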
Lines changed: 7 additions & 0 deletions
```diff
@@ -0,0 +1,7 @@
+Language model pretraining script from the official examples of the transformers library.
+Trains GPT-2 on the wikitext-2 dataset.
+
+Modifications:
+1. 10 steps per training/testing epoch.
+2. stage annotations
+3. skip instrumentation for the tokenization step
```
Lines changed: 10 additions & 0 deletions
```diff
@@ -0,0 +1,10 @@
+modules_to_instr:
+  - torch
+  - transformers
+  - accelerate
+pyscript: run_clm_no_trainer.py
+shscript: run.sh
+copy_all_files: true
+models_to_track:
+  - model
+model_tracker_style: proxy
```
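The collection config above is plain YAML. TrainCheck's actual schema and loader are not documented in this commit, so purely as a hedged sketch, the flat key/list subset used by this file can be parsed and sanity-checked with the stdlib alone; `parse_flat_yaml` is a hypothetical helper (not TrainCheck code), while the keys and values come from the config itself:

```python
# Hypothetical stdlib-only parser for the flat YAML subset used by the
# collection config above (top-level `key: value` pairs and `- item` lists).
# TrainCheck's real loader is not shown in this commit.
def parse_flat_yaml(text: str) -> dict:
    config, current_key = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("- "):
            config[current_key].append(line[2:].strip())  # list item
        else:
            key, _, value = line.partition(":")
            current_key, value = key.strip(), value.strip()
            if value == "":
                config[current_key] = []  # bare key starts a list
            elif value in ("true", "false"):
                config[current_key] = value == "true"
            else:
                config[current_key] = value
    return config

CONFIG = """\
modules_to_instr:
  - torch
  - transformers
  - accelerate
pyscript: run_clm_no_trainer.py
shscript: run.sh
copy_all_files: true
models_to_track:
  - model
model_tracker_style: proxy
"""

cfg = parse_flat_yaml(CONFIG)
print(cfg["modules_to_instr"], cfg["model_tracker_style"])
```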
Lines changed: 8 additions & 0 deletions
```diff
@@ -0,0 +1,8 @@
+python run_clm_no_trainer.py \
+  --dataset_name wikitext \
+  --dataset_config_name wikitext-2-raw-v1 \
+  --model_name_or_path distilbert/distilgpt2 \
+  --output_dir /tmp/test-clm \
+  --per_device_train_batch_size 2 \
+  --per_device_eval_batch_size 2 \
+  --num_train_epochs 1
```
