Commit c2bf669

noemotiovon and Tcc0403 committed
[Benchmark]: Add --gpu-filter to visualizer and simplify D2 guidelines
benchmarks_visualizer.py:
- Add `--gpu-filter` CLI flag to select a specific GPU when benchmark data contains results from multiple devices; falls back to the most recent device with a warning when omitted or unmatched.
- Extract `gpu_name_filter()` and `extra_config_filter()` as standalone helpers; `load_data()` now applies filters in explicit order: kernel/metric/mode → sweep-mode → GPU → extra config.

BENCHMARK_GUIDELINES.md:
- Add guideline: import baseline kernels from the test suite instead of duplicating reference implementations in benchmark scripts.
- Remove the continuous hidden-size sweep variant (D2.1) and the `compute_hidden_size_sweep_config()` reference; D2 now covers only the discrete model-config sweep.

Co-authored-by: Tcc0403 <76503978+Tcc0403@users.noreply.github.com>
1 parent fc01015 commit c2bf669

File tree

2 files changed: +120, -54 lines changed


benchmark/BENCHMARK_GUIDELINES.md (9 additions, 20 deletions)

````diff
@@ -5,6 +5,11 @@
 - **Location**: `benchmark/scripts/`
 - **Naming**: `benchmark_<kernel_name>.py` (e.g. `benchmark_geglu.py`, `benchmark_dyt.py`)
 
+> **Baseline implementations**: Import reference (non-Liger) kernels from the
+> test suite (e.g. `test/transformers/test_<kernel>.py`) to use as baselines.
+> This keeps benchmark and test implementations in sync and avoids duplicating
+> reference code in benchmark scripts.
+
 ## 2. Shared infrastructure
 
 Do **not** hardcode batch size, sequence length, or model dimensions. All benchmark scripts share the following:
@@ -13,7 +18,7 @@ Do **not** hardcode batch size, sequence length, or model dimensions. All benchm
 |------|-----|
 | Model dimensions (hidden_size, vocab_size, etc.) | `benchmark_model_configs.py`: `ModelConfig`, `MODEL_REGISTRY`, `get_benchmark_model_config()` |
 | Memory probing | `benchmark_model_configs.py`: `estimate_kernel_peak_memory()` |
-| Safe sweep configs | `compute_seq_len_sweep_config()`, `compute_hidden_size_sweep_config()`, `compute_model_config_sweep_config()` |
+| Safe sweep configs | `compute_seq_len_sweep_config()`, `compute_model_config_sweep_config()` |
 | Speed / memory measurement | `utils.py`: `run_speed_benchmark()`, `run_memory_benchmark()` |
 | Running the grid and writing CSV | `utils.py`: `run_benchmarks()` |
 | CLI arguments | `utils.py`: `parse_benchmark_script_args()` — provides `--model`, `--overwrite`, `--sweep-mode`, `--bt` |
@@ -94,25 +99,9 @@ python benchmark_geglu.py --model llama_3_8b --overwrite
 
 ## 4. D2 — Model dimension sweep
 
-Sweep model-related dimensions (e.g. hidden_size, or discrete model configs from `MODEL_REGISTRY`) with a **fixed token count**. Use `--bt` to set the token count.
-
-D2 has two variants:
-
-### 4.1 Continuous sweep (e.g. hidden_size)
-
-Sweep a single model parameter (like hidden_size) in a continuous range with fixed BT.
-
-**How to implement:**
-
-1. Probe: measure peak memory at `(BT, model.hidden_size)`.
-2. `config = compute_hidden_size_sweep_config(model, kernel_peak_bytes=peak_bytes, bt=BT)`. Returns `HiddenSizeSweepConfig` with `bt` and `max_hidden_size`.
-3. Build `x_values` from `config.max_hidden_size` (e.g. `[1024 * i for i in range(1, 17) if 1024 * i <= config.max_hidden_size]`).
-4. Build `extra_benchmark_configs` with `BT=config.bt`, `dtype=model.dtype`, etc.
-5. Call `run_benchmarks(...)`.
-
-**Reference**: `benchmark_dyt.py` — hidden_size sweep with `compute_hidden_size_sweep_config()`.
+Sweep across discrete model configs from `MODEL_REGISTRY` with a **fixed token count**. Use `--bt` to set the token count.
 
-### 4.2 Discrete model-config sweep
+### 4.1 Discrete model-config sweep
 
 Sweep across all `MODEL_REGISTRY` entries as discrete data points. Activated by `--sweep-mode model_config`.
 
@@ -155,7 +144,7 @@ def bench_speed_geglu_model_config(input):
 
 **Reference**: `benchmark_geglu.py`, `benchmark_swiglu.py`, `benchmark_dyt.py` — all support `--sweep-mode model_config`.
 
-### 4.3 How to run
+### 4.2 How to run
 
 ```bash
 # Discrete model-config sweep with default bt=2048
````
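The new baseline-implementations guideline boils down to: define the reference kernel once (in the test suite) and import it from the benchmark script, instead of re-implementing it. A minimal self-contained sketch of the pattern; the `baseline_geglu` helper and its import path are illustrative, not the repo's actual API:

```python
import math

# In a real benchmark script the baseline would be imported from the
# test suite rather than redefined, e.g. (hypothetical path):
#   from test.transformers.test_geglu import baseline_geglu
# The stand-in below keeps this sketch self-contained.

def gelu(v: float) -> float:
    # Exact (erf-based) GELU on a single scalar.
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))

def baseline_geglu(row: list) -> list:
    # GeGLU reference: split the row in half, gate the second half
    # through GELU, and multiply elementwise.
    half = len(row) // 2
    return [a * gelu(b) for a, b in zip(row[:half], row[half:])]

out = baseline_geglu([1.0, 2.0, 0.0, 1.0])
print([round(v, 4) for v in out])  # [0.0, 1.6827]
```

Because the benchmark imports the same function the tests verify, a fix to the reference implementation propagates to both places automatically.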

benchmark/benchmarks_visualizer.py (111 additions, 34 deletions)

```diff
@@ -44,6 +44,7 @@ class VisualizationsConfig:
     kernel_operation_mode: str = "full"
     sweep_mode: str = "token_length"
     extra_config_filter: str | None = None
+    gpu_filter: str | None = None
     display: bool = False
     overwrite: bool = False
 
@@ -86,6 +87,14 @@ def parse_args() -> VisualizationsConfig:
         "Can be a substring to match or a JSON-like 'key=value' pair (e.g., \"'H': 4096\" or \"H=4096\" for simple cases). "
         "Defaults to None (first available config if multiple exist).",
     )
+    parser.add_argument(
+        "--gpu-filter",
+        type=str,
+        default=None,
+        help="Filter by GPU name. When multiple devices are present, selects "
+        "the matching GPU (uses most recent match if multiple found). "
+        "If omitted, the most recent device is used automatically.",
+    )
     parser.add_argument("--display", action="store_true", help="Display the visualization")
     parser.add_argument(
         "--overwrite",
@@ -97,51 +106,77 @@ def parse_args() -> VisualizationsConfig:
     return args
 
 
-def load_data(config: VisualizationsConfig) -> pd.DataFrame:
-    """Loads the benchmark data from the CSV file and filters it based on the configuration.
+def gpu_name_filter(df: pd.DataFrame, gpu_filter: str | None = None) -> pd.DataFrame:
+    """Filter benchmark data by GPU name when multiple devices are present.
 
     Args:
-        config (VisualizationsConfig): Configuration object for the visualizations script.
-
-    Raises:
-        ValueError: If no data is found for the given filters.
+        df: Pre-filtered benchmark dataframe.
+        gpu_filter: Optional GPU name substring to match. If provided, selects
+            the matching GPU (uses most recent if multiple match). If None,
+            automatically picks the most recent device.
 
     Returns:
-        pd.DataFrame: Filtered benchmark dataframe.
+        pd.DataFrame: Dataframe filtered to a single GPU.
     """
-    df = pd.read_csv(DATA_PATH)
-    df["extra_benchmark_config"] = df["extra_benchmark_config_str"].apply(json.loads)
+    if "gpu_name" not in df.columns or df.empty:
+        return df
+
+    unique_gpus = df["gpu_name"].unique()
+    if len(unique_gpus) <= 1:
+        return df
+
+    if gpu_filter:
+        matched = [g for g in unique_gpus if gpu_filter in g]
+        if matched:
+            if len(matched) > 1:
+                # Multiple matches — pick the most recent
+                matched_df = df[df["gpu_name"].isin(matched)]
+                selected = matched_df.sort_values("timestamp", ascending=False)["gpu_name"].iloc[0]
+                print(
+                    f"Warning: Multiple GPUs match filter '{gpu_filter}': {matched}. "
+                    f"Using the most recent: '{selected}'."
+                )
+            else:
+                selected = matched[0]
+        else:
+            # No match — fall back to most recent GPU
+            selected = df.sort_values("timestamp", ascending=False)["gpu_name"].iloc[0]
+            print(
+                f"Warning: No GPU matches filter '{gpu_filter}'. "
+                f"Available GPUs: {list(unique_gpus)}. "
+                f"Falling back to most recent device: '{selected}'."
+            )
+    else:
+        # No filter provided — pick the most recent device
+        selected = df.sort_values("timestamp", ascending=False)["gpu_name"].iloc[0]
+        print(
+            f"Warning: Data contains entries from multiple devices: {list(unique_gpus)}. "
+            f"Using data from the most recent device: '{selected}'. "
+            f"Use --gpu-filter to select a specific device."
        )
 
-    mask = (
-        (df["kernel_name"] == config.kernel_name)
-        & (df["metric_name"] == config.metric_name)
-        & (df["kernel_operation_mode"] == config.kernel_operation_mode)
-    )
+    return df[df["gpu_name"] == selected]
 
-    # Filter by sweep mode early, before extra_benchmark_config resolution.
-    if config.sweep_mode == "model_config":
-        mask = mask & (df["x_name"] == SWEEP_MODE_X_NAME)
-    elif config.sweep_mode == "token_length":
-        mask = mask & (df["x_name"] != SWEEP_MODE_X_NAME)
 
-    base_filtered_df = df[mask]
+def extra_config_filter(df: pd.DataFrame, config: VisualizationsConfig) -> pd.DataFrame:
+    """Filter benchmark data by extra_benchmark_config.
 
-    if base_filtered_df.empty:
-        raise ValueError(
-            f"No data found for kernel_name='{config.kernel_name}', "
-            f"metric_name='{config.metric_name}', "
-            f"kernel_operation_mode='{config.kernel_operation_mode}'."
-        )
+    Args:
+        df: Pre-filtered benchmark dataframe (already filtered by kernel, metric, etc.).
+        config: Visualization configuration with optional extra_config_filter.
 
-    unique_extra_configs_str = base_filtered_df["extra_benchmark_config_str"].unique()
+    Returns:
+        pd.DataFrame: Dataframe filtered to a single extra_benchmark_config.
+    """
+    unique_extra_configs_str = df["extra_benchmark_config_str"].unique()
     selected_extra_config_str = None
 
     if len(unique_extra_configs_str) == 0:
         print(
             "Warning: No extra_benchmark_config found for the initial filters. "
             "Proceeding with all data from initial filter."
         )
-        return base_filtered_df
+        return df
 
     if config.extra_config_filter:
         matched_configs = []
@@ -196,14 +231,12 @@ def load_data(config: VisualizationsConfig) -> pd.DataFrame:
         print(f"Using unique extra_benchmark_config: {selected_extra_config_str}")
 
     if selected_extra_config_str:
-        final_filtered_df = base_filtered_df[
-            base_filtered_df["extra_benchmark_config_str"] == selected_extra_config_str
-        ]
+        result_df = df[df["extra_benchmark_config_str"] == selected_extra_config_str]
     else:
         print("Warning: Could not select an extra_benchmark_config. Using data from initial filter if any.")
-        final_filtered_df = base_filtered_df
+        result_df = df
 
-    if final_filtered_df.empty:
+    if result_df.empty:
         raise ValueError(
             f"No data found after attempting to filter by extra_benchmark_config. "
             f"Selected/Defaulted extra_config_str: {selected_extra_config_str}"
@@ -214,7 +247,50 @@ def load_data(config: VisualizationsConfig) -> pd.DataFrame:
     print(
         f"Plotting data for extra_benchmark_config: {json.loads(selected_extra_config_str if selected_extra_config_str else '{}')}"
     )
-    return final_filtered_df
+    return result_df
+
+
+def load_data(config: VisualizationsConfig) -> pd.DataFrame:
+    """Loads the benchmark data from the CSV file and filters it based on the configuration.
+
+    Applies filters in order: kernel/metric/mode → sweep mode → GPU → extra config.
+
+    Args:
+        config (VisualizationsConfig): Configuration object for the visualizations script.
+
+    Raises:
+        ValueError: If no data is found for the given filters.
+
+    Returns:
+        pd.DataFrame: Filtered benchmark dataframe.
+    """
+    df = pd.read_csv(DATA_PATH)
+    df["extra_benchmark_config"] = df["extra_benchmark_config_str"].apply(json.loads)
+
+    mask = (
+        (df["kernel_name"] == config.kernel_name)
+        & (df["metric_name"] == config.metric_name)
+        & (df["kernel_operation_mode"] == config.kernel_operation_mode)
+    )
+
+    # Filter by sweep mode early, before extra_benchmark_config resolution.
+    if config.sweep_mode == "model_config":
+        mask = mask & (df["x_name"] == SWEEP_MODE_X_NAME)
+    elif config.sweep_mode == "token_length":
+        mask = mask & (df["x_name"] != SWEEP_MODE_X_NAME)
+
+    base_filtered_df = df[mask]
+
+    if base_filtered_df.empty:
+        raise ValueError(
+            f"No data found for kernel_name='{config.kernel_name}', "
+            f"metric_name='{config.metric_name}', "
+            f"kernel_operation_mode='{config.kernel_operation_mode}'."
+        )
+
+    # Apply GPU filter, then extra config filter
+    base_filtered_df = gpu_name_filter(base_filtered_df, config.gpu_filter)
+    return extra_config_filter(base_filtered_df, config)
 
 
 def plot_data(df: pd.DataFrame, config: VisualizationsConfig):
@@ -331,6 +407,7 @@ def main():
         kernel_operation_mode=mode,
         sweep_mode=args.sweep_mode,
         extra_config_filter=args.extra_config_filter,
+        gpu_filter=args.gpu_filter,
         display=args.display,
         overwrite=args.overwrite,
     )
```
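The most-recent-device fallback in `gpu_name_filter()` reduces to sorting the benchmark rows by their timestamp column and taking the GPU of the newest row. A minimal sketch on a toy frame; the column names mirror the benchmark CSV schema, while the GPU names and timestamps are made up:

```python
import pandas as pd

# Toy benchmark data spanning two devices; "gpu_name" and "timestamp"
# mirror the columns gpu_name_filter() reads.
df = pd.DataFrame(
    {
        "gpu_name": ["NVIDIA A100", "NVIDIA H100", "NVIDIA A100"],
        "timestamp": ["2024-01-01", "2024-03-01", "2024-02-01"],
        "y_value": [1.0, 2.0, 3.0],
    }
)

# No --gpu-filter given: fall back to the device with the newest row.
selected = df.sort_values("timestamp", ascending=False)["gpu_name"].iloc[0]
filtered = df[df["gpu_name"] == selected]
print(selected, len(filtered))  # NVIDIA H100 1
```

Note that ISO-8601 timestamp strings sort correctly lexicographically, so no datetime parsing is needed for the tie-break.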
