
Commit db87882

Merge pull request #1 from cmontemuino/add-analyzer
feat: add analyzer implementation
2 parents 46d98bd + 081d8b3 commit db87882

48 files changed: +6728 −107 lines


.gitignore

Lines changed: 32 additions & 107 deletions
```diff
@@ -1,3 +1,21 @@
+# Analysis output directories (generated content)
+analysis/
+!analysis/.gitkeep
+
+# Analysis artifacts
+*.png
+*.pdf
+*.html
+**/plots/
+**/reports/
+**/tables/
+
+# Keep assets for documentation
+!docs/assets/
+!assets/**/*.png
+!assets/**/*.jpg
+!assets/**/*.svg
+
 # Byte-compiled / optimized / DLL files
 __pycache__/
 *.py[codz]
@@ -36,7 +54,7 @@ MANIFEST
 pip-log.txt
 pip-delete-this-directory.txt
 
-# Unit test / coverage reports
+# Unit tests / coverage reports
 htmlcov/
 .tox/
 .nox/
@@ -55,84 +73,20 @@ cover/
 *.mo
 *.pot
 
-# Django stuff:
-*.log
-local_settings.py
-db.sqlite3
-db.sqlite3-journal
-
-# Flask stuff:
-instance/
-.webassets-cache
-
-# Scrapy stuff:
-.scrapy
-
 # Sphinx documentation
 docs/_build/
 
 # PyBuilder
 .pybuilder/
 target/
 
-# Jupyter Notebook
-.ipynb_checkpoints
-
-# IPython
-profile_default/
-ipython_config.py
-
-# pyenv
-# For a library or package, you might want to ignore these files since the code is
-# intended to run in multiple environments; otherwise, check them in:
-# .python-version
-
-# pipenv
-# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
-# However, in case of collaboration, if having platform-specific dependencies or dependencies
-# having no cross-platform support, pipenv may install dependencies that don't work, or not
-# install all needed dependencies.
-#Pipfile.lock
-
-# UV
-# Similar to Pipfile.lock, it is generally recommended to include uv.lock in version control.
-# This is especially recommended for binary packages to ensure reproducibility, and is more
-# commonly ignored for libraries.
-#uv.lock
-
-# poetry
-# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
-# This is especially recommended for binary packages to ensure reproducibility, and is more
-# commonly ignored for libraries.
-# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
-#poetry.lock
-#poetry.toml
-
-# pdm
-# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
-# pdm recommends including project-wide configuration in pdm.toml, but excluding .pdm-python.
-# https://pdm-project.org/en/latest/usage/project/#working-with-version-control
-#pdm.lock
-#pdm.toml
-.pdm-python
-.pdm-build/
-
-# pixi
-# Similar to Pipfile.lock, it is generally recommended to include pixi.lock in version control.
-#pixi.lock
-# Pixi creates a virtual environment in the .pixi directory, just like venv module creates one
-# in the .venv directory. It is recommended not to include this directory in version control.
-.pixi
+
+
+
 
 # PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
 __pypackages__/
 
-# Celery stuff
-celerybeat-schedule
-celerybeat.pid
-
-# SageMath parsed files
-*.sage.py
 
 # Environments
 .env
@@ -144,12 +98,6 @@ ENV/
 env.bak/
 venv.bak/
 
-# Spyder project settings
-.spyderproject
-.spyproject
-
-# Rope project settings
-.ropeproject
 
 # mkdocs documentation
 /site
@@ -159,49 +107,26 @@ venv.bak/
 .dmypy.json
 dmypy.json
 
-# Pyre type checker
-.pyre/
-
-# pytype static type analyzer
-.pytype/
-
-# Cython debug symbols
-cython_debug/
 
 # PyCharm
 # JetBrains specific template is maintained in a separate JetBrains.gitignore that can
 # be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
 # and can be added to the global gitignore or merged into this file. For a more nuclear
 # option (not recommended) you can uncomment the following to ignore the entire idea folder.
-#.idea/
-
-# Abstra
-# Abstra is an AI-powered process automation framework.
-# Ignore directories containing user credentials, local state, and settings.
-# Learn more at https://abstra.io/docs
-.abstra/
-
-# Visual Studio Code
-# Visual Studio Code specific template is maintained in a separate VisualStudioCode.gitignore
-# that can be found at https://github.com/github/gitignore/blob/main/Global/VisualStudioCode.gitignore
-# and can be added to the global gitignore or merged into this file. However, if you prefer,
-# you could uncomment the following to ignore the entire vscode folder
-# .vscode/
+.idea/
 
 # Ruff stuff:
 .ruff_cache/
 
 # PyPI configuration file
 .pypirc
 
-# Cursor
-# Cursor is an AI-powered code editor. `.cursorignore` specifies files/directories to
-# exclude from AI features like autocomplete and code analysis. Recommended for sensitive data
-# refer to https://docs.cursor.com/context/ignore-files
-.cursorignore
-.cursorindexingignore
-
-# Marimo
-marimo/_static/
-marimo/_lsp/
-__marimo__/
+# VIM
+*.swp
+*.swo
+*~
+
+# Project specific
+*.log
+*.pid
+.DS_Store
```
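The interplay of ignore and negation patterns above is easy to get subtly wrong, so it can help to probe the rules in a throwaway repository with `git check-ignore`. The sketch below uses illustrative paths, not files from this project:

```shell
# Probe a subset of the new ignore rules in a scratch repository.
tmp="$(mktemp -d)"
cd "$tmp"
git init -q demo
cd demo
printf '%s\n' 'analysis/' '!analysis/.gitkeep' '*.png' '!assets/**/*.png' > .gitignore

# Generated analysis output is ignored...
git check-ignore -q analysis/plots/foo.png && echo "analysis output ignored"
# ...while documentation assets matched by the negation are kept.
git check-ignore -q assets/logo.png || echo "documentation asset kept"
```

One caveat worth knowing: because the whole `analysis/` directory is excluded, git never descends into it, so the `!analysis/.gitkeep` negation has no effect as written; ignoring `analysis/*` instead of `analysis/` is the usual way to keep such a placeholder trackable.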

.python-version

Lines changed: 1 addition & 0 deletions
```diff
@@ -0,0 +1 @@
+3.12
```

README.md

Lines changed: 132 additions & 0 deletions
```diff
@@ -1,2 +1,134 @@
 # amd-mi300x-ml-benchmarks
 Comprehensive machine learning benchmarking framework for AMD MI300X GPUs on Dell PowerEdge XE9680 hardware. Supports both inference (vLLM) and training workloads with containerized test suites, hardware monitoring, and analysis tools for performance, power efficiency, and scalability research across the complete ML pipeline.
+
+## Quick Start with Sample Data
+
+The repository includes sample benchmark results for immediate testing:
+
+```shell
+# Clone and setup
+git clone https://github.com/cmontemuino/amd-mi300x-ml-benchmarks.git
+cd amd-mi300x-ml-benchmarks
+uv sync --extra analysis
+
+# Run analysis on sample dataset
+uv run analyze-results --input-dir datasets/sample-results --output-dir analysis/sample-output
+
+# Or use the Python API
+uv run python -c "
+from amd_bench.schemas.examples import sample_dataset_example
+analyzer = sample_dataset_example()
+summary = analyzer.get_results_summary()
+print(f'Processed {summary[\"total_results\"]} results')
+print(f'Models: {summary[\"models\"]}')
+"
+```
+
+## Analysis Commands
+
+### Basic Analysis
+
+```shell
+# Analyze results with command line parameters
+
+uv run analyze-results --input-dir datasets/sample-results --output-dir analysis/sample-output
+
+# Using YAML configuration
+uv run analyze-results run --config-file config/analysis-config.yaml
+```
+
+### Python API Usage
+
+```python
+from pathlib import Path
+from amd_bench.core.analysis import BenchmarkAnalyzer
+from amd_bench.schemas.benchmark import AnalysisConfig
+
+# Basic configuration
+
+config = AnalysisConfig(
+    input_dir=Path("datasets/sample-results"),
+    output_dir=Path("analysis/sample-output")
+)
+
+analyzer = BenchmarkAnalyzer(config)
+analyzer.process_results()
+
+# Get summary of results
+summary = analyzer.get_results_summary()
+print(f"Analyzed {summary['total_results']} benchmark results")
+```
+
+### Generated Output Structure
+
+After running analysis, you'll find:
+
+```text
+analysis/sample-output/
+├── plots/
+│   ├── batch_size_scaling.png
+│   ├── batch_size_scaling_by_memory.png
+│   ├── latency_analysis.png
+│   ├── memory_efficiency.png
+│   └── throughput_comparison.png
+├── reports/
+│   ├── analysis_summary.json
+│   └── benchmark_analysis_report.md
+└── tables/
+    ├── batch_size_analysis.csv
+    ├── memory_utilization_analysis.csv
+    └── model_performance_summary.csv
+```
+
+### Creating Custom Configurations
+
+Create a YAML configuration file for custom analysis:
+
+```yaml
+# config/custom-analysis.yaml
+
+input_dir: "datasets/sample-results"
+output_dir: "analysis/custom-output"
+results_pattern: "*.json"
+include_hardware_metrics: true
+generate_plots: true
+
+filename_formats:
+  pattern: '([^_]+)([^_]+)_bs(\d+)in(\d+)out(\d+)([^_]+)mem([\d,\.]+)(.+)'
+  groups:
+    model: 1
+    benchmark_type: 2
+    batch_size: 3
+    input_length: 4
+    output_length: 5
+    dtype: 6
+    memory_util: 7
+    timestamp: 8
+  description: "Standard vLLM format"
+  priority: 1
+
+default_parameters:
+  benchmark_type: "latency"
+  memory_util: "0.8"
+```
+
+## 📊 Understanding Results
+
+After running analysis, learn how to interpret your results:
+- **[Complete Analysis Guide](docs/user-guide/analysis-guide.md)** - Comprehensive guide to understanding benchmark results
+
+## 🔧 Development
+
+**For Contributors:** See [CONTRIBUTING.md](docs/CONTRIBUTING.md) for development setup and guidelines.
+
+**For Users:** The quick start above is sufficient for running analysis.
+
+## Research Reproducibility
+
+This project follows research software engineering best practices:
+
+- **Reproducible environments**: Locked dependencies with `uv.lock`
+- **Data validation**: Pydantic schemas for all data structures
+- **Comprehensive logging**: Structured logs for all operations
+- **Statistical rigor**: Proper statistical analysis methods
+- **Configuration management**: YAML-based configuration system
```
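The `filename_formats` entry in the README's sample configuration maps regex capture groups to benchmark parameters. A minimal sketch of how such a pattern could be applied, using a hypothetical pattern and filename (the project's exact format lives in its YAML config, not here):

```python
import re

# Hypothetical filename pattern in the spirit of the README's filename_formats
# entry; named groups play the role of the YAML "groups" indices.
PATTERN = re.compile(
    r"(?P<model>[^_]+)_(?P<benchmark_type>[^_]+)"
    r"_bs(?P<batch_size>\d+)_in(?P<input_length>\d+)_out(?P<output_length>\d+)"
    r"_(?P<dtype>[^_]+)_mem(?P<memory_util>[\d.]+)_(?P<timestamp>.+)\.json"
)

def parse_result_filename(name: str) -> dict[str, str]:
    """Extract benchmark parameters from an illustrative result filename."""
    match = PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized result filename: {name}")
    return match.groupdict()

# Illustrative filename, not a file shipped with the repository.
params = parse_result_filename(
    "llama3-8b_latency_bs8_in128_out256_float16_mem0.9_20250101-120000.json"
)
print(params["model"], params["batch_size"], params["memory_util"])
```

Named groups make the mapping self-documenting; a config-driven analyzer could equally use numbered groups plus a `groups:` table, as the YAML above suggests.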

analysis/.gitkeep

Whitespace-only changes.
