
Commit db87882

Merge pull request #1 from cmontemuino/add-analyzer
feat: add analyzer implementation
2 parents 46d98bd + 081d8b3 commit db87882

48 files changed: +6728 −107 lines


.gitignore

Lines changed: 32 additions & 107 deletions
```diff
@@ -1,3 +1,21 @@
+# Analysis output directories (generated content)
+analysis/
+!analysis/.gitkeep
+
+# Analysis artifacts
+*.png
+*.pdf
+*.html
+**/plots/
+**/reports/
+**/tables/
+
+# Keep assets for documentation
+!docs/assets/
+!assets/**/*.png
+!assets/**/*.jpg
+!assets/**/*.svg
+
 # Byte-compiled / optimized / DLL files
 __pycache__/
 *.py[codz]
@@ -36,7 +54,7 @@ MANIFEST
 pip-log.txt
 pip-delete-this-directory.txt
 
-# Unit test / coverage reports
+# Unit tests / coverage reports
 htmlcov/
 .tox/
 .nox/
@@ -55,84 +73,20 @@ cover/
 *.mo
 *.pot
 
-# Django stuff:
-*.log
-local_settings.py
-db.sqlite3
-db.sqlite3-journal
-
-# Flask stuff:
-instance/
-.webassets-cache
-
-# Scrapy stuff:
-.scrapy
-
 # Sphinx documentation
 docs/_build/
 
 # PyBuilder
 .pybuilder/
 target/
 
-# Jupyter Notebook
-.ipynb_checkpoints
-
-# IPython
-profile_default/
-ipython_config.py
-
-# pyenv
-# For a library or package, you might want to ignore these files since the code is
-# intended to run in multiple environments; otherwise, check them in:
-# .python-version
-
-# pipenv
-# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
-# However, in case of collaboration, if having platform-specific dependencies or dependencies
-# having no cross-platform support, pipenv may install dependencies that don't work, or not
-# install all needed dependencies.
-#Pipfile.lock
-
-# UV
-# Similar to Pipfile.lock, it is generally recommended to include uv.lock in version control.
-# This is especially recommended for binary packages to ensure reproducibility, and is more
-# commonly ignored for libraries.
-#uv.lock
-
-# poetry
-# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
-# This is especially recommended for binary packages to ensure reproducibility, and is more
-# commonly ignored for libraries.
-# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
-#poetry.lock
-#poetry.toml
-
-# pdm
-# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
-# pdm recommends including project-wide configuration in pdm.toml, but excluding .pdm-python.
-# https://pdm-project.org/en/latest/usage/project/#working-with-version-control
-#pdm.lock
-#pdm.toml
-.pdm-python
-.pdm-build/
-
-# pixi
-# Similar to Pipfile.lock, it is generally recommended to include pixi.lock in version control.
-#pixi.lock
-# Pixi creates a virtual environment in the .pixi directory, just like venv module creates one
-# in the .venv directory. It is recommended not to include this directory in version control.
-.pixi
+
+
+
 
 # PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
 __pypackages__/
 
-# Celery stuff
-celerybeat-schedule
-celerybeat.pid
-
-# SageMath parsed files
-*.sage.py
 
 # Environments
 .env
@@ -144,12 +98,6 @@ ENV/
 env.bak/
 venv.bak/
 
-# Spyder project settings
-.spyderproject
-.spyproject
-
-# Rope project settings
-.ropeproject
 
 # mkdocs documentation
 /site
@@ -159,49 +107,26 @@ venv.bak/
 .dmypy.json
 dmypy.json
 
-# Pyre type checker
-.pyre/
-
-# pytype static type analyzer
-.pytype/
-
-# Cython debug symbols
-cython_debug/
 
 # PyCharm
 # JetBrains specific template is maintained in a separate JetBrains.gitignore that can
 # be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
 # and can be added to the global gitignore or merged into this file. For a more nuclear
 # option (not recommended) you can uncomment the following to ignore the entire idea folder.
-#.idea/
-
-# Abstra
-# Abstra is an AI-powered process automation framework.
-# Ignore directories containing user credentials, local state, and settings.
-# Learn more at https://abstra.io/docs
-.abstra/
-
-# Visual Studio Code
-# Visual Studio Code specific template is maintained in a separate VisualStudioCode.gitignore
-# that can be found at https://github.com/github/gitignore/blob/main/Global/VisualStudioCode.gitignore
-# and can be added to the global gitignore or merged into this file. However, if you prefer,
-# you could uncomment the following to ignore the entire vscode folder
-# .vscode/
+.idea/
 
 # Ruff stuff:
 .ruff_cache/
 
 # PyPI configuration file
 .pypirc
 
-# Cursor
-# Cursor is an AI-powered code editor. `.cursorignore` specifies files/directories to
-# exclude from AI features like autocomplete and code analysis. Recommended for sensitive data
-# refer to https://docs.cursor.com/context/ignore-files
-.cursorignore
-.cursorindexingignore
-
-# Marimo
-marimo/_static/
-marimo/_lsp/
-__marimo__/
+# VIM
+*.swp
+*.swo
+*~
+
+# Project specific
+*.log
+*.pid
+.DS_Store
```
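The interplay of ignore and negation patterns above is easy to get subtly wrong, so it can help to probe the rules in a throwaway repository with `git check-ignore`. The sketch below uses illustrative paths, not files from this project:

```shell
# Probe a subset of the new ignore rules in a scratch repository.
tmp="$(mktemp -d)"
cd "$tmp"
git init -q demo
cd demo
printf '%s\n' 'analysis/' '!analysis/.gitkeep' '*.png' '!assets/**/*.png' > .gitignore

# Generated analysis output is ignored...
git check-ignore -q analysis/plots/foo.png && echo "analysis output ignored"
# ...while documentation assets matched by the negation are kept.
git check-ignore -q assets/logo.png || echo "documentation asset kept"
```

One caveat worth knowing: because the whole `analysis/` directory is excluded, git never descends into it, so the `!analysis/.gitkeep` negation has no effect as written; ignoring `analysis/*` instead of `analysis/` is the usual way to keep such a placeholder trackable.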

.python-version

Lines changed: 1 addition & 0 deletions
```diff
@@ -0,0 +1 @@
+3.12
```

README.md

Lines changed: 132 additions & 0 deletions
```diff
@@ -1,2 +1,134 @@
 # amd-mi300x-ml-benchmarks
 Comprehensive machine learning benchmarking framework for AMD MI300X GPUs on Dell PowerEdge XE9680 hardware. Supports both inference (vLLM) and training workloads with containerized test suites, hardware monitoring, and analysis tools for performance, power efficiency, and scalability research across the complete ML pipeline.
+
+## Quick Start with Sample Data
+
+The repository includes sample benchmark results for immediate testing:
+
+```shell
+# Clone and setup
+git clone https://github.com/cmontemuino/amd-mi300x-ml-benchmarks.git
+cd amd-mi300x-ml-benchmarks
+uv sync --extra analysis
+
+# Run analysis on sample dataset
+uv run analyze-results --input-dir datasets/sample-results --output-dir analysis/sample-output
+
+# Or use the Python API
+uv run python -c "
+from amd_bench.schemas.examples import sample_dataset_example
+analyzer = sample_dataset_example()
+summary = analyzer.get_results_summary()
+print(f'Processed {summary[\"total_results\"]} results')
+print(f'Models: {summary[\"models\"]}')
+"
+```
+
+## Analysis Commands
+
+### Basic Analysis
+
+```shell
+# Analyze results with command line parameters
+
+uv run analyze-results --input-dir datasets/sample-results --output-dir analysis/sample-output
+
+# Using YAML configuration
+uv run analyze-results run --config-file config/analysis-config.yaml
+```
+
+### Python API Usage
+
+```python
+from pathlib import Path
+from amd_bench.core.analysis import BenchmarkAnalyzer
+from amd_bench.schemas.benchmark import AnalysisConfig
+
+# Basic configuration
+
+config = AnalysisConfig(
+    input_dir=Path("datasets/sample-results"),
+    output_dir=Path("analysis/sample-output")
+)
+
+analyzer = BenchmarkAnalyzer(config)
+analyzer.process_results()
+
+# Get summary of results
+summary = analyzer.get_results_summary()
+print(f"Analyzed {summary['total_results']} benchmark results")
+```
+
+### Generated Output Structure
+
+After running analysis, you'll find:
+
+```text
+analysis/sample-output/
+├── plots/
+│   ├── batch_size_scaling.png
+│   ├── batch_size_scaling_by_memory.png
+│   ├── latency_analysis.png
+│   ├── memory_efficiency.png
+│   └── throughput_comparison.png
+├── reports/
+│   ├── analysis_summary.json
+│   └── benchmark_analysis_report.md
+└── tables/
+    ├── batch_size_analysis.csv
+    ├── memory_utilization_analysis.csv
+    └── model_performance_summary.csv
+```
+
+### Creating Custom Configurations
+
+Create a YAML configuration file for custom analysis:
+
+```yaml
+# config/custom-analysis.yaml
+
+input_dir: "datasets/sample-results"
+output_dir: "analysis/custom-output"
+results_pattern: "*.json"
+include_hardware_metrics: true
+generate_plots: true
+
+filename_formats:
+  pattern: '([^_]+)([^_]+)_bs(\d+)in(\d+)out(\d+)([^_]+)mem([\d,\.]+)(.+)'
+  groups:
+    model: 1
+    benchmark_type: 2
+    batch_size: 3
+    input_length: 4
+    output_length: 5
+    dtype: 6
+    memory_util: 7
+    timestamp: 8
+  description: "Standard vLLM format"
+  priority: 1
+
+default_parameters:
+  benchmark_type: "latency"
+  memory_util: "0.8"
+```
+
+## 📊 Understanding Results
+
+After running analysis, learn how to interpret your results:
+- **[Complete Analysis Guide](docs/user-guide/analysis-guide.md)** - Comprehensive guide to understanding benchmark results
+
+## 🔧 Development
+
+**For Contributors:** See [CONTRIBUTING.md](docs/CONTRIBUTING.md) for development setup and guidelines.
+
+**For Users:** The quick start above is sufficient for running analysis.
+
+## Research Reproducibility
+
+This project follows research software engineering best practices:
+
+- **Reproducible environments**: Locked dependencies with `uv.lock`
+- **Data validation**: Pydantic schemas for all data structures
+- **Comprehensive logging**: Structured logs for all operations
+- **Statistical rigor**: Proper statistical analysis methods
+- **Configuration management**: YAML-based configuration system
```
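The `filename_formats` entry in the README's sample configuration maps regex capture groups to benchmark parameters. A minimal sketch of how such a pattern could be applied, using a hypothetical pattern and filename (the project's exact format lives in its YAML config, not here):

```python
import re

# Hypothetical filename pattern in the spirit of the README's filename_formats
# entry; named groups play the role of the YAML "groups" indices.
PATTERN = re.compile(
    r"(?P<model>[^_]+)_(?P<benchmark_type>[^_]+)"
    r"_bs(?P<batch_size>\d+)_in(?P<input_length>\d+)_out(?P<output_length>\d+)"
    r"_(?P<dtype>[^_]+)_mem(?P<memory_util>[\d.]+)_(?P<timestamp>.+)\.json"
)

def parse_result_filename(name: str) -> dict[str, str]:
    """Extract benchmark parameters from an illustrative result filename."""
    match = PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized result filename: {name}")
    return match.groupdict()

# Illustrative filename, not a file shipped with the repository.
params = parse_result_filename(
    "llama3-8b_latency_bs8_in128_out256_float16_mem0.9_20250101-120000.json"
)
print(params["model"], params["batch_size"], params["memory_util"])
```

Named groups make the mapping self-documenting; a config-driven analyzer could equally use numbered groups plus a `groups:` table, as the YAML above suggests.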

analysis/.gitkeep

Whitespace-only changes.
