This directory contains runnable demos of the graph characterization and partitioning system.
Examples answer: "How do I use this system?"
These demos show:
- Basic usage patterns
- End-to-end workflows
- Visualization techniques
- Model comparison methods
| Aspect | Examples (./examples/) | Tests (./tests/) | Validation (./validation/) |
|---|---|---|---|
| Purpose | Show how to use | Verify correctness | Check accuracy |
| Question | "How do I...?" | "Does it work?" | "Is it accurate?" |
| User | End users | Developers | Researchers |
| Speed | Interactive | Fast (<1s) | Slow (seconds) |
A 30-second introduction to graph partitioning and analysis:

```bash
python examples/quick_start_partitioner.py
```

What it shows:
- Load a model (ResNet-18)
- Trace with PyTorch FX
- Partition into subgraphs
- Analyze concurrency
- View top compute operations
Best for: First-time users
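Concurrency analysis can be pictured as grouping subgraphs into stages, where every subgraph in a stage depends only on subgraphs in earlier stages. The widest stage bounds how many subgraphs could run at once. The sketch below is a minimal version of that idea. It assumes a hypothetical dict that maps each subgraph ID to its predecessors, which is not the partitioner's actual data structure.

```python
from collections import defaultdict


def concurrency_stages(deps):
    """Group subgraphs into stages that could run concurrently.

    deps maps each subgraph ID to a list of the subgraph IDs it depends on
    (a hypothetical format for this sketch). The graph must be acyclic.
    A subgraph's stage is 1 + the latest stage among its predecessors.
    """
    stage_of = {}

    def visit(node):
        if node not in stage_of:
            preds = deps.get(node, [])
            stage_of[node] = 1 + max((visit(p) for p in preds), default=-1)
        return stage_of[node]

    for node in deps:
        visit(node)

    stages = defaultdict(list)
    for node, stage in stage_of.items():
        stages[stage].append(node)
    # Stages are numbered 0..max without gaps, so index them in order
    return [sorted(stages[s]) for s in range(len(stages))]


# Hypothetical residual block: conv1 -> conv2, plus a shortcut, merged by add
deps = {"conv1": [], "conv2": ["conv1"], "shortcut": [], "add": ["conv2", "shortcut"]}
stages = concurrency_stages(deps)
max_parallel = max(len(s) for s in stages)
print(stages, max_parallel)
```

In this toy residual block, `conv1` and `shortcut` fall into the same stage, so at most two subgraphs could run concurrently.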
Compare unfused vs. fused partitioning to see how fusion reduces memory traffic:

```bash
python examples/demo_fusion_comparison.py

# Test on different models
python examples/demo_fusion_comparison.py --model mobilenet_v2
```

What it shows:
- Fusion pattern detection (Conv+BN+ReLU)
- Memory traffic reduction (20-42%)
- Kernel launch reduction (1.9-2.1×)
- Per-subgraph fusion benefits
Best for: Understanding fusion impact
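A back-of-the-envelope calculation shows why fusion cuts memory traffic. The sketch below uses a simplified model that is not the partitioner's actual cost model. In the unfused case, each kernel reads its input activation from memory and writes its output back. In the fused case, a single Conv+BN+ReLU kernel reads the input once and writes the final output once. Weights and BN parameters are ignored, and the tensor shape is hypothetical.

```python
def tensor_bytes(shape, bytes_per_elem=4):
    """Size of a dense tensor in bytes (FP32 by default)."""
    n = 1
    for dim in shape:
        n *= dim
    return n * bytes_per_elem


def conv_bn_relu_traffic(in_shape, out_shape, bytes_per_elem=4):
    """Activation traffic and kernel launches for Conv+BN+ReLU, unfused vs fused.

    Simplified model (an assumption for illustration): every kernel reads its
    input activation from memory and writes its output back; weights and BN
    parameters are ignored.
    """
    x = tensor_bytes(in_shape, bytes_per_elem)
    y = tensor_bytes(out_shape, bytes_per_elem)
    # Unfused: conv reads x/writes y, BN reads y/writes y, ReLU reads y/writes y
    unfused_bytes = (x + y) + (y + y) + (y + y)
    # Fused: a single kernel reads x and writes the final activation once
    fused_bytes = x + y
    return unfused_bytes, fused_bytes, 3, 1


# Hypothetical ResNet-style block: 64 channels at 56x56, batch size 1
unfused, fused, launches_unfused, launches_fused = conv_bn_relu_traffic(
    (1, 64, 56, 56), (1, 64, 56, 56)
)
reduction = 1 - fused / unfused
print(f"Activation traffic: {unfused / 1e6:.2f} MB -> {fused / 1e6:.2f} MB "
      f"({reduction:.0%} less), launches {launches_unfused} -> {launches_fused}")
```

This idealized block shows a larger reduction than the 20-42% range listed above. That gap is expected: a whole model also moves weights and likely contains layers that cannot be fused, and this sketch ignores both.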
Demonstrates the Phase 2 hardware mapping system:

```bash
python examples/demo_new_performance_model.py
```

What it shows:
- Realistic hardware utilization modeling
- GPU SM allocation
- Memory bandwidth constraints
- Latency estimation across precisions
Best for: Hardware-aware optimization
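Latency estimates that respect memory bandwidth constraints are often built on a roofline-style bound: an operation is limited either by compute or by memory traffic, whichever is slower. The sketch below is minimal. Its peak throughput and bandwidth numbers are made up, and a single utilization factor stands in for SM allocation. The demo's own model is more detailed than this.

```python
from dataclasses import dataclass, field


@dataclass
class HardwareSpec:
    """Hypothetical accelerator description used only for this sketch."""
    mem_bandwidth: float                              # bytes per second
    peak_flops: dict = field(default_factory=dict)    # precision -> peak FLOP/s


def estimate_latency(flops, bytes_moved, hw, precision, utilization=1.0):
    """Roofline-style estimate: limited by compute or memory, whichever is slower.

    `utilization` (0 < u <= 1) derates peak compute, e.g. when only part of
    the GPU's SMs can be allocated to this subgraph.
    """
    compute_s = flops / (hw.peak_flops[precision] * utilization)
    memory_s = bytes_moved / hw.mem_bandwidth
    return max(compute_s, memory_s)


# Made-up numbers: 10 TFLOP/s FP32, 40 TFLOP/s FP16, 500 GB/s DRAM
hw = HardwareSpec(mem_bandwidth=500e9, peak_flops={"fp32": 10e12, "fp16": 40e12})
# Same layer in both precisions; FP16 halves the bytes moved
t_fp32 = estimate_latency(4e9, 50e6, hw, "fp32")
t_fp16 = estimate_latency(4e9, 25e6, hw, "fp16")
print(f"FP32: {t_fp32 * 1e6:.0f} us, FP16: {t_fp16 * 1e6:.0f} us")
```

In this toy case, the layer stays compute-bound at both precisions. FP16's 4x higher peak therefore gives a 4x lower estimate.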
Compare multiple models side by side:

```bash
python examples/compare_models.py --models resnet18 mobilenet_v2 efficientnet_b0
```

What it shows:
- FLOPs comparison
- Memory footprint
- Arithmetic intensity
- Parallelism characteristics
- Concurrency metrics
Best for: Architecture selection
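The FLOPs, memory footprint, and arithmetic intensity compared above can be estimated per layer from tensor shapes. The sketch below covers a single 2D convolution under several assumptions: FP32 tensors, stride 1 with 'same' padding, no bias, and each tensor moved exactly once. It counts a multiply-accumulate as 2 FLOPs, which is a common convention; the repo's own counting rules may differ.

```python
def conv2d_stats(c_in, c_out, h, w, k, bytes_per_elem=4):
    """FLOPs, bytes moved and arithmetic intensity of one conv layer.

    Assumptions: stride 1 with 'same' padding (output is h x w), no bias,
    batch size 1, and every tensor read or written exactly once.
    """
    macs = c_out * h * w * c_in * k * k          # one MAC per weight per output pixel
    flops = 2 * macs                             # count multiply and add separately
    input_bytes = c_in * h * w * bytes_per_elem
    weight_bytes = c_out * c_in * k * k * bytes_per_elem
    output_bytes = c_out * h * w * bytes_per_elem
    bytes_moved = input_bytes + weight_bytes + output_bytes
    return flops, bytes_moved, flops / bytes_moved


# Hypothetical ResNet-style layer: 3x3 conv, 64 -> 64 channels, 56x56 feature map
flops, bytes_moved, intensity = conv2d_stats(64, 64, 56, 56, 3)
print(f"{flops / 1e6:.0f} MFLOPs, {bytes_moved / 1e6:.2f} MB, AI = {intensity:.1f} FLOPs/byte")
```

For this layer, the sketch gives about 231 MFLOPs and an arithmetic intensity of roughly 132 FLOPs/byte. A 1x1 convolution of the same size drops to about 16 FLOPs/byte. This is one reason models built from 1x1 and depthwise layers, such as MobileNet, tend to be more memory-bound.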
Generate visual diagrams of the partition structure:

```bash
python examples/visualize_partitioning.py --model resnet18 --output graph.pdf
```

What it shows:
- Subgraph dependency graph
- Critical path highlighting
- Bottleneck visualization
- Parallelism opportunities
Best for: Understanding model structure
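Critical-path highlighting amounts to finding the heaviest path through the subgraph dependency DAG, weighted by each subgraph's estimated cost. The sketch below is minimal and uses a hypothetical dict that maps each subgraph ID to its predecessors, plus made-up per-subgraph costs. None of these names are this repo's API.

```python
def critical_path(deps, cost):
    """Heaviest (highest total cost) path through a DAG.

    deps: dict mapping node -> list of predecessor nodes (hypothetical format)
    cost: dict mapping every node -> estimated latency (any unit)
    Returns (total_cost, path) with the path listed from source to sink.
    """
    best = {}  # node -> (cost of heaviest path ending at node, predecessor on it)

    def visit(node):
        if node not in best:
            preds = deps.get(node, [])
            # Pick the predecessor whose heaviest incoming path is largest
            prev = max(preds, key=lambda p: visit(p)[0], default=None)
            prev_cost = best[prev][0] if prev is not None else 0.0
            best[node] = (prev_cost + cost[node], prev)
        return best[node]

    end = max(cost, key=lambda n: visit(n)[0])
    total, _ = best[end]
    path, node = [], end
    while node is not None:  # walk predecessors back to a source
        path.append(node)
        node = best[node][1]
    return total, path[::-1]


# Hypothetical residual block with made-up costs
deps = {"conv1": [], "conv2": ["conv1"], "shortcut": [], "add": ["conv2", "shortcut"]}
cost = {"conv1": 3.0, "conv2": 3.0, "shortcut": 1.0, "add": 0.5}
print(critical_path(deps, cost))
```

With these costs, the critical path is conv1, conv2, add (6.5 units). Speeding up `shortcut` would not shorten end-to-end latency, but speeding up either convolution would.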
These demos show the physics-based energy model and multi-fabric architecture across three hardware types.

Jetson (GPU):

```bash
python examples/demo_jetson_fabric.py
```

What it shows:
- GPU multi-fabric (CUDA cores + Tensor Cores)
- 15% Tensor Core efficiency gain at 8nm
- Physics-based energy: 1.90 pJ (CUDA), 1.62 pJ (Tensor)
- Process node scaling comparison with H100
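The Jetson energy figures can be checked by hand, because compute energy is simply operation count times energy per operation. The sketch below uses the 1.90 pJ (CUDA cores) and 1.62 pJ (Tensor Cores) values listed above with a hypothetical operation count. It ignores memory and data-movement energy, which a full model would add.

```python
PJ_TO_J = 1e-12  # joules per picojoule


def compute_energy_j(num_ops, pj_per_op):
    """Compute-only energy in joules (no memory or data-movement energy)."""
    return num_ops * pj_per_op * PJ_TO_J


CUDA_PJ, TENSOR_PJ = 1.90, 1.62   # per-op energies listed for demo_jetson_fabric.py
ops = 3.6e9                       # hypothetical workload size

e_cuda = compute_energy_j(ops, CUDA_PJ)
e_tensor = compute_energy_j(ops, TENSOR_PJ)
savings = 1 - TENSOR_PJ / CUDA_PJ   # fraction of energy saved per op on Tensor Cores
print(f"CUDA cores: {e_cuda * 1e3:.2f} mJ, Tensor Cores: {e_tensor * 1e3:.2f} mJ, "
      f"saving {savings:.1%}")
```

The per-op saving works out to about 14.7%, which is consistent with the ~15% Tensor Core gain quoted above.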
KPU:

```bash
python examples/demo_kpu_fabric.py
```

What it shows:
- Heterogeneous tile architecture (70% INT8, 20% BF16, 10% Matrix)
- Matrix tile efficiency gain (15% better than standard)
- Physics-based energy at 16nm: 2.70 pJ (standard), 2.30 pJ (matrix)
- Tile specialization for different workloads
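Tile specialization pays off only for the work that actually lands on matrix tiles. The sketch below is minimal and uses the 2.70 pJ (standard) and 2.30 pJ (matrix) figures listed for the KPU demo. It treats the fraction of operations routed to matrix tiles as a free, hypothetical parameter; the demo's own scheduling may differ.

```python
STANDARD_PJ = 2.70  # per-op energy of a standard tile at 16nm (listed above)
MATRIX_PJ = 2.30    # per-op energy of a matrix tile at 16nm (listed above)


def blended_pj_per_op(matrix_fraction, standard_pj=STANDARD_PJ, matrix_pj=MATRIX_PJ):
    """Average energy per op when `matrix_fraction` of all ops run on matrix tiles.

    matrix_fraction is a hypothetical knob for this sketch, not a demo parameter.
    """
    if not 0.0 <= matrix_fraction <= 1.0:
        raise ValueError("matrix_fraction must be between 0 and 1")
    return matrix_fraction * matrix_pj + (1.0 - matrix_fraction) * standard_pj


for frac in (0.0, 0.1, 0.6, 1.0):
    print(f"{frac:.0%} of ops on matrix tiles -> {blended_pj_per_op(frac):.2f} pJ/op")
```

Suppose only 10% of operations ran on matrix tiles, matching their share of the array. The blended cost would then be 2.66 pJ/op. Routing 60% of the work there would bring it down to 2.46 pJ/op.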
TPU:

```bash
python examples/demo_tpu_fabric.py
```

What it shows:
- Single systolic array fabric (128×128 PEs)
- Weight-stationary dataflow architecture
- Physics-based energy at 7nm: 1.80 pJ
- Process node comparison across GPU/KPU/TPU architectures
Best for: Understanding physics-based energy modeling and multi-fabric architectures
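For the TPU demo, peak throughput and compute power follow directly from the systolic array's geometry. The sketch below makes several assumptions. The 128x128 array is ideal and fully utilized, and each PE performs one multiply-accumulate per cycle. The 1 GHz clock is hypothetical. The 1.80 pJ figure is treated as energy per MAC, though the demo may define its unit differently.

```python
def systolic_peak(rows, cols, clock_hz, pj_per_mac):
    """Peak MAC rate and full-utilization compute power of a systolic array.

    Assumes every PE performs one multiply-accumulate per cycle (an ideal,
    fully utilized array) and ignores memory and interconnect energy.
    """
    macs_per_cycle = rows * cols
    macs_per_s = macs_per_cycle * clock_hz
    tops = 2 * macs_per_s / 1e12          # counting a MAC as 2 ops
    power_w = macs_per_s * pj_per_mac * 1e-12
    return macs_per_cycle, tops, power_w


# 128x128 PEs as in demo_tpu_fabric.py; the 1 GHz clock is hypothetical,
# and 1.80 pJ is treated as energy per MAC (an assumption)
macs, tops, watts = systolic_peak(128, 128, 1.0e9, 1.80)
print(f"{macs} MACs/cycle, {tops:.1f} TOPS peak, {watts:.1f} W compute power")
```

Under these assumptions, the array peaks at 16,384 MACs per cycle, or about 32.8 TOPS at 1 GHz. It would draw about 29.5 W for compute alone. Real utilization is lower, especially for layers too small to fill the 128x128 array.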
For new users:

```bash
# 1. Quick intro (30 seconds)
python examples/quick_start_partitioner.py
# 2. See fusion benefits
python examples/demo_fusion_comparison.py
# 3. Compare models
python examples/compare_models.py --models resnet18 mobilenet_v2
```

For analyzing your own model:

```bash
# 1. Partition and analyze
python examples/quick_start_partitioner.py
# 2. Visualize structure
python examples/visualize_partitioning.py --model <your_model>
# 3. Compare to baselines
python examples/compare_models.py --models <your_model> resnet18
```

For hardware evaluation:

```bash
# 1. See hardware mapping
python examples/demo_new_performance_model.py
# 2. Compare across hardware (run from the repo root, like the commands above)
python validation/hardware/test_all_hardware.py
```

Partitioning a model from your own code:

```python
import os
import sys

import torch
from torch.fx import symbolic_trace
from torch.fx.passes.shape_prop import ShapeProp
from torchvision import models

# Make the repo root importable regardless of the current working directory
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from src.graphs.characterize.fusion_partitioner import FusionBasedPartitioner

# Load model (replace with your own PyTorch model)
model = models.resnet18()
model.eval()

# Trace with FX, then propagate shapes so every node carries tensor metadata
traced = symbolic_trace(model)
ShapeProp(traced).propagate(torch.randn(1, 3, 224, 224))

# Partition with fusion
partitioner = FusionBasedPartitioner()
report = partitioner.partition_graph(traced)

# Access results
print(f"Total FLOPs: {report.total_flops / 1e9:.2f} G")
print(f"Fused subgraphs: {len(report.fused_subgraphs)}")
print(f"Memory reduction: {report.memory_reduction_percent:.1f}%")
```

Comparing two models:

```python
# Assumes the imports from the previous example (torch, symbolic_trace,
# ShapeProp, torchvision.models, FusionBasedPartitioner)
def analyze_model(model, name):
    """Partition a model and return a few summary metrics."""
    model.eval()
    traced = symbolic_trace(model)
    ShapeProp(traced).propagate(torch.randn(1, 3, 224, 224))
    partitioner = FusionBasedPartitioner()
    report = partitioner.partition_graph(traced)
    return {
        'name': name,
        'flops_g': report.total_flops / 1e9,
        'subgraphs': len(report.fused_subgraphs),
        'memory_mb': report.total_memory_traffic / 1e6,
    }

# Compare
resnet = analyze_model(models.resnet18(), "ResNet-18")
mobilenet = analyze_model(models.mobilenet_v2(), "MobileNet-V2")
print(f"{resnet['name']}: {resnet['flops_g']:.2f} GFLOPs")
print(f"{mobilenet['name']}: {mobilenet['flops_g']:.2f} GFLOPs")
```

Classifying subgraphs as compute- or memory-bound:

```python
# After partitioning (uses `report` from the first example)
# Rough roofline threshold in FLOPs/byte; the real ridge point is the target
# hardware's peak FLOP/s divided by its memory bandwidth
RIDGE_POINT = 10.0

for subgraph in report.fused_subgraphs:
    if subgraph.total_memory_traffic == 0:
        continue  # avoid dividing by zero for subgraphs with no recorded traffic
    ai = subgraph.total_flops / subgraph.total_memory_traffic  # arithmetic intensity
    bottleneck = "compute" if ai > RIDGE_POINT else "memory"
    print(f"{subgraph.subgraph_id}: {bottleneck}-bound (AI={ai:.1f})")
```

All examples require:
- Python 3.8+
- PyTorch
- torchvision
Install:

```bash
pip install torch torchvision
```

Optional (for visualization):

```bash
pip install matplotlib networkx graphviz
```

Import errors:
- Run from the repo root: `python examples/quick_start_partitioner.py`
- Or set `PYTHONPATH`: `export PYTHONPATH=/path/to/repo`
Model not found:
- Use torchvision models, e.g. `models.resnet18()`
- For custom models, pass the model object (not a name string)
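FX tracing fails:
- Some models have dynamic (data-dependent) control flow that `symbolic_trace` cannot trace
- Try `symbolic_trace(model, concrete_args={...})` to fix the inputs that drive the branching
- See the PyTorch FX documentation for other workarounds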
Slow execution:
- Examples should run in seconds
- If they are slower, check the model size (ViT and large ResNets take longer)
- For batch processing, see `validation/tests`
After exploring the examples:

- Run the validation tests to see accuracy results:

  ```bash
  python validation/hardware/test_all_hardware.py
  python validation/estimators/test_resnet_family.py
  ```
- Read the documentation for a deeper understanding:
  - `../docs/GETTING_STARTED.md`: getting started guide
  - `../docs/graph_partitioner_tutorial.md`: detailed tutorials
  - `../docs/realistic_performance_modeling_plan.md`: architecture
- Use the CLI tools for production workflows:

  ```bash
  ./cli/partitioner.py --model resnet18 --output results.json
  ./cli/profile_graph.py --model mobilenet_v2
  ```
- Write your own examples, using these as templates!
To add a new example:
- Create `demo_<feature>.py` or `<task>_example.py`
- Include a docstring explaining what it demonstrates
- Add argparse options for customization
- Keep the runtime under 30 seconds
- Add it to this README with a description
Example template:
```python
#!/usr/bin/env python
"""
Demonstration of <feature>

Shows how to <accomplish task> using <components>.
"""
import argparse
import os
import sys

import torch
from torch.fx import symbolic_trace

# Make the repo root importable regardless of the current working directory
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from src.graphs.characterize.<module> import <Component>


def main():
    parser = argparse.ArgumentParser(description="Demo of <feature>")
    parser.add_argument('--model', default='resnet18', help="Model to use")
    args = parser.parse_args()

    # Demo code here
    print("Demonstrating <feature>...")


if __name__ == '__main__':
    main()
```

See also:
- `../tests/README.md`: unit tests
- `../validation/README.md`: accuracy validation
- `../cli/README.md`: command-line tools
- `../docs/`: full documentation