Commit 1a555ee (parent 31ece11): Create comprehensive GitHub Copilot instructions

File: .github/copilot-instructions.md (+255 lines)
# IntelliPerf: AI-Powered GPU Performance Engineering Framework

IntelliPerf is a Python-based framework that uses Large Language Models (LLMs) to automatically analyze and optimize GPU kernel performance. It supports HIP/ROCm, Triton, and PyTorch applications, targeting bottlenecks such as shared-memory bank conflicts, uncoalesced memory access patterns, and atomic contention.

Always consult these instructions first, and fall back to search or bash commands only when you encounter unexpected information that does not match what is documented here.
## Working Effectively

### Quick Start (Container Recommended)
Use containers for full functionality, including GPU-dependent features:
```bash
# Using Docker (recommended)
./docker/build.sh
./docker/run.sh

# Using Apptainer
./apptainer/build.sh
./apptainer/run.sh
```
### Development Installation (Basic Python Functionality)
For Python-only development without GPU dependencies:
```bash
# Install the main package (takes ~90 seconds)
pip install -e .

# Verify the installation
intelliperf --help
```
### Full Dependencies Installation (Network-Intensive)
**WARNING**: This step frequently fails due to network timeouts. NEVER CANCEL builds - they may take 45+ minutes.
```bash
# Install external tools - NEVER CANCEL: can take 45+ minutes. Set timeout to 60+ minutes.
python3 scripts/install_tool.py --all

# If network timeouts occur, this is expected - document it as "may fail due to network limitations"
```
### Examples Build (Requires ROCm/HIP)
```bash
# Build examples - requires a ROCm/HIP environment
cd examples
./scripts/build_examples.sh -c

# Clean build if needed
./scripts/build_examples.sh -c --clean

# Verbose build for debugging
./scripts/build_examples.sh -c --verbose
```
## Core Development Commands

### Code Quality (Always Run Before Committing)
```bash
# Install linting tools
pip install ruff==0.3.0

# Check code style (fast, <1 second)
ruff check .

# Fix auto-fixable issues
ruff check . --fix

# Format code
ruff format .
```
### Pre-commit Hooks (May Fail Due to Network Issues)
```bash
pip install pre-commit==3.6.0
pre-commit install

# Run all hooks - NEVER CANCEL: takes 2-5 minutes. Set timeout to 10+ minutes.
# NOTE: may fail due to network timeouts - this is expected in some environments
pre-commit run --all-files
```
### Testing
```bash
# Note: most tests require GPU hardware and a ROCm environment
# Basic test check (will fail without GPU libraries but shows the test structure)
python -m pytest tests/ -v

# Shell-based integration tests (require built examples)
./tests/test_matrix_transpose.sh
```
## IntelliPerf Usage Patterns

### Diagnose Only (Works Without GPU Optimization)
```bash
# Diagnose a HIP application
intelliperf --formula=diagnoseOnly -- ./examples/build/access_pattern/uncoalesced

# Diagnose a PyTorch application
intelliperf --formula=diagnoseOnly -- python ./examples/torch/add.py

# Diagnose a Triton application
TRITON_DISABLE_LINE_INFO=0 intelliperf --formula=diagnoseOnly -- python ./examples/triton/reduce.py
```
### Full Optimization (Requires LLM API Key and GPU)
```bash
# Set the required environment variable
export LLM_GATEWAY_KEY="your_api_key_here"

# Memory access optimization
intelliperf --project_directory=./examples \
  --build_command="./scripts/build_examples.sh -c" \
  --formula=memoryAccess -- ./build/access_pattern/uncoalesced

# Bank conflict optimization
intelliperf --project_directory=./examples \
  --build_command="./scripts/build_examples.sh -c" \
  --formula=bankConflict -- ./build/bank_conflict/matrix_transpose 1024 1024

# Atomic contention optimization
intelliperf --project_directory=./examples \
  --build_command="./scripts/build_examples.sh -c" \
  --instrument_command="./scripts/build_examples.sh -i -c" \
  --formula=atomicContention -- ./build/contention/reduction
```
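When driving these runs from a script, it helps to fail fast on a missing key instead of timing out mid-run. The sketch below is illustrative, not part of IntelliPerf; only the `LLM_GATEWAY_KEY` variable name and the 45-minute budget come from this document.

```python
# Illustrative wrapper: refuse to start an optimization run without the
# LLM gateway key, and apply the generous timeout this document recommends.
import os
import shlex
import subprocess


def run_intelliperf(args, timeout_s=45 * 60):
    """Run intelliperf with a long timeout, erroring early if the key is unset."""
    if not os.environ.get("LLM_GATEWAY_KEY"):
        raise RuntimeError("LLM_GATEWAY_KEY is not set; optimization formulas require it")
    cmd = ["intelliperf", *args]
    print("Running:", shlex.join(cmd))
    return subprocess.run(cmd, timeout=timeout_s)
```

Only the diagnose-only formulas run without the key; everything under this section needs it exported first.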
## Manual Validation Requirements

**CRITICAL**: After making any changes to IntelliPerf, ALWAYS run through these complete validation scenarios:
### 1. Memory Access Pattern Validation
```bash
# Test uncoalesced memory access detection and optimization
intelliperf --formula=memoryAccess --project_directory=./examples \
  --build_command="./scripts/build_examples.sh -c" \
  -- ./build/access_pattern/uncoalesced

# Verify: should show memory coalescing improvements and performance gains
```
### 2. Bank Conflict Validation
```bash
# Test shared memory bank conflict detection and optimization
intelliperf --formula=bankConflict --project_directory=./examples \
  --build_command="./scripts/build_examples.sh -c" \
  -- ./build/bank_conflict/matrix_transpose 1024 1024

# Verify: should show bank conflict reduction and speedup
```
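For intuition about what the `bankConflict` formula targets: LDS is commonly modeled as 32 four-byte banks, so reading one column of an unpadded 32-wide tile hits a single bank 32 times, while padding each row by one word spreads the lanes across all banks. The toy model below is an illustration under those assumed parameters, not IntelliPerf's actual analysis.

```python
# Toy LDS bank model (assumption: 32 banks, one 4-byte word per bank slot).
BANKS = 32


def bank(row, col, row_words):
    """Bank index of word (row, col) in a row-major tile with row_words words per row."""
    return (row * row_words + col) % BANKS


def distinct_banks(col, row_words, rows=32):
    """Distinct banks hit when 32 lanes each read one row of a column.

    32 means conflict-free; 1 means a full 32-way serialized conflict.
    """
    return len({bank(r, col, row_words) for r in range(rows)})
```

With `row_words=32` (the unpadded matrix-transpose tile) every column access lands in one bank; with `row_words=33` (one word of padding) the same accesses cover all 32 banks.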
### 3. Atomic Contention Validation
```bash
# Test atomic operation contention detection and optimization
intelliperf --formula=atomicContention --project_directory=./examples \
  --build_command="./scripts/build_examples.sh -c" \
  --instrument_command="./scripts/build_examples.sh -i -c" \
  -- ./build/contention/reduction

# Verify: should show atomic contention reduction and performance improvement
```
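The contention being measured here is easy to model: a naive reduction issues one contended atomic per element, while accumulating per-block partial sums first leaves only one global atomic per block. The counts below are an illustrative back-of-the-envelope model (the block size is an assumption), not IntelliPerf output.

```python
# Toy model of contended atomic counts for a global-sum reduction.
def atomics_naive(n):
    """Naive reduction: every element updates one shared counter atomically."""
    return n


def atomics_blocked(n, block=256):
    """Per-block partial sums: one global atomic per block (ceiling division)."""
    return -(-n // block)
```

For a million-element reduction this is the difference between roughly a million serialized updates and a few thousand.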
### 4. Multi-Backend Diagnose Validation
```bash
# Test HIP application analysis
intelliperf --formula=diagnoseOnly -- ./examples/build/access_pattern/uncoalesced

# Test PyTorch application analysis
intelliperf --formula=diagnoseOnly -- python ./examples/torch/add.py

# Test Triton application analysis
TRITON_DISABLE_LINE_INFO=0 intelliperf --formula=diagnoseOnly -- python ./examples/triton/reduce.py

# Verify: all runs should generate valid performance analysis JSON output
```
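A minimal automated check for these diagnose runs is that the report parses as a JSON object at all. The helper below is a sketch: this document does not specify the report path or schema, so both are placeholders to adapt to the actual output.

```python
# Sanity-check sketch for a diagnose report. The path you pass in and any
# field names you inspect afterwards are assumptions, not a documented schema.
import json


def load_report(path):
    """Parse a diagnose report file and return it as a dict, or raise on bad JSON."""
    with open(path) as f:
        report = json.load(f)
    if not isinstance(report, dict):
        raise ValueError(f"expected a JSON object, got {type(report).__name__}")
    return report
```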
## Critical Timing and Timeout Information

### Build Commands - NEVER CANCEL
- **Python package install**: ~90 seconds normally; set timeout to 3+ minutes
- **External tools install**: 45+ minutes normally; set timeout to 60+ minutes
- **Examples build**: 5-10 minutes normally; set timeout to 15+ minutes
- **Pre-commit setup**: 2-5 minutes normally; set timeout to 10+ minutes
- **IntelliPerf optimization runs**: 10-30 minutes normally; set timeout to 45+ minutes

### Network Issues (Expected)
- External dependency installation frequently fails due to network timeouts
- Pre-commit hooks may fail to install due to PyPI timeouts
- Document these as "may fail due to network limitations" rather than fixing them
- Use containers for a reliable development environment
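When scripting these steps (for example via `subprocess` timeouts), it can help to centralize the budgets in one place. The values below simply restate this document's guidance in seconds; the key names are illustrative.

```python
# Timeout budgets in seconds, restating the guidance above.
TIMEOUTS_S = {
    "pip_install": 3 * 60,        # ~90 s normally
    "external_tools": 60 * 60,    # 45+ min normally
    "examples_build": 15 * 60,    # 5-10 min normally
    "pre_commit": 10 * 60,        # 2-5 min normally
    "optimization_run": 45 * 60,  # 10-30 min normally
}
```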
## Repository Structure

### Key Directories
```
src/intelliperf/           # Main Python package
src/accordo/               # Validation and correctness checking
examples/                  # Test applications in HIP, Triton, and PyTorch
scripts/build_examples.sh  # Example build system
external/                  # External dependencies (rocprofiler-compute, omniprobe, nexus)
tests/                     # Integration tests (require GPU hardware)
.github/workflows/         # CI that runs on AMD GPU droplets
```

### Configuration Files
- `pyproject.toml` - Python dependencies and tool configuration
- `.pre-commit-config.yaml` - Code quality hooks
- `.github/workflows/ci.yml` - Full GPU-based testing pipeline
- `docker/` and `apptainer/` - Container definitions
## Environment Requirements

### Minimal (Python Development)
- Python 3.8+
- pip

### Full Functionality
- ROCm/HIP environment
- AMD GPU hardware (tested on MI300X)
- Network access for dependency installation
- LLM API key for optimization features
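A quick probe can tell you which tier of functionality to expect before running anything. The Python floor below comes from this document; checking `/opt/rocm` is only a common-install-prefix heuristic, not how IntelliPerf actually detects ROCm.

```python
# Illustrative environment probe. The ROCm path check is an assumption
# (default install prefix), not IntelliPerf's detection logic.
import os
import sys


def check_min_env():
    """Report whether the minimal and full-functionality prerequisites look present."""
    return {
        "python_ok": sys.version_info >= (3, 8),
        "rocm_present": os.path.isdir("/opt/rocm"),
    }
```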
## Common Issues and Solutions

### "ROCm not found" Error
- Expected in non-GPU environments
- Use containers for full GPU functionality
- Python-only features still work (CLI, some validation)

### Network Timeout Errors
- Very common with `python3 scripts/install_tool.py --all`
- Expected with pre-commit installation
- Document these as a limitation rather than trying to fix them
- Use the containers, which have dependencies pre-installed

### Test Failures Without GPU
- Expected - most tests require GPU hardware
- CI runs on actual AMD GPU droplets
- Focus on code quality checks for local development

### Performance Validation
- Always test at least one complete optimization scenario after changes
- Verify that the JSON output contains the expected performance metrics
- Check that both correctness and performance validation pass
## CI Integration

The CI system (`.github/workflows/ci.yml`) runs comprehensive tests on AMD GPU hardware:
- Spins up GPU droplets with MI300X hardware
- Installs the full dependency chain
- Tests all optimization formulas
- Validates correctness and performance improvements
- NEVER CANCEL: CI can take 45+ minutes, including droplet provisioning

Always ensure your changes pass local code quality checks and will work in the GPU CI environment.
