Commit 96c623c

update example structure

1 parent 81b5976 commit 96c623c

4 files changed, +51 -93 lines changed
README.md

Lines changed: 11 additions & 93 deletions

````diff
@@ -50,13 +50,20 @@ The `weco` CLI leverages a tree search approach guided by Large Language Models

 ---

-### Examples
+### Example: Optimizing Simple PyTorch Operations

-**Example 1: Optimizing PyTorch simple operations**
+This basic example shows how to optimize a simple PyTorch function for speedup.
+
+For more advanced examples, including **[Metal/MLX](/examples/metal/README.md), [Triton](/examples/triton/README.md), [CUDA kernel optimization](/examples/cuda/README.md)**, and **[ML model optimization](/examples/spaceship-titanic/README.md)**, please see the `README.md` files within the corresponding subdirectories under the [`examples/`](./examples/) folder.

 ```bash
+# Navigate to the example directory
 cd examples/hello-kernel-world
-pip install torch
+
+# Install dependencies
+pip install torch
+
+# Run Weco
 weco --source optimize.py \
   --eval-command "python evaluate.py --solution-path optimize.py --device cpu" \
   --metric speedup \
@@ -66,96 +73,7 @@ weco --source optimize.py \
   --additional-instructions "Fuse operations in the forward method while ensuring the max float deviation remains small. Maintain the same format of the code."
 ```

-Note that if you have an NVIDIA GPU, change the device to `cuda`. If you are running this on Apple Silicon, set it to `mps`.
-
-**Example 2: Optimizing MLX operations with instructions from a file**
-
-Let's optimize a 2D convolution operation in [`mlx`](https://github.com/ml-explore/mlx) using [Metal](https://developer.apple.com/documentation/metal/). Sometimes, additional context or instructions are too complex for a single command-line string. You can provide a path to a file containing these instructions.
-
-```bash
-cd examples/metal
-pip install mlx
-weco --source optimize.py \
-  --eval-command "python evaluate.py --solution-path optimize.py" \
-  --metric speedup \
-  --maximize true \
-  --steps 30 \
-  --model gemini-2.5-pro-exp-03-25 \
-  --additional-instructions examples.rst
-```
-
-**Example 3: Level-Agnostic Optimization: Causal Self-Attention with Triton & CUDA**
-
-Given how useful causal multi-head self-attention is to transformers, we've seen its wide adoption across ML engineering and AI research. It's great to keep things at a high level (in PyTorch) when doing research, but when moving to production you often need to write highly customized low-level kernels to make things run as fast as they can. The `weco` CLI can optimize kernels across a variety of abstraction levels and frameworks. Example 2 uses Metal, but let's explore two more frameworks:
-
-1. [Triton](https://github.com/triton-lang/triton)
-```bash
-cd examples/triton
-pip install torch triton
-weco --source optimize.py \
-  --eval-command "python evaluate.py --solution-path optimize.py" \
-  --metric speedup \
-  --maximize true \
-  --steps 30 \
-  --model gemini-2.5-pro-exp-03-25 \
-  --additional-instructions "Use triton to optimize the code while ensuring a small max float diff. Maintain the same code format."
-```
-
-2. [CUDA](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html)
-```bash
-cd examples/cuda
-pip install torch
-weco --source optimize.py \
-  --eval-command "python evaluate.py --solution-path optimize.py" \
-  --metric speedup \
-  --maximize true \
-  --steps 30 \
-  --model gemini-2.5-pro-exp-03-25 \
-  --additional-instructions guide.md
-```
-
-**Example 4: Optimizing a Classification Model**
-
-This example demonstrates optimizing a script for a Kaggle competition ([Spaceship Titanic](https://www.kaggle.com/competitions/spaceship-titanic/overview)) to improve classification accuracy. The additional instructions are provided via a separate file (`examples/spaceship-titanic/README.md`).
-
-First, install the requirements for the example environment:
-```bash
-pip install -r examples/spaceship-titanic/requirements-test.txt
-```
-Then run the utility script once to prepare the dataset:
-```bash
-python examples/spaceship-titanic/utils.py
-```
-
-You should see the following structure at `examples/spaceship-titanic`. You need to prepare Kaggle credentials for downloading the dataset.
-```
-.
-├── baseline.py
-├── evaluate.py
-├── optimize.py
-├── private
-│   └── test.csv
-├── public
-│   ├── sample_submission.csv
-│   ├── test.csv
-│   └── train.csv
-├── README.md
-├── requirements-test.txt
-└── utils.py
-```
-
-Then, execute the optimization command:
-```bash
-weco --source examples/spaceship-titanic/optimize.py \
-  --eval-command "python examples/spaceship-titanic/optimize.py && python examples/spaceship-titanic/evaluate.py" \
-  --metric accuracy \
-  --maximize true \
-  --steps 10 \
-  --model gemini-2.5-pro-exp-03-25 \
-  --additional-instructions examples/spaceship-titanic/README.md
-```
-
-*[baseline.py](examples/spaceship-titanic/baseline.py) is provided as a starting point for the optimization.*
+**Note:** If you have an NVIDIA GPU, change the device in the `--eval-command` to `cuda`. If you are running this on Apple Silicon, set it to `mps`.

 ---
````

examples/cuda/README.md

Lines changed: 40 additions & 0 deletions

````diff
@@ -0,0 +1,40 @@
+# Example: Optimizing PyTorch Self-Attention with CUDA
+
+This example showcases using Weco to optimize a PyTorch causal multi-head self-attention implementation by generating custom [CUDA](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html) kernels. This approach aims for low-level optimization beyond standard PyTorch, or even Triton, for potentially higher performance on NVIDIA GPUs.
+
+This example uses a separate Markdown file (`guide.md`) to provide detailed instructions and context to the LLM.
+
+## Setup
+
+1. Ensure you are in the `examples/cuda` directory.
+2. Install the required dependency:
+   ```bash
+   pip install torch
+   ```
+   *(Note: This example requires a compatible NVIDIA GPU and the CUDA Toolkit installed on your system for compiling and running the generated CUDA code.)*
+
+## Optimization Command
+
+Run the following command to start the optimization process:
+
+```bash
+weco --source optimize.py \
+  --eval-command "python evaluate.py --solution-path optimize.py" \
+  --metric speedup \
+  --maximize true \
+  --steps 30 \
+  --model gemini-2.5-pro-exp-03-25 \
+  --additional-instructions guide.md
+```
+
+### Explanation
+
+* `--source optimize.py`: The initial PyTorch self-attention code to be optimized with CUDA.
+* `--eval-command "python evaluate.py --solution-path optimize.py"`: Runs the evaluation script, which compiles (if necessary) and benchmarks the CUDA-enhanced code in `optimize.py` against a baseline, printing the `speedup`.
+* `--metric speedup`: The optimization target metric.
+* `--maximize true`: Weco aims to increase the speedup.
+* `--steps 30`: The number of optimization iterations.
+* `--model gemini-2.5-pro-exp-03-25`: The LLM used for code generation.
+* `--additional-instructions guide.md`: Points Weco to a file containing detailed instructions for the LLM on how to write the CUDA kernels, handle compilation (e.g., using `torch.utils.cpp_extension`), manage data types, and ensure correctness.
+
+Weco will iteratively modify `optimize.py`, potentially generating and integrating CUDA C++ code, guided by the evaluation results and the instructions in `guide.md`.
````
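The evaluation contract described in these examples is simple: the `--eval-command` script benchmarks a candidate against a baseline and prints the metric for Weco to read. As a framework-free illustration only (this is not the repository's actual `evaluate.py`, and it uses plain Python lists instead of tensors), such a script boils down to a correctness check, two timed runs, and a printed `speedup` line:

```python
import time

def baseline(xs):
    # Unfused: two separate passes over the data (scale, then shift).
    scaled = [x * 2.0 for x in xs]
    return [x + 1.0 for x in scaled]

def optimized(xs):
    # Fused: one pass computes both operations.
    return [x * 2.0 + 1.0 for x in xs]

def bench(fn, xs, reps=20):
    # Return total wall-clock time for `reps` calls.
    start = time.perf_counter()
    for _ in range(reps):
        fn(xs)
    return time.perf_counter() - start

xs = [float(i) for i in range(100_000)]
assert baseline(xs) == optimized(xs)  # verify correctness before timing
t_base = bench(baseline, xs)
t_opt = bench(optimized, xs)
print(f"speedup: {t_base / t_opt:.3f}")
```

The real scripts in `examples/` do the same thing with PyTorch modules and device-specific timing, but the shape — validate, benchmark, print the metric named by `--metric` — is what the optimization loop depends on.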

examples/metal/README.md

Whitespace-only changes.

examples/triton/README.md

Whitespace-only changes.
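All of these examples rely on the eval command emitting a line like `speedup: 1.87` (or `accuracy: 0.92` for the classification example) that the driver can parse. Weco's actual parsing logic is internal to the CLI; the hypothetical helper below only sketches the general idea of extracting a named metric from captured output:

```python
import re

def extract_metric(output: str, metric: str) -> float:
    """Find a line such as 'speedup: 1.87' in eval-command output.

    Illustrative helper only; not Weco's actual parser.
    """
    match = re.search(rf"{re.escape(metric)}\s*[:=]\s*([0-9.]+)", output)
    if match is None:
        raise ValueError(f"metric {metric!r} not found in output")
    return float(match.group(1))

log = "compiling kernel...\nspeedup: 1.87\ndone\n"
print(extract_metric(log, "speedup"))  # → 1.87
```

Printing the metric on its own clearly labeled line, as the example evaluation scripts do, keeps this extraction unambiguous even when the script also emits compiler or progress output.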
