
Commit 8297d0a

updating readme

1 parent 8832988

File tree

2 files changed (+63, -17 lines)

README.md

Lines changed: 63 additions & 17 deletions
@@ -59,6 +59,8 @@ Please ensure your system has enough storage space before continuing.

Additional note: we provide all the sampling and code-scraping data -- therefore you can simply use a non-NVIDIA-GPU-enabled machine to run the LLM queries with our dataset.

‼️‼️

### Container on NVIDIA GPU-Enabled Host

```
git clone https://github.com/Scientific-Computing-Lab/gpuFLOPBench.git ./gpu-flopbench
@@ -68,24 +70,38 @@
cd ./gpu-flopbench

# this takes about 10-20 minutes
docker build --progress=plain -t 'gpu-flopbench' .

# please make sure that the Settings > Resources > Network > 'Enable Host Networking' option is enabled on Docker Desktop
# this is so you can run and view Jupyter Notebooks
docker run -ti --network=host --gpus all --name gpu-flopbench-container --runtime=nvidia -e NVIDIA_DRIVER_CAPABILITIES=compute,utility -e NVIDIA_VISIBLE_DEVICES=all gpu-flopbench

docker exec -it gpu-flopbench-container /bin/bash
```
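
As a quick optional sanity check (assuming the NVIDIA Container Toolkit is set up correctly on the host), you can confirm the container actually sees the GPU:
```
# the container should list the host GPU(s); if this fails, recheck the
# NVIDIA driver / Container Toolkit setup on the host
docker exec -it gpu-flopbench-container nvidia-smi
```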

### Container on MacBook (Apple Silicon M1/2/3/4) -- no NVIDIA GPU
```
git clone https://github.com/Scientific-Computing-Lab/gpuFLOPBench.git ./gpu-flopbench

# we only really need the Dockerfile from the repo
cd ./gpu-flopbench

# this takes about 2 minutes on my MacBook Air M4
docker build --platform=linux/amd64 --progress=plain -t 'gpu-flopbench' .

docker run -ti --network=host --name gpu-flopbench-container --platform=linux/amd64 gpu-flopbench

docker exec -it gpu-flopbench-container /bin/bash
```

### Starting / Stopping Container

You can later start/stop the container using:
```
docker start gpu-flopbench-container
docker stop gpu-flopbench-container
```
File changes inside the container are persisted unless you delete the container.
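
If you want to pull results from the container onto the host, `docker cp` works on both running and stopped containers (the paths below are just examples):
```
# example: copy the LLM checkpoint databases out of the container
docker cp gpu-flopbench-container:/gpu-flopbench/llm-querying/checkpoints ./checkpoints-backup
```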

### Windows GPU-Enabled Docker Host -- Extra Steps

Note: if you're on a **Windows Docker Desktop** host, be sure to enable the following for GPU access:
```
@@ -100,6 +116,49 @@
then

restart Docker Desktop
```

# Steps to Run Scripts

Here we explain the steps used in the individual phases of data creation and collection for this work.

1) **Source Code Scraping**: In this step we perform a simple, naive concatenation of all the C/C++ files for each executable in the `gpu-flopbench/src` directory (a sketch of this concatenation appears after this list).
This set of source codes is based on the [HeCBench Suite](https://github.com/zjin-lcf/HeCBench); we focus on only the CUDA programs of the suite.
These scraped codes are used as inputs to the LLMs that we later query, asking them to predict the FLOP counts of the target CUDA kernels.

2) **Program Building**: We took the HeCBench programs and built them all using a custom `CMakeLists.txt` file that gives us fine control over the compiler and compilation process for each program.

We use the NVIDIA GPU hardware performance counters to gather data about the number of FLOPs executed by each target kernel (a profiling sketch follows below).
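
For illustration, the concatenation of step 1 amounts to something like the following (a minimal sketch with assumed file extensions and output naming; the real logic lives in `simpleScrapeKernels.py`):
```
# sketch: glue together every C/C++/CUDA source file of each program
for prog in "$GPU_FLOPBENCH_ROOT"/src/*/; do
  out="$(basename "$prog").concat.txt"
  find "$prog" \( -name '*.c' -o -name '*.cpp' -o -name '*.h' -o -name '*.cu' -o -name '*.cuh' \) -print0 \
    | sort -z | xargs -0 cat > "$out"
done
```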
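
For a flavor of the counter collection (a sketch using NVIDIA Nsight Compute; the metric set our scripts actually use may differ, and the program name below is a placeholder):
```
# per-kernel single-precision FP instruction counts; FLOPs = fadd + fmul + 2*ffma
ncu --metrics smsp__sass_thread_inst_executed_op_fadd_pred_on.sum,\
smsp__sass_thread_inst_executed_op_fmul_pred_on.sum,\
smsp__sass_thread_inst_executed_op_ffma_pred_on.sum ./some-hecbench-program
```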

## Note on Running Scripts

Our repo already provides all the intermediate files that would be generated during the following steps:

1) Source Code Scraping
2) Executable Building, Kernel Extraction, Kernel Profiling
3) Dataset Creation
4) LLM Querying

This means that you can run our same scripts to recreate results or add to the dataset; if your system doesn't have a compatible NVIDIA GPU, you can at the very least see all our steps in deeper detail through this Docker container.
We heavily utilize Git LFS for large data files, so you can see all our raw LLM query results through the LangGraph SQLite logging API.
Our LLM queries and their returned responses are stored in the `gpu-flopbench/llm-querying/checkpoints/paper-datasets/*.sqlite` files.
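
To poke at those checkpoint databases directly (a sketch; the table layout depends on the LangGraph checkpointer version, so inspect the schema first), the `sqlite3` CLI is enough:
```
# list tables and schema of one of the checkpoint databases
sqlite3 llm-querying/checkpoints/paper-datasets/<dataset>.sqlite '.tables'
sqlite3 llm-querying/checkpoints/paper-datasets/<dataset>.sqlite '.schema'
```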

## Scraping Source Codes

Before we can start gathering performance counter data, we need to do a source code scrape.
We scrape first because, once we start building/running the codes, a lot of additional files get unzipped that could accidentally pollute the contexts of the scraped codes.

```
## Output file already provided -- no need to run these commands

cd $GPU_FLOPBENCH_ROOT/source-scraping

python3 simpleScrapeKernels.py --skipSASS --cudaOnly --outfile="simple-scraped-kernels-CUDA-only.json"
```
This will generate a file called `simple-scraped-kernels-CUDA-only.json`, which is the full textual representation of each program with concatenated files.
This means that CUDA kernels that come from the same parent program have the same source code that gets passed to an LLM.
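
A quick way to sanity-check that output file (a sketch assuming `jq` is available; the exact JSON layout may differ, so adjust the filter accordingly):
```
# peek at the first few top-level keys of the scrape output
jq 'keys | .[0:5]' simple-scraped-kernels-CUDA-only.json
```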

## Docker Data Collection Instructions (CUDA program building & profiling)

Once you're in the main bash shell of the container, you should, by default, be in the `/gpu-flopbench` directory with a conda environment called `gpu-flopbench`.
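
To confirm you're in the expected state inside the container (hypothetical checks, assuming a standard conda install):
```
pwd               # should print /gpu-flopbench
conda env list    # should show the gpu-flopbench environment
```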

@@ -123,19 +182,6 @@ We tested this on our own Docker container and had no issues, aside from timeout

llm-querying/visualizeMispredCaseAnalysisRseults.ipynb renamed to llm-querying/visualizeMispredCaseAnalysisResults.ipynb

File renamed without changes.
