Additional note: we provide all the sampling and code-scraping data -- therefore you can run the LLM queries with our dataset on a machine without an NVIDIA GPU.
```
docker run -ti --network=host --name gpu-flopbench-container --platform=linux/amd64 gpu-flopbench
docker exec -it gpu-flopbench-container /bin/bash
```

### Starting / Stopping Container
You can later start/stop the container using:
```
docker start gpu-flopbench-container
docker stop gpu-flopbench-container
```
File changes inside the container will be persisted unless you delete the container.
### Windows GPU-Enabled Docker Host -- Extra Steps
Note: if you're on a **Windows Docker Desktop** host, be sure to enable the following for GPU access:
```
then
restart Docker Desktop
```

# Steps to Run Scripts
Here we explain the steps used in the individual phases of data creation and collection for this work.
1) **Source Code Scraping**: In this step we perform a simple, naive concatenation of all the C/C++ files for each executable in the `gpu-flopbench/src` directory.
This set of source codes is based on the [HeCBench Suite](https://github.com/zjin-lcf/HeCBench); we focus only on the CUDA programs of the suite.
These scraped codes are used as inputs to the LLMs, which we later query to predict the FLOP counts of the target CUDA kernels.
2) **Program Building**: We took the HeCBench programs and built them all using a custom `CMakeLists.txt` file, giving us fine control over the compiler and compilation process for each program.
We use the NVIDIA GPU hardware performance counters to gather data about the number of FLOPs executed by each target kernel.
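As a toy illustration of how a kernel's total FLOP count can be derived from such counters, the sketch below uses hypothetical counter values (not taken from our dataset) and the common convention that a fused multiply-add contributes two floating-point operations:

```shell
# Hypothetical per-kernel instruction counts, e.g. as reported by a GPU profiler
fadd=120000   # floating-point adds
fmul=80000    # floating-point multiplies
ffma=200000   # fused multiply-adds: each contributes 2 FLOPs
total=$((fadd + fmul + 2 * ffma))
echo "estimated FLOPs: $total"   # → estimated FLOPs: 600000
```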
## Note on Running Scripts
Our repo already provides all the intermediate files that would be generated by the following steps. This means you can run our same scripts to recreate results or add to the dataset; and even if your system doesn't have a compatible NVIDIA GPU, you can at least follow all our steps in deeper detail through this Docker container.
We make heavy use of Git LFS for large data files, so you can inspect all our raw LLM query results through the LangGraph SQLite logging API.
Our LLM queries and their returned responses are stored in the `gpu-flopbench/llm-querying/checkpoints/paper-datasets/*.sqlite` files.
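Since these checkpoint files are ordinary SQLite databases, any SQLite client can open them. The snippet below is a minimal sketch using a throwaway database rather than the real files; the `checkpoints` table name here is illustrative only, not necessarily LangGraph's actual schema:

```shell
# Build a throwaway SQLite database to demonstrate the kind of inspection
# you could run against the provided *.sqlite checkpoint files.
db="$(mktemp -d)/example.sqlite"
sqlite3 "$db" 'CREATE TABLE checkpoints (thread_id TEXT, checkpoint BLOB);'
# List the tables in the database
sqlite3 "$db" "SELECT name FROM sqlite_master WHERE type='table';"
```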
## Scraping Source Codes
Before we can start gathering performance counter data, we need to do a source code scrape.
We do this first because building and running the codes unzips many additional files that could accidentally pollute the contexts of the scraped codes.
```
## Output file already provided -- no need to run these commands
```
This will generate a file called `simple-scraped-kernels-CUDA-only.json`, which contains the full textual representation of each program as concatenated files.
This means that CUDA kernels that come from the same parent program have the same source code that gets passed to an LLM.
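The naive per-program concatenation described above can be sketched as follows. This is a self-contained toy example with made-up files; the repo's actual script additionally packages the results into the JSON file mentioned above:

```shell
# Toy sketch of the naive per-program source concatenation (illustrative only).
tmp=$(mktemp -d)
mkdir -p "$tmp/src/example-cuda"
printf '__global__ void kernel() {}\n' > "$tmp/src/example-cuda/main.cu"
printf '#include <cstdio>\n'           > "$tmp/src/example-cuda/util.cpp"
# Concatenate every C/C++/CUDA source file of the program into one text blob
find "$tmp/src/example-cuda" -type f \
  \( -name '*.cu' -o -name '*.cuh' -o -name '*.c' -o -name '*.cpp' -o -name '*.h' \) \
  -exec cat {} + > "$tmp/example-cuda-scraped.txt"
wc -l < "$tmp/example-cuda-scraped.txt"   # line count of the concatenated blob
```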
## Docker Data Collection Instructions (CUDA program building & profiling)
Once you're in the main bash shell of the container, you should by default be in the `/gpu-flopbench` directory with a conda environment called `gpu-flopbench`.