Commit 3fa4b15

docs: Add minimalistic docs
1 parent e72bd86 commit 3fa4b15

3 files changed

+602
-1
lines changed

README.md

Lines changed: 15 additions & 1 deletion
@@ -27,7 +27,21 @@
2727
The Architecture of BGPU is most similar to NVIDIA GPUs starting from the Fermi-Microarchitecture.
2828
We implement a form of Independent-Thread-Scheduling (ITS) similar to the NVIDIA Volta Architecture.
2929

30-
TODO: As with all projects, documentation is still a work-in-progress...
30+
Please have a look at [`BGPU Architecture`](docs/arch.md)
31+
32+
## Quickstart
33+
34+
To run simple tests at different levels of the hierarchy, use the following targets:
35+
```bash
36+
make tb_compute_unit
37+
make tb_compute_cluster
38+
make tb_bgpu_soc
39+
```
40+
41+
To see what each target executes, have a look at:
42+
- [`test/tb_compute_unit.sv`](test/tb_compute_unit.sv)
43+
- [`test/tb_compute_cluster.sv`](test/tb_compute_cluster.sv)
44+
- [`test/tb_bgpu_soc.sv`](test/tb_bgpu_soc.sv)
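
The three targets above can also be run in sequence from a small script. The following is a minimal sketch, not part of the repository: the function name `run_targets`, the injectable `runner` parameter, and the smallest-scope-first ordering are assumptions made for illustration. Only the target names come from the Quickstart.

```python
import subprocess

# Target names come from the Quickstart; the ordering (smallest scope
# first) is an assumption.
TARGETS = ["tb_compute_unit", "tb_compute_cluster", "tb_bgpu_soc"]


def run_targets(targets=TARGETS, runner=subprocess.run):
    """Run each make target in order, stopping at the first failure.

    `runner` is a hypothetical injection point so the function can be
    exercised without invoking make; by default it is subprocess.run.
    """
    completed = []
    for target in targets:
        # check=True makes subprocess.run raise CalledProcessError
        # when make exits with a non-zero status.
        runner(["make", target], check=True)
        completed.append(target)
    return completed
```

Calling `run_targets()` from the repository root runs all three testbenches and raises on the first one that fails.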
3145

3246
## Helpful References
3347

docs/arch.md

Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
1+
# BGPU: A Bad GPU
2+
3+
## BGPU SoC
4+
5+
This is the topmost module in the BGPU-Architecture.
6+
It currently contains the following:
7+
- Control Domain: Allowing JTAG access and launching of kernels
8+
- One or more Compute Clusters
9+
- Various AXI-Interconnects to connect the Control Domain and Compute Clusters to a Memory Controller
10+
11+
## Control Domain
12+
13+
The Control Domain allows us to control the Compute Clusters.
14+
15+
It contains the following modules:
16+
- RISC-V JTAG Debug Module
17+
- RISC-V RV32I Processor
18+
- Thread Engine
19+
- AXI and OBI Interconnects
20+
21+
## Compute Clusters
22+
23+
A Compute Cluster is composed of one or more Compute Units.
24+
25+
## Compute Unit
26+
27+
The Compute Unit is the heart of the BGPU.
28+
This is the place where we actually do useful computations (hopefully).
29+
30+
The following diagram gives an overview of the Compute Unit:
31+
32+
<img src="fig/compute_unit.drawio.svg">
33+
34+
An instruction flows through these stages:
35+
- Fetcher: Selects a PC of a Warp that should fetch new instructions
36+
- Instruction Cache: Retrieves one or more (if FetchWidth > 1) instructions at the PC
37+
- Decoder: Decodes the instructions and tells the Fetcher where the next PC will be for the Warp
38+
- Multi Warp Dispatcher: Keeps Instructions in a Wait Buffer until they are allowed to be executed. Dispatches one or more Instructions (if DispatchWidth > 1) to collect their operands
39+
- Register Operand Collector Stage: Reads the Operands of the Instructions
40+
- Execution Unit Demultiplexer: Sends the Instructions to their respective Execution Unit
41+
- Branch Unit: Calculates the PC for Conditional Branches
42+
- Integer Unit: Performs integer operations and housekeeping operations (index within threadblock, get parameter address, ...)
43+
- Floating Point Unit: Performs Floating Point operations
44+
- Load Store Unit: Performs Loads and Stores to/from Memory
45+
- Result Collector: Arbitrates between Execution Unit Results and sends them to the Register File
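
The back end of this flow (Execution Unit Demultiplexer, the four Execution Units, and the Result Collector) can be pictured with a toy software model. This is only a sketch under stated assumptions: the `Instruction` fields, the `kind`-to-unit mapping, and the round-robin arbitration are hypothetical and are not taken from the BGPU RTL, which may route and arbitrate differently.

```python
from collections import deque
from dataclasses import dataclass

# Unit names follow the stage list above; the kind-to-unit mapping is
# an assumption made for illustration only.
UNIT_FOR_KIND = {
    "branch": "branch_unit",
    "int": "integer_unit",
    "float": "floating_point_unit",
    "mem": "load_store_unit",
}


@dataclass(frozen=True)
class Instruction:
    warp: int   # warp the instruction belongs to
    kind: str   # one of the keys of UNIT_FOR_KIND
    dest: int   # destination register index


def demultiplex(instructions):
    """Sort instructions into one FIFO queue per execution unit."""
    queues = {unit: deque() for unit in UNIT_FOR_KIND.values()}
    for inst in instructions:
        queues[UNIT_FOR_KIND[inst.kind]].append(inst)
    return queues


def collect_results(queues):
    """Drain the unit queues round-robin, one result per unit per pass.

    Round-robin is an assumed arbitration policy; the actual Result
    Collector may arbitrate differently.
    """
    order = []
    while any(queues.values()):
        for unit, queue in queues.items():
            if queue:
                order.append((unit, queue.popleft()))
    return order


if __name__ == "__main__":
    program = [
        Instruction(warp=0, kind="int", dest=1),
        Instruction(warp=1, kind="int", dest=2),
        Instruction(warp=0, kind="mem", dest=3),
        Instruction(warp=1, kind="branch", dest=0),
    ]
    for unit, inst in collect_results(demultiplex(program)):
        print(f"{unit}: warp {inst.warp} -> r{inst.dest}")
```

The model ignores latency and operand collection entirely. It only shows that results from different units reach the register file in an order set by the arbiter, not by program order.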
