| 1 | +# BGPU: A Bad GPU |
| 2 | + |
| 3 | +## BGPU SoC |
| 4 | + |
| 5 | +This is the topmost module in the BGPU architecture. |
| 6 | +It currently contains the following: |
| 7 | +- Control Domain: Provides JTAG access and launches kernels |
| 8 | +- One or more Compute Clusters |
| 9 | +- AXI interconnects that connect the Control Domain and the Compute Clusters to a Memory Controller |
| 10 | + |
| 11 | +## Control Domain |
| 12 | + |
| 13 | +The Control Domain allows us to control the Compute Clusters. |
| 14 | + |
| 15 | +It contains the following modules: |
| 16 | +- RISC-V JTAG Debug Module |
| 17 | +- RISC-V RV32I Processor |
| 18 | +- Thread Engine |
| 19 | +- AXI and OBI Interconnects |
| 20 | + |
| 21 | +## Compute Clusters |
| 22 | + |
| 23 | +A Compute Cluster is composed of one or more Compute Units. |
| 24 | + |
| 25 | +## Compute Unit |
| 26 | + |
| 27 | +The Compute Unit is the heart of the BGPU. |
| 28 | +This is the place where we actually do useful computations (hopefully). |
| 29 | + |
| 30 | +The following diagram gives an overview of the Compute Unit: |
| 31 | + |
| 32 | +<img src="fig/compute_unit.drawio.svg"> |
| 33 | + |
| 34 | +An instruction flows through these stages: |
| 35 | +- Fetcher: Selects a PC of a Warp that should fetch new instructions |
| 36 | +- Instruction Cache: Retrieves one or more (if FetchWidth > 1) instructions at the PC |
| 37 | +- Decoder: Decodes the instructions and tells the Fetcher where the next PC will be for the Warp |
| 38 | +- Multi Warp Dispatcher: Keeps instructions in a Wait Buffer until they are allowed to be executed. Dispatches one or more (if DispatchWidth > 1) instructions to collect their operands |
| 39 | +- Register Operand Collector Stage: Reads the operands of the instructions |
| 40 | +- Execution Unit Demultiplexer: Sends the Instructions to their respective Execution Unit |
| 41 | +- Branch Unit: Calculates the PC for Conditional Branches |
| 42 | +- Integer Unit: Performs integer operations and housekeeping operations (index within threadblock, get parameter address, ...) |
| 43 | +- Floating Point Unit: Performs Floating Point operations |
| 44 | +- Load Store Unit: Performs Loads and Stores to/from Memory |
| 45 | +- Result Collector: Arbitrates between Execution Unit Results and sends them to the Register File |
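
The stage list above can be read as a fixed front end, followed by one of four Execution Units, followed by the Result Collector. The following is a minimal Python sketch of that routing, not the actual RTL. The instruction class names (`"int_alu"`, `"thread_index"`, and so on) and the helper names `demux` and `stage_path` are hypothetical and chosen only for illustration; the real BGPU instruction encoding is not described in this document.

```python
from enum import Enum


class Unit(Enum):
    """The four Execution Units named in the stage list."""
    BRANCH = "Branch Unit"
    INTEGER = "Integer Unit"
    FLOATING_POINT = "Floating Point Unit"
    LOAD_STORE = "Load Store Unit"


# Stages every instruction passes through before reaching an Execution Unit.
FRONT_END = [
    "Fetcher",
    "Instruction Cache",
    "Decoder",
    "Multi Warp Dispatcher",
    "Register Operand Collector Stage",
    "Execution Unit Demultiplexer",
]

# Hypothetical instruction classes (assumption, not the real BGPU encoding).
ROUTING = {
    "branch": Unit.BRANCH,
    "int_alu": Unit.INTEGER,
    "thread_index": Unit.INTEGER,  # housekeeping: index within threadblock
    "param_addr": Unit.INTEGER,    # housekeeping: get parameter address
    "float_alu": Unit.FLOATING_POINT,
    "load": Unit.LOAD_STORE,
    "store": Unit.LOAD_STORE,
}


def demux(instruction_class: str) -> Unit:
    """Return the Execution Unit that handles the given instruction class."""
    try:
        return ROUTING[instruction_class]
    except KeyError:
        raise ValueError(f"unknown instruction class: {instruction_class!r}") from None


def stage_path(instruction_class: str) -> list[str]:
    """List the stages an instruction of this class passes through, in order."""
    return FRONT_END + [demux(instruction_class).value, "Result Collector"]


if __name__ == "__main__":
    for name in ("branch", "float_alu", "load"):
        print(name, "->", " > ".join(stage_path(name)))
```

The sketch only models which stages an instruction visits. It does not model timing, FetchWidth, DispatchWidth, or the arbitration performed by the Result Collector.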