Commit e2628ea

Update README
1 parent c1aa535 commit e2628ea

README.md

Lines changed: 70 additions & 14 deletions
@@ -1,18 +1,74 @@
1-
# chisel-empty
1+
# Timy GPU
2+
The goal of the Timy GPU project is to develop a miniature GPU capable of parallel processing, specifically one able to execute programs similar to compute shaders. The project will be written in the Chisel HDL, with the goal of testing the design on a physical FPGA.
3+
## Design
4+
The GPU will consist of memory, a single core with many threads, and logic to load programs into memory and dispatch them for execution.
5+
### Memory
6+
A single memory block will be shared between cores. A memory controller will manage simultaneous read and write requests from multiple cores. Memory will have a 16-bit address space and will hold both program data and application data. There will also be a stack with a size of 16 bits.
7+
### GPU State
8+
The GPU is in one of three states: idle, program load, or execute. During the program load state, the GPU reads data and an address from external wires and writes the data into memory at that address.
9+
```cs
10+
in byte state // idle | program load | execute
211
3-
An almost empty chisel project (and adder) as a starting point for hardware design.
12+
out bool writeReady
13+
in byte writeData
14+
in byte writeAddress
15+
in bool write
416

5-
To generate Verilog code for the adder execute:
6-
```bash
7-
make
8-
```
9-
10-
Run the tests with:
11-
```bash
12-
make test
13-
```
17+
out bool readReady
18+
out byte readData
19+
in byte readAddress
20+
in bool read
1421

15-
Cleanup the repository with:
16-
```bash
17-
make clean
22+
in byte startPointer
1823
```
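The program load behaviour described above can be sketched as a small software model in plain Scala (not Chisel). This is only an illustration: the numeric state encodings, the `GpuModel` and `GpuState` names, the rule that `writeReady` is high for the whole program load state, and the 16-bit write address are all assumptions that the README does not specify.

```scala
// Behavioural sketch only; not the project's Chisel implementation.
object GpuState {
  // Hypothetical encodings for the `state` input.
  val Idle = 0
  val ProgramLoad = 1
  val Execute = 2
}

class GpuModel {
  // 16-bit address space, one byte per location (assumed from the `byte` data ports).
  val memory: Array[Int] = Array.fill(1 << 16)(0)

  // Assumption: writes are accepted whenever the GPU is in the program load state.
  def writeReady(state: Int): Boolean = state == GpuState.ProgramLoad

  // Models one cycle of the external write port.
  // Returns true if the byte was written to memory.
  def step(state: Int, write: Boolean, writeAddress: Int, writeData: Int): Boolean = {
    if (writeReady(state) && write) {
      memory(writeAddress & 0xFFFF) = writeData & 0xFF
      true
    } else {
      false
    }
  }
}
```

In this model, a host would hold `state` at program load, present one address/data pair per step with `write` asserted, and then switch `state` to execute.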
24+
### Core
25+
In theory, the GPU could be expanded to execute multiple programs at once by adding more cores. For simplicity's sake, I'm aiming for a single core for the moment, although once a single core is working I don't imagine supporting more cores will be very difficult. A core will consist of memory access logic, a dispatcher / synchronizer, and a number of threads. The dispatcher / synchronizer loads the program from memory and wires it to the threads for execution. (This is already somewhat working as of 10-16.)
26+
### Thread
27+
A single thread contains a program counter, ALU, LSU, and registers. Threads take in loaded operations from their parent core and execute them. Once more development is done on the registers, I'll have a clearer idea of how many and which registers a thread needs, but I'm imagining we'll initially have a few 16-bit registers:
28+
1. stack register
29+
2. a, b, and c registers
30+
### Instruction Set
31+
Each instruction is 24 bits wide: 8 bits for the opcode and 16 bits for an immediate. The first 5 opcode bits specify the instruction. The next 3 specify the target or source / destination registers, if the instruction uses them.
32+
- move
33+
`00001` + target --> moves immediate into register
34+
`00010` + src/dst --> moves value in register to register
35+
- load
36+
`00011` + src/dst --> takes address from register and loads memory into other register
37+
- add
38+
`00100` + src/dst --> add value in src to dst and store in dst
39+
- mul
40+
`00101` + src/dst --> multiplies value in src by dst and store in dst
41+
- cmp
42+
`00110` + src/dst --> compares value in src to dst and stores result in nzp flag of alu
43+
- jmp
44+
`00111` + target --> jumps program pointer to value specified in register
45+
`01000` + target --> jumps program pointer to value specified in register if negative flag is set
46+
`01001` + target --> jumps program pointer to value specified in register if positive flag is set
47+
`01010` + target --> jumps program pointer to value specified in register if zero flag is set
48+
`01011` + target --> jumps program pointer to value specified in register if zero flag is not set
49+
- or
50+
`01100` + src/dst --> does bitwise or of src and dst registers and stores result in dst
51+
- and
52+
`01101` + src/dst --> does bitwise and of src and dst registers and stores result in dst
53+
- xor
54+
`01110` + src/dst --> does bitwise xor of src and dst registers and stores result in dst
55+
- not
56+
`01111` + src/dst --> does bitwise not of src register and stores result in dst
57+
- shift R
58+
`10000` + target --> shifts bits to the right and pads 0s at beginning of target register
59+
- shift L
60+
`10001` + target --> shifts bits to the left and pads 0s at end of target register
61+
- push
62+
`10010` + target --> pushes value in target register to stack
63+
- pop
64+
`10011` + target --> pops value on top of stack into register
65+
- sync (experimental?)
66+
`10100` + target --> tells the core dispatcher to not dispatch any threads until all threads have reached the program pointer specified in the specified register
67+
- term
68+
`10101` --> signals that the thread has finished execution
69+
- store
70+
`10110` + src/dst --> takes address from src register and stores the value in dst register into memory
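
The 24-bit layout can be illustrated with a small pack/unpack helper in plain Scala. The bit ordering is an assumption: the sketch puts the 5 instruction bits at the top of the word, then the 3 register bits, then the 16-bit immediate. The README does not say how two registers are packed into the 3-bit field, so the field is treated as an opaque value here. The `Instr`, `encode`, and `decode` names are hypothetical.

```scala
// Sketch of the 24-bit instruction layout; bit ordering is assumed, not confirmed.
object Instr {
  // A few opcodes copied from the list above.
  val MoveImm = 0x01 // 00001
  val Add     = 0x04 // 00100
  val Term    = 0x15 // 10101

  final case class Decoded(opcode: Int, regField: Int, immediate: Int)

  // Packs the fields as [opcode:5][regField:3][immediate:16].
  def encode(opcode: Int, regField: Int, immediate: Int): Int = {
    require(opcode >= 0 && opcode < 32, "opcode must fit in 5 bits")
    require(regField >= 0 && regField < 8, "register field must fit in 3 bits")
    require(immediate >= 0 && immediate < 65536, "immediate must fit in 16 bits")
    (opcode << 19) | (regField << 16) | immediate
  }

  // Splits a 24-bit word back into its three fields.
  def decode(word: Int): Decoded =
    Decoded((word >>> 19) & 0x1F, (word >>> 16) & 0x7, word & 0xFFFF)
}
```

Under this layout, `Instr.encode(Instr.MoveImm, 2, 0x1234)` packs the opcode byte `00001 010` in front of the immediate `0x1234`.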
71+
72+
We may need more instructions, but I can't think of any others that we need right now.
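
One way to sanity-check the branching part of the current set is to model `cmp` and the five jump opcodes in plain Scala. This is a sketch under assumptions the README leaves open: register values are treated as signed 16-bit two's complement, and the nzp flags reflect the sign of `src - dst`. The `Branch`, `cmp`, and `taken` names are hypothetical.

```scala
// Sketch of cmp and jump conditions; signedness and operand order are assumptions.
object Branch {
  // Jump opcodes copied from the instruction list.
  val Jmp        = 0x07 // 00111
  val JmpNeg     = 0x08 // 01000
  val JmpPos     = 0x09 // 01001
  val JmpZero    = 0x0A // 01010
  val JmpNotZero = 0x0B // 01011

  final case class Flags(n: Boolean, z: Boolean, p: Boolean)

  // Interprets the low 16 bits of a value as a signed two's-complement number.
  def signed16(v: Int): Int = (v & 0xFFFF).toShort.toInt

  // Assumption: the flags record the sign of (src - dst).
  def cmp(src: Int, dst: Int): Flags = {
    val d = signed16(src) - signed16(dst)
    Flags(d < 0, d == 0, d > 0)
  }

  // Decides whether a jump opcode is taken for the given flags.
  def taken(opcode: Int, f: Flags): Boolean = opcode match {
    case Jmp        => true
    case JmpNeg     => f.n
    case JmpPos     => f.p
    case JmpZero    => f.z
    case JmpNotZero => !f.z
    case _          => false // not a jump instruction
  }
}
```

With these pieces, a counted loop only needs `add`, `cmp`, and the not-zero jump, which suggests the listed instructions already cover simple loops.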
73+
# How To Run
74+
`sbt run test`
