Commit ce9015a

design: describe the dice bytecode interpreter
Updates #1.
1 parent d6d12b1 commit ce9015a

1 file changed

+336
-0
lines changed

design/1-bytecode-interpreter.md

Lines changed: 336 additions & 0 deletions
@@ -0,0 +1,336 @@
1+
# Proposal: Design of a bytecode interpreter for Go
2+
3+
Author: Sebastien Binet
4+
5+
Last updated: 2016-08-26
6+
7+
Discussion at https://github.com/go-interpreter/proposal/issues/1.
8+
9+
## Abstract
10+
11+
We propose to design and implement a bytecode interpreter for Go,
12+
which will be the foundation for a Go [REPL](https://en.wikipedia.org/wiki/Read%E2%80%93eval%E2%80%93print_loop).
13+
14+
## Background
15+
16+
It is common in science or exploratory work to iterate on a piece of code
17+
to solve a given problem.
18+
Having an interactive conversation with your program, _via_ an interactive
19+
prompt (aka a REPL), greatly speeds up such exploratory work: you can easily
20+
iterate on various algorithms, modifying the state of your program and data,
21+
and write new types and functions to _e.g._ plot the new state of your data.
22+
23+
A side benefit of such an interpreter is the ability to embed it inside
24+
a Go application and provide both scriptability and extensibility.
25+
Designing such an API is outside the perimeter of this proposal.
26+
27+
There are already partial solutions or whole implementations
28+
of a Go REPL on the market but none of those meets the following requirements:
29+
30+
- easy `go get` installation
31+
- implement the whole Go language
32+
- be a real REPL, not just an "on-the-fly re-compilation + re-run the whole snippet" approach
33+
- JIT-able
34+
- performant
35+
36+
## Proposal
37+
38+
We propose to break the complicated issue of bringing a complete interpreter
39+
for Go (interactivity, whole-program interpretation, runtime, native functions,
40+
external functions, JITing, parsing source code, ...) into small pieces.
41+
42+
The current proposal only deals with describing the bytecode interpreter
43+
(its overall design and its components), the opcodes and instructions which
44+
can be found in a bytecode stream and how these bytecodes can be interpreted and
45+
acted upon by the interpreter.
46+
47+
There are many ways to implement an interpreter and as many options
48+
for the interpretation process:
49+
50+
1. directly interpret from the source code
51+
2. interpret the source code after it has been transformed into an AST
52+
3. compile statements into bytecode instructions that are then executed
53+
54+
We propose to go with option 3).
55+
Option 1) lends itself neither to optimizations nor to very efficient execution.
56+
Option 2) is somewhat better: there are ways to programmatically manipulate
57+
and transform an AST.
58+
But with option 3) we should be able to reuse the whole corpus of optimizations
59+
coming from the new SSA backend of the official `gc` Go compiler.
60+
As explained in Rob Pike's talk at GopherCon-2016: ["The Design of the Go Assembler"](https://talks.golang.org/2016/asm.slide),
61+
the `cmd/internal/obj` package can be seen as a rather portable assembly language.
62+
This paves the way for considering it as a portable intermediate representation
63+
(IR) of Go code.
64+
65+
The proposal is thus to use this conduit as the general infrastructure to
66+
generate the opcodes and bytecode for the new Go VM.
67+
The concrete _modus operandi_ for leveraging `cmd/internal/obj` and
68+
the whole `gc` compiler infrastructure might still need to be properly fleshed
69+
out, but here are the current options:
70+
71+
- create a proper `GOARCH` architecture directly under `cmd/internal` like
72+
the other `GOARCH=amd64`, `GOARCH=s390x`, etc... architectures and aim for
73+
Go 1.8 (we would need to declare our plans [here](https://groups.google.com/forum/#!topic/golang-dev/098vr4999Tk)),
74+
- vendor `cmd/compile` at a given Go version (_e.g._ 1.7) and work off it,
75+
aiming for integration at a later date (if at all possible),
76+
- ???
77+
78+
### Instructions, opcodes and bytecode format
79+
80+
We propose to reuse the opcodes and bytecode format as described in the [Dis VM](http://www.vitanuova.com/inferno/papers/dis.pdf)
81+
specification paper.
82+
The `Dis` VM was able to execute [Limbo](https://en.wikipedia.org/wiki/Limbo_%28programming_language%29)
83+
code.
84+
`Limbo` and `Go` share a common lineage and present similar features
85+
(channels, `select`, garbage collector, packages) so many (if not all) of
86+
the opcodes our VM will need are already present and the instruction set has
87+
been formally described.
88+
The on-disk object file format and overall organization has also been specified
89+
in the above paper.
90+
91+
We intend to follow the general spirit of the specifications of the `Dis` VM
92+
and condense it inside a package named `dice`.
93+
The implementation of `dice` should be done from first principles,
94+
without looking at the `Dis` source code.
95+
This is to ensure that `dice` can be licensed under `BSD-3`.
96+
97+
The various `opcode`s are listed here:
98+
99+
```
00 nop       20 headb     40 mulw      60 blew      80 shrl
01 alt       21 headw     41 mulf      61 bgtw      81 bnel
02 nbalt     22 headp     42 divb      62 bgew      82 bltl
03 goto      23 headf     43 divw      63 beqf      83 blel
04 call      24 headm     44 divf      64 bnef      84 bgtl
05 frame     25 headmp    45 modw      65 bltf      85 bgel
06 spawn     26 tail      46 modb      66 blef      86 beql
07 runt      27 lea       47 andb      67 bgtf      87 cvtlf
08 load      28 indx      48 andw      68 bgef      88 cvtfl
09 mcall     29 movp      49 orb       69 beqc      89 cvtlw
0A mspawn    2A movm      4A orw       6A bnec      8A cvtwl
0B mframe    2B movmp     4B xorb      6B bltc      8B cvtlc
0C ret       2C movb      4C xorw      6C blec      8C cvtcl
0D jmp       2D movw      4D shlb      6D bgtc      8D headl
0E case      2E movf      4E shlw      6E bgec      8E consl
0F exit      2F cvtbw     4F shrb      6F slicea    8F newcl
10 new       30 cvtwb     50 shrw      70 slicela   90 casec
11 newa      31 cvtfw     51 insc      71 slicec    91 indl
12 newcb     32 cvtwf     52 indc      72 indw      92 movpc
13 newcw     33 cvtca     53 addc      73 indf      93 tcmp
14 newcf     34 cvtac     54 lenc      74 indb      94 mnewz
15 newcp     35 cvtwc     55 lena      75 negf      95 cvtrf
16 newcm     36 cvtcw     56 lenl      76 movl      96 cvtfr
17 newcmp    37 cvtfc     57 beqb      77 addl      97 cvtws
18 send      38 cvtcf     58 bneb      78 subl      98 cvtsw
19 recv      39 addb      59 bltb      79 divl      99 lsrw
1A consb     3A addw      5A bleb      7A modl      9A lsrl
1B consw     3B addf      5B bgtb      7B mull      9B eclr
1C consp     3C subb      5C bgeb      7C andl      9C newz
1D consf     3D subw      5D beqw      7D orl       9D newaz
1E consm     3E subf      5E bnew      7E xorl
1F consmp    3F mulb      5F bltw      7F shll
```
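
As a rough illustration of how a few of these opcodes might be declared in Go, here is a minimal sketch. The `Opcode` type, the constant names and the `String` method are assumptions made for this example and are not part of the proposal; only the numeric values and mnemonics come from the table above.

```go
package main

import "fmt"

// Opcode is a hypothetical type for dice opcodes.
type Opcode byte

// A handful of opcodes; the values are copied from the table above.
const (
	OpNop   Opcode = 0x00
	OpCall  Opcode = 0x04
	OpSpawn Opcode = 0x06
	OpRet   Opcode = 0x0C
	OpAddw  Opcode = 0x3A
	OpAddf  Opcode = 0x3B
)

// names maps opcodes to mnemonics (only a subset is listed here).
var names = map[Opcode]string{
	OpNop: "nop", OpCall: "call", OpSpawn: "spawn",
	OpRet: "ret", OpAddw: "addw", OpAddf: "addf",
}

// String returns the mnemonic, or a hex placeholder for unlisted opcodes.
func (op Opcode) String() string {
	if s, ok := names[op]; ok {
		return s
	}
	return fmt.Sprintf("op(0x%02X)", byte(op))
}

func main() {
	fmt.Println(OpAddf, Opcode(0x9D)) // prints: addf op(0x9D)
}
```

Giving opcodes a named type with a `String` method would make disassembly listings and debug traces readable, and leaves room for the renaming mentioned below.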
133+
134+
We reserve the right to rename some of these `opcode`s to better reflect
135+
the naming conventions of our source language, Go.
136+
137+
### Virtual Machine
138+
139+
Once a Go package, command or code snippet has been compiled to our `dice` bytecode,
140+
that bytecode needs to be somehow executed.
141+
This job is performed by the `dice.VM` virtual machine:
142+
143+
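```go
package dice

import "reflect"

// VM executes dice bytecode.
type VM struct {
	frame   *frame          // frame currently being executed
	globals []reflect.Value // package-level variables
}

// frame holds the state of one function activation.
type frame struct {
	vm     *VM
	caller *frame
	locals []reflect.Value
	pc     int // program counter: index into code
	code   []instruction
}

type instruction struct {
	opcode byte
	amode  byte   // address mode
	addrs  uint64 // operands (src1, src2, dst)
}

// cfKind tells the dispatch loop how control flow continues.
type cfKind int

const (
	cfNext   cfKind = iota // proceed to the next instruction
	cfJump                 // exec has already set fr.pc to the target
	cfReturn               // leave the current frame
)

// Opcode values are taken from the table above.
// opGO is a tentative Go-flavoured name for spawn.
const (
	opCALL byte = 0x04
	opGO   byte = 0x06
	opRET  byte = 0x0C
	opADDF byte = 0x3B
)

func (vm *VM) run() {
	run(vm.frame)
}

func run(fr *frame) {
	for fr.pc < len(fr.code) {
		switch exec(fr, fr.code[fr.pc]) {
		case cfReturn:
			return
		case cfNext:
			fr.pc++ // fetch the next instruction
		case cfJump:
			// continue from the pc chosen by exec
		}
	}
}

func exec(fr *frame, code instruction) cfKind {
	switch code.opcode {
	case opADDF:
		// dst = src1 + src2
	case opCALL:
		run(&frame{vm: fr.vm, caller: fr, pc: 0, code: from(code.addrs)})
	case opRET:
		// fetch result if any
		return cfReturn
	case opGO:
		// run the callee on its own goroutine
		go run(&frame{vm: fr.vm, caller: fr, code: from(code.addrs)})
		// etc...
	}
	return cfNext
}

// from resolves the code of the function designated by a call operand.
// Placeholder: operand decoding is not specified yet.
func from(src uint64) []instruction {
	return nil
}
```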
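
The `addrs` field above holds three operands in a single `uint64`, but the proposal does not fix its layout. The sketch below assumes, purely for illustration, three 21-bit fields holding `src1`, `src2` and `dst`; the helper names `pack` and `unpack` are hypothetical, and the actual `Dis` encoding is different.

```go
package main

import "fmt"

// Assumed layout (not specified by the proposal):
// bits 0-20 = src1, bits 21-41 = src2, bits 42-62 = dst.
const (
	operandBits = 21
	operandMask = 1<<operandBits - 1
)

// pack stores three operands in one uint64, truncating each to 21 bits.
func pack(src1, src2, dst uint32) uint64 {
	return uint64(src1&operandMask) |
		uint64(src2&operandMask)<<operandBits |
		uint64(dst&operandMask)<<(2*operandBits)
}

// unpack is the inverse of pack.
func unpack(addrs uint64) (src1, src2, dst uint32) {
	src1 = uint32(addrs & operandMask)
	src2 = uint32(addrs >> operandBits & operandMask)
	dst = uint32(addrs >> (2 * operandBits) & operandMask)
	return src1, src2, dst
}

func main() {
	fmt.Println(unpack(pack(0, 1, 2))) // prints: 0 1 2
}
```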
203+
204+
At this moment, the proposal is to be able to byte compile this simple Go package:
205+
206+
```go
package main

func add(i, j int) int {
	return i + j
}

func main() {}
```
215+
216+
and in a later stage, be able to run `add(40, 2)`.
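
To make that target concrete, here is a deliberately simplified, self-contained sketch of what running `add(40, 2)` could look like. It is not the proposed `dice` implementation: locals are plain `int64` values instead of `reflect.Value`, operands are separate struct fields rather than a packed `addrs` word, and the bytecode for `add` is assembled by hand (the helper `callAdd` is hypothetical). Only the opcode values for `addw` and `ret` come from the opcode table.

```go
package main

import "fmt"

const (
	opRET  byte = 0x0C
	opADDW byte = 0x3A
)

// instruction uses explicit operand fields for readability.
type instruction struct {
	opcode          byte
	src1, src2, dst int // indices into frame.locals
}

type frame struct {
	locals []int64
	pc     int
	code   []instruction
}

// run executes fr.code until a ret instruction or the end of the code.
func run(fr *frame) {
	for fr.pc < len(fr.code) {
		in := fr.code[fr.pc]
		switch in.opcode {
		case opADDW:
			fr.locals[in.dst] = fr.locals[in.src1] + fr.locals[in.src2]
			fr.pc++
		case opRET:
			return
		default:
			panic(fmt.Sprintf("unknown opcode 0x%02X", in.opcode))
		}
	}
}

// callAdd runs a hand-assembled body for add(i, j).
// locals[0] = i, locals[1] = j, locals[2] = result.
func callAdd(i, j int64) int64 {
	fr := &frame{
		locals: []int64{i, j, 0},
		code: []instruction{
			{opcode: opADDW, src1: 0, src2: 1, dst: 2},
			{opcode: opRET},
		},
	}
	run(fr)
	return fr.locals[2]
}

func main() {
	fmt.Println(callAdd(40, 2)) // prints 42
}
```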
217+
218+
## Rationale
219+
220+
Why do we implement yet another Go interpreter and a REPL?
221+
Aren't there already enough of them?
222+
223+
Here is a list of alternatives:
224+
225+
- [llgoi](https://github.com/llvm-mirror/llgo/blob/master/cmd/llgoi/llgoi.go) is a JIT-enabled interpreter built on top of `LLVM` and `llgo`.
226+
The first issue with `llgoi` is the somewhat painful installation process.
227+
This pain point should ease with time (and also by providing [snap-based](https://groups.google.com/forum/#!msg/llgo-dev/ny8MgDlNkng/8kEvgzfuCQAJ)
228+
installations of `llgoi`).
229+
But the main issue is that `llgo` development is behind that of the reference
230+
implementation of `Go`: `gc`.
231+
Also, the pace of development of `LLVM` itself (very fast) and the version skew
232+
that may result on users' machines *might* set the scene for difficult user
233+
support and debugging sessions.
234+
235+
- [ssainterp](https://github.com/go-interpreter/ssainterp) and [ssadump -run](https://godoc.org/golang.org/x/tools/cmd/ssadump)
236+
are based on the SSA suite developed at [golang.org/x/tools/go/ssa](https://godoc.org/golang.org/x/tools/go/ssa).
237+
They are able to parse and interpret a vast majority of valid Go code,
238+
but lack an interactive interpreter mode.
239+
`ssadump` code is also clearly stated as *NOT* meant to be used as a
240+
production-grade interpreter for Go but merely as an adjunct for testing
241+
the SSA construction algorithm.
242+
243+
- [igo](https://github.com/sbinet/igo) and [go-eval](https://github.com/sbinet/go-eval)
244+
are projects salvaged from the pre-`Go-1` era.
245+
`go-eval` does not lend itself easily to compilation optimizations and lacks
246+
support for `imports`, `goroutines`, type creation, ...
247+
248+
- [gore](https://github.com/motemen/gore) supports the whole Go language but
249+
does not (completely cleanly) preserve state or side effects between
250+
two interactive commands: `gore` recompiles your Go snippets on the fly and
251+
re-executes them.
252+
253+
It seems necessary to implement some kind of a virtual machine to be able
254+
to provide an efficient and truly interactive interpreter for Go.
255+
256+
The same question can also be raised about reimplementing a whole new VM.
257+
Couldn't we have somehow reused an already existing VM?
258+
`Python`, `Lua`, `JVM` and `Dis` come to mind.
259+
`Dis` is LGPL and thus not easily integrable in the usual Go ecosystem.
260+
`Python` and `Lua` have more permissive licenses, but their reference
261+
implementations are written in `C`, bringing either performance issues to the
262+
table (`cgo`) or throwing `go-get`-ability out of the window.
263+
There are however `Go` implementations (partial or complete) of these VMs:
264+
265+
- https://github.com/Shopify/go-lua/blob/master/vm.go
266+
- https://github.com/flowlo/gothon/blob/master/frame.go
267+
268+
The next issue at this point is the adequacy of their respective VM
269+
instruction sets for the Go language.
270+
271+
Finally, why do we use the `Dis` VM instruction set, instead of a more recent
272+
or more in vogue set, such as [LLVM bitcode](http://llvm.org/docs/BitCodeFormat.html)
273+
and its associated [LLVM assembly](http://llvm.org/docs/LangRef.html), or the
274+
nascent [`wasm` bytecode](https://webassembly.github.io/) format?
275+
276+
The `LLVM` solution suffers (to a lesser extent) from the same issues as the `llgoi` approach.
277+
We should note though there exists a pure-Go project to interact with the `LLVM` `IR`:
278+
[llir/llvm](https://github.com/llir/llvm).
279+
This project is still a work in progress at the time of writing (August 2016).
280+
281+
`wasm` is probably a very strong and sensible option, and poised to take over
282+
the whole web industry.
283+
Unfortunately, there is only a work in progress `C/C++` project at the moment (August 2016),
284+
so it is probably a bit early to write code to target it.
285+
However, `wasm` is definitely a backend to monitor: `gopherjs`, a project transpiling
286+
Go code into `JavaScript`, will probably target it at some point.
287+
288+
## Compatibility - Open issues
289+
290+
There are a few interesting issues when interpreting Go code in an interactive
291+
fashion.
292+
293+
1. Should we allow mid-way imports of packages?
294+
```
go> slice := []string{"HELLO", "GO"}
go> import "strings"
go> println(strings.ToLower(slice[0]))
```
299+
300+
What if `slice` was instead named `strings`?
301+
Should we allow shadowing of variables by package identifiers?
302+
Should we instead re-shadow the package identifier with the variable
303+
identifier?
304+
The latter seems like the more idiomatic Go behaviour, or at least the
305+
behaviour a gopher would expect if she were to write the program in
306+
a compiled environment (_i.e.:_ with `goimports` putting the `import`
307+
statement at the top).
308+
309+
2. Support for `cgo` and `import "C"`?
310+
3. Support for packages with assembly? (from the `stdlib` or otherwise)
311+
4. Calls to `syscalls`? Should they be somehow recognized and performed
312+
on a dedicated `goroutine`? What should `os.Exit` do? and how?
313+
5. How to efficiently implement iteration over maps?
314+
6. How to implement `unsafe`? Should we?
315+
7. How to implement the definition of new types?
316+
Package `reflect` has some support for this (`StructOf`, `ArrayOf`, ...) but
317+
it currently has no support for defining new interface types nor any new
318+
named types.
319+
8. In an interactive interpreter, how do we define methods for a named type?
320+
When, and how, do we tell the interpreter that the method set of a given
321+
named type is done?
322+
9. What is the most efficient way to write the `opcode` dispatch loop?
323+
A huge switch? ([go-lua](https://engineering.shopify.com/79963844-announcing-go-lua)
324+
reported issues with huge switches and migrated to a jump table.)
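
For open issue 9, the jump-table alternative to a large `switch` can be sketched as an array of handler functions indexed by the opcode byte. This is only an illustration of the technique under assumed names (`handler`, `table`, and the made-up `opINC` opcode, which is not in the `Dis` table); whether it actually beats a `switch` under the Go compiler would have to be measured.

```go
package main

import "fmt"

type frame struct {
	acc int // toy accumulator, stands in for real VM state
	pc  int
}

// handler executes one instruction and reports whether to keep running.
type handler func(fr *frame) bool

const (
	opNOP byte = 0x00
	opRET byte = 0x0C
	opINC byte = 0xFF // hypothetical opcode, used only by this example
)

// table is indexed directly by the opcode byte; unset entries stay nil.
var table [256]handler

func init() {
	table[opNOP] = func(fr *frame) bool { fr.pc++; return true }
	table[opINC] = func(fr *frame) bool { fr.acc++; fr.pc++; return true }
	table[opRET] = func(fr *frame) bool { return false }
}

// run dispatches through the table until ret or the end of the code.
func run(fr *frame, code []byte) {
	for fr.pc < len(code) {
		h := table[code[fr.pc]]
		if h == nil {
			panic(fmt.Sprintf("unknown opcode 0x%02X", code[fr.pc]))
		}
		if !h(fr) {
			return
		}
	}
}

func main() {
	fr := &frame{}
	run(fr, []byte{opINC, opNOP, opINC, opRET, opINC})
	fmt.Println(fr.acc) // prints 2: the last inc sits after ret
}
```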
325+
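
Open issue 7 above mentions the support package `reflect` offers for creating types at run time. As a small, hedged illustration of what is possible with `reflect.StructOf`, the sketch below builds the equivalent of `struct{ X, Y int }` dynamically; the helper name `newPoint` is made up for this example. Note that the resulting type is unnamed, which is exactly the limitation the issue points out.

```go
package main

import (
	"fmt"
	"reflect"
)

// newPoint builds, at run time, the equivalent of struct{ X, Y int }
// and returns an addressable value of that type with both fields set.
func newPoint(x, y int64) reflect.Value {
	typ := reflect.StructOf([]reflect.StructField{
		{Name: "X", Type: reflect.TypeOf(int(0))},
		{Name: "Y", Type: reflect.TypeOf(int(0))},
	})
	v := reflect.New(typ).Elem()
	v.Field(0).SetInt(x)
	v.Field(1).SetInt(y)
	return v
}

func main() {
	v := newPoint(40, 2)
	fmt.Println(v.Type().Name() == "")                // true: the type has no name
	fmt.Println(v.Field(0).Int() + v.Field(1).Int()) // 42
}
```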
326+
## Implementation
327+
328+
1. `dice.{VM,frame,instruction}` implementation leading to the execution
329+
of already decoded instructions,
330+
2. implementation of the bytecode stream decoder,
331+
3. implementation of the bytecode encoder,
332+
4. implementation of the interactive prompt of the REPL (with limitations),
333+
5. implementation of dynamically importing packages at the REPL level.
334+
This probably needs either a working `buildmode=plugin` from the `go` tool,
335+
or a complete handling of dynamically loading bytecode object files.
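
Steps 2 and 3 (decoder and encoder) could start from something as simple as the round trip sketched below. It assumes a fixed 10-byte little-endian record per instruction, chosen only to keep the example short; this is not the on-disk format described in the `Dis` paper, and the function names `encode` and `decode` are hypothetical. Fields are exported because `encoding/binary` cannot fill unexported struct fields.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// instruction mirrors the struct from the Virtual Machine section.
type instruction struct {
	Opcode byte
	Amode  byte
	Addrs  uint64
}

// encode serializes instructions as fixed-size little-endian records.
func encode(code []instruction) ([]byte, error) {
	var buf bytes.Buffer
	for _, in := range code {
		if err := binary.Write(&buf, binary.LittleEndian, in); err != nil {
			return nil, err
		}
	}
	return buf.Bytes(), nil
}

// decode reads records until the input is exhausted.
func decode(data []byte) ([]instruction, error) {
	r := bytes.NewReader(data)
	var code []instruction
	for r.Len() > 0 {
		var in instruction
		if err := binary.Read(r, binary.LittleEndian, &in); err != nil {
			return nil, err
		}
		code = append(code, in)
	}
	return code, nil
}

func main() {
	prog := []instruction{{Opcode: 0x3A, Addrs: 7}, {Opcode: 0x0C}}
	data, err := encode(prog)
	if err != nil {
		panic(err)
	}
	back, err := decode(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(data), len(back)) // prints: 20 2
}
```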
336+
