# Proposal: Design of a bytecode interpreter for Go

Author: Sebastien Binet

Last updated: 2016-08-26

Discussion at https://github.com/go-interpreter/proposal/issue/1.

## Abstract

We propose to design and implement a bytecode interpreter for Go,
which will be the foundation for a Go [REPL](https://en.wikipedia.org/wiki/Read%E2%80%93eval%E2%80%93print_loop).

## Background

It is common in science or exploratory work to iterate on a piece of code
to solve a given problem.
Having an interactive conversation with your program, _via_ an interactive
prompt (aka a REPL), greatly speeds up such exploratory work: one can easily
iterate on various algorithms, modify the state of the program and its data,
and write new types and functions to, _e.g._, plot the new state of the data.

A side benefit of such an interpreter is the ability to embed it inside
a Go application and provide both scriptability and extensibility.
Designing such an API is outside the scope of this proposal.

There are already partial solutions and even complete implementations
of a Go REPL, but none of them meets all of the following requirements:

- easy `go get` installation
- implement the whole Go language
- be a real REPL, not just an "on-the-fly re-compilation + re-run the whole snippet" approach
- JIT-able
- performant

## Proposal

We propose to break down the complicated task of building a complete interpreter
for Go (interactivity, whole-program interpretation, runtime, native functions,
external functions, JITing, parsing source code, ...) into smaller pieces.

This proposal only deals with the bytecode interpreter
(its overall design and its components), the opcodes and instructions that
can appear in a bytecode stream, and how the interpreter decodes and
acts upon these bytecodes.

There are many ways to implement an interpreter and as many options
for the interpretation process:

1. directly interpret the source code
2. interpret the source code after it has been transformed into an AST
3. compile statements into bytecode instructions that are then executed
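
The following toy sketch makes the difference between options 2) and 3) concrete. All names are hypothetical and not part of the proposed `dice` API. It evaluates `40 + 2` in two ways: first by walking an AST, then by compiling the AST once into a flat instruction stream that a simple loop executes. The stack-machine encoding is chosen only for brevity. The `Dis` design adopted below is a memory-to-memory architecture, not a stack machine.

```go
package main

import "fmt"

// expr is a node of a tiny expression AST: a literal or an addition.
type expr interface{}

type lit int

type add struct{ x, y expr }

// evalAST interprets the tree directly, re-walking it on every evaluation (option 2).
func evalAST(e expr) int {
	switch e := e.(type) {
	case lit:
		return int(e)
	case add:
		return evalAST(e.x) + evalAST(e.y)
	}
	panic("unknown node")
}

// op is an opcode of a toy stack machine (option 3).
type op byte

const (
	opPush op = iota // push instr.arg
	opAdd            // pop two values, push their sum
)

type instr struct {
	op  op
	arg int
}

// compile flattens the tree into a linear instruction stream, once.
func compile(e expr, code []instr) []instr {
	switch e := e.(type) {
	case lit:
		return append(code, instr{op: opPush, arg: int(e)})
	case add:
		code = compile(e.x, code)
		code = compile(e.y, code)
		return append(code, instr{op: opAdd})
	}
	panic("unknown node")
}

// runBytecode executes the instruction stream with an explicit operand stack.
func runBytecode(code []instr) int {
	var stack []int
	for _, in := range code {
		switch in.op {
		case opPush:
			stack = append(stack, in.arg)
		case opAdd:
			n := len(stack)
			stack = append(stack[:n-2], stack[n-2]+stack[n-1])
		}
	}
	return stack[len(stack)-1]
}

func main() {
	e := add{lit(40), lit(2)}
	fmt.Println(evalAST(e))                   // 42
	fmt.Println(runBytecode(compile(e, nil))) // 42
}
```

The compiled form is produced once and can then be executed, and optimized, as a linear sequence of instructions.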

We propose to go with option 3).
Option 1) lends itself neither to optimizations nor to efficient execution.
Option 2) is somewhat better: there are ways to programmatically manipulate
and transform an AST.
But with option 3) we should be able to reuse the whole corpus of optimizations
coming from the new SSA backend of the official `gc` Go compiler.
As explained in Rob Pike's GopherCon 2016 talk ["The Design of the Go Assembler"](https://talks.golang.org/2016/asm.slide),
the `cmd/internal/obj` package can be seen as a rather portable assembly language.
This paves the way for considering it as a portable intermediate representation
(IR) of Go code.

The proposal is thus to use this conduit as the general infrastructure to
generate the opcodes and bytecode for the new Go VM.
The concrete _modus_ _operandi_ for leveraging `cmd/internal/obj` and
the whole `gc` compiler infrastructure still needs to be properly fleshed
out, but here are the current options:

- create a proper `GOARCH` architecture directly under `cmd/internal`, like
  the existing `GOARCH=amd64`, `GOARCH=s390x`, etc. architectures, and aim for
  Go 1.8 (we would need to declare our plans [here](https://groups.google.com/forum/#!topic/golang-dev/098vr4999Tk)),
- vendor `cmd/compile` at a given Go version (_e.g._ 1.7) and work off it,
  aiming for integration at a later date (if at all possible),
- ???

### Instructions, opcodes and bytecode format

We propose to reuse the opcodes and bytecode format described in the [Dis VM](http://www.vitanuova.com/inferno/papers/dis.pdf)
specification paper.
The `Dis` VM was designed to execute [Limbo](https://en.wikipedia.org/wiki/Limbo_%28programming_language%29)
code.
`Limbo` and `Go` share a common lineage and present similar features
(channels, `select`, garbage collector, packages), so many (if not all) of
the opcodes our VM will need are already present, and the instruction set has
been formally described.
The on-disk object file format and overall organization have also been specified
in the above paper.

We intend to follow the general spirit of the specifications of the `Dis` VM
and condense it inside a package named `dice`.
The implementation of `dice` should be done from first principles,
without looking at the `Dis` source code.
This is to ensure that `dice` can be licensed under `BSD-3`.

The various opcodes are listed below (hexadecimal value followed by mnemonic):

```
00 nop      20 headb    40 mulw     60 blew     80 shrl
01 alt      21 headw    41 mulf     61 bgtw     81 bnel
02 nbalt    22 headp    42 divb     62 bgew     82 bltl
03 goto     23 headf    43 divw     63 beqf     83 blel
04 call     24 headm    44 divf     64 bnef     84 bgtl
05 frame    25 headmp   45 modw     65 bltf     85 bgel
06 spawn    26 tail     46 modb     66 blef     86 beql
07 runt     27 lea      47 andb     67 bgtf     87 cvtlf
08 load     28 indx     48 andw     68 bgef     88 cvtfl
09 mcall    29 movp     49 orb      69 beqc     89 cvtlw
0A mspawn   2A movm     4A orw      6A bnec     8A cvtwl
0B mframe   2B movmp    4B xorb     6B bltc     8B cvtlc
0C ret      2C movb     4C xorw     6C blec     8C cvtcl
0D jmp      2D movw     4D shlb     6D bgtc     8D headl
0E case     2E movf     4E shlw     6E bgec     8E consl
0F exit     2F cvtbw    4F shrb     6F slicea   8F newcl
10 new      30 cvtwb    50 shrw     70 slicela  90 casec
11 newa     31 cvtfw    51 insc     71 slicec   91 indl
12 newcb    32 cvtwf    52 indc     72 indw     92 movpc
13 newcw    33 cvtca    53 addc     73 indf     93 tcmp
14 newcf    34 cvtac    54 lenc     74 indb     94 mnewz
15 newcp    35 cvtwc    55 lena     75 negf     95 cvtrf
16 newcm    36 cvtcw    56 lenl     76 movl     96 cvtfr
17 newcmp   37 cvtfc    57 beqb     77 addl     97 cvtws
18 send     38 cvtcf    58 bneb     78 subl     98 cvtsw
19 recv     39 addb     59 bltb     79 divl     99 lsrw
1A consb    3A addw     5A bleb     7A modl     9A lsrl
1B consw    3B addf     5B bgtb     7B mull     9B eclr
1C consp    3C subb     5C bgeb     7C andl     9C newz
1D consf    3D subw     5D beqw     7D orl      9D newaz
1E consm    3E subf     5E bnew     7E xorl
1F consmp   3F mulb     5F bltw     7F shll
```

We reserve the right to rename some of these `opcode`s to better reflect
the naming conventions of our source language, Go.
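
One way to carry the opcode table over to Go is a small named integer type with constants and a `String` method for disassembly. The sketch below assumes such a type. The `Opcode` name and the `Op…` constants are hypothetical, and only a subset of the table is listed. The numeric values are copied from the table above.

```go
package main

import "fmt"

// Opcode is a hypothetical Go type for dice opcodes; values follow the table.
type Opcode byte

const (
	OpNop   Opcode = 0x00
	OpAlt   Opcode = 0x01
	OpGoto  Opcode = 0x03
	OpCall  Opcode = 0x04
	OpFrame Opcode = 0x05
	OpSpawn Opcode = 0x06 // a candidate for a Go-flavoured name such as "go"
	OpRet   Opcode = 0x0C
	OpJmp   Opcode = 0x0D
	OpSend  Opcode = 0x18
	OpRecv  Opcode = 0x19
	OpMovw  Opcode = 0x2D
	OpAddw  Opcode = 0x3A
	OpAddf  Opcode = 0x3B
)

// opNames maps each listed opcode to its mnemonic, for disassembly.
var opNames = map[Opcode]string{
	OpNop: "nop", OpAlt: "alt", OpGoto: "goto", OpCall: "call",
	OpFrame: "frame", OpSpawn: "spawn", OpRet: "ret", OpJmp: "jmp",
	OpSend: "send", OpRecv: "recv", OpMovw: "movw", OpAddw: "addw",
	OpAddf: "addf",
}

// String implements fmt.Stringer; unknown opcodes print as their hex value.
func (op Opcode) String() string {
	if name, ok := opNames[op]; ok {
		return name
	}
	return fmt.Sprintf("op(0x%02X)", byte(op))
}

func main() {
	fmt.Println(OpAddw, OpSpawn, Opcode(0xFF)) // addw spawn op(0xFF)
}
```

The name table is also where a Go-flavoured renaming could be applied, without changing the numeric values.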

### Virtual Machine

Once a Go package, command or code snippet has been compiled to our `dice` bytecode,
that bytecode needs to be executed.
This job is performed by the `dice.VM` virtual machine, sketched here with
operand decoding elided:

```go
package dice

import "reflect"

// VM is the dice virtual machine.
type VM struct {
	frame   *frame          // currently executing frame
	globals []reflect.Value // module-level data
}

// frame is the activation record of a function call.
type frame struct {
	vm     *VM
	caller *frame
	locals []reflect.Value
	pc     int // program counter: index of the next instruction to execute
	code   []instruction
}

// instruction is a decoded bytecode instruction.
type instruction struct {
	opcode byte
	amode  byte   // address mode
	addrs  uint64 // operands (src1, src2, dst)
}

// cfKind tells the dispatch loop how to continue after an instruction.
type cfKind int

const (
	cfNext   cfKind = iota // continue with the instruction at fr.pc
	cfJump                 // fr.pc has been set to a branch target
	cfReturn               // leave the current frame
)

// A few opcode values, taken from the opcode table above.
const (
	opCALL  byte = 0x04
	opSPAWN byte = 0x06 // Go's `go` statement
	opRET   byte = 0x0C
	opJMP   byte = 0x0D
	opADDF  byte = 0x3B
)

func (vm *VM) run() {
	run(vm.frame)
}

// run is the fetch-decode-execute loop of a single frame.
func run(fr *frame) {
	for fr.pc < len(fr.code) {
		code := fr.code[fr.pc]
		fr.pc++ // advance before executing, so that jumps can overwrite pc
		switch exec(fr, code) {
		case cfReturn:
			return
		case cfNext, cfJump:
			// fetch the instruction at fr.pc
		}
	}
}

func exec(fr *frame, code instruction) cfKind {
	switch code.opcode {
	case opADDF:
		// dst = src1 + src2 (operand decoding elided)
	case opJMP:
		// placeholder: assumes the jump target is stored directly in addrs
		fr.pc = int(code.addrs)
		return cfJump
	case opCALL:
		run(&frame{vm: fr.vm, caller: fr, code: callee(fr, code)})
	case opRET:
		// fetch result if any
		return cfReturn
	case opSPAWN:
		// the new goroutine gets its own frame and never returns into fr
		go run(&frame{vm: fr.vm, code: callee(fr, code)})
		// etc...
	}
	return cfNext
}

// callee is a hypothetical helper: resolving the called function's code
// from the instruction operands is not specified yet.
func callee(fr *frame, code instruction) []instruction {
	return nil
}
```
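
The `addrs uint64 // operands (src1, src2, dst)` field leaves the operand layout open. Here is a sketch under an assumed layout that is not taken from the `Dis` paper. The three operands are stored as 21-bit fields in one 64-bit word, with `src1` in the high bits and `dst` in the low bits. Interpreting each field (frame offset, module-data offset, immediate) would be the job of the `amode` byte, which is not modeled here. `packOperands` and `unpackOperands` are hypothetical helper names.

```go
package main

import "fmt"

// Assumed layout: three 21-bit operand fields in one uint64,
// src1 in bits 42-62, src2 in bits 21-41 and dst in bits 0-20.
const (
	fieldBits = 21
	fieldMask = 1<<fieldBits - 1
)

// packOperands stores src1, src2 and dst in a single word.
// It panics if an operand does not fit in 21 bits.
func packOperands(src1, src2, dst uint32) uint64 {
	if src1 > fieldMask || src2 > fieldMask || dst > fieldMask {
		panic("operand does not fit in 21 bits")
	}
	return uint64(src1)<<(2*fieldBits) | uint64(src2)<<fieldBits | uint64(dst)
}

// unpackOperands is the inverse of packOperands.
func unpackOperands(addrs uint64) (src1, src2, dst uint32) {
	src1 = uint32(addrs >> (2 * fieldBits) & fieldMask)
	src2 = uint32(addrs >> fieldBits & fieldMask)
	dst = uint32(addrs & fieldMask)
	return src1, src2, dst
}

func main() {
	addrs := packOperands(1, 2, 3)
	fmt.Println(unpackOperands(addrs)) // 1 2 3
}
```

A fixed-width word keeps decoding to a few shifts and masks in the dispatch loop. The price is a hard limit on operand size.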

As a first milestone, the proposal aims to byte-compile this simple Go package:

```go
package main

func add(i, j int) int {
	return i + j
}

func main() {}
```

and, at a later stage, to run `add(40, 2)`.
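
Running `add(40, 2)` could involve something like the following self-contained toy sketch of a memory-to-memory VM, which is independent of the `dice` code above. The bytecode for `add` is written by hand rather than produced by a compiler. The calling convention (arguments in frame slots 0 and 1, result in slot 2) is an assumption made for this example only. The opcode values for `addw` (0x3A) and `ret` (0x0C) come from the opcode table.

```go
package main

import "fmt"

type opcode byte

// Opcode values taken from the opcode table.
const (
	opRET  opcode = 0x0C // return from the current frame
	opADDW opcode = 0x3A // fp[dst] = fp[src1] + fp[src2]
)

// instruction holds already-decoded operands: offsets into the frame.
type instruction struct {
	op              opcode
	src1, src2, dst int
}

type frame struct {
	locals []int64 // frame slots
	pc     int
	code   []instruction
}

// run executes fr.code until it reaches opRET (or falls off the end).
func run(fr *frame) {
	for fr.pc < len(fr.code) {
		in := fr.code[fr.pc]
		fr.pc++
		switch in.op {
		case opADDW:
			fr.locals[in.dst] = fr.locals[in.src1] + fr.locals[in.src2]
		case opRET:
			return
		default:
			panic(fmt.Sprintf("unknown opcode 0x%02X", byte(in.op)))
		}
	}
}

// addCode is hand-written bytecode for `func add(i, j int) int { return i + j }`,
// assuming i is in slot 0, j in slot 1 and the result goes to slot 2.
var addCode = []instruction{
	{op: opADDW, src1: 0, src2: 1, dst: 2},
	{op: opRET},
}

// call runs code in a fresh frame holding the two arguments and returns slot 2.
func call(code []instruction, i, j int64) int64 {
	fr := &frame{locals: make([]int64, 3), code: code}
	fr.locals[0], fr.locals[1] = i, j
	run(fr)
	return fr.locals[2]
}

func main() {
	fmt.Println(call(addCode, 40, 2)) // 42
}
```

Here the frame is set up from Go code. A real implementation would let the bytecode itself allocate frames and pass arguments, presumably via the `frame` and `call` opcodes.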

## Rationale

Why implement yet another Go interpreter and REPL?
Aren't there already enough of them?

Here is a list of alternatives:

- [llgoi](https://github.com/llvm-mirror/llgo/blob/master/cmd/llgoi/llgoi.go) is a JIT-enabled interpreter built on top of `LLVM` and `llgo`.
  The first issue with `llgoi` is its somewhat painful installation process.
  This pain point should be resolved with time (and also by providing [snap-based](https://groups.google.com/forum/#!msg/llgo-dev/ny8MgDlNkng/8kEvgzfuCQAJ)
  installations of `llgoi`).
  But the main issue is that `llgo` development lags behind that of the reference
  implementation of `Go`, `gc`.
  Also, the (very fast) pace of development of `LLVM` itself, and the version skew
  that may result on users' machines, *might* set the scene for difficult user
  support and debugging sessions.

- [ssainterp](https://github.com/go-interpreter/ssainterp) and [ssadump -run](https://godoc.org/golang.org/x/tools/cmd/ssadump)
  are based on the SSA suite developed at [golang.org/x/tools/go/ssa](https://godoc.org/golang.org/x/tools/go/ssa).
  They are able to parse and interpret the vast majority of valid Go code,
  but lack an interactive interpreter mode.
  The `ssadump` interpreter is also clearly documented as *NOT* meant to be used as a
  production-grade interpreter for Go, but merely as an adjunct for testing
  the SSA construction algorithm.

- [igo](https://github.com/sbinet/igo) and [go-eval](https://github.com/sbinet/go-eval)
  are projects salvaged from the pre-`Go 1` era.
  `go-eval` does not lend itself easily to compilation optimizations and lacks
  support for `imports`, `goroutines`, type creation, ...

- [gore](https://github.com/motemen/gore) supports the whole Go language but
  does not (completely cleanly) preserve state or side effects between
  two interactive commands: `gore` recompiles your Go snippets on the fly and
  re-executes them.

It seems necessary to implement some kind of virtual machine to be able
to provide an efficient and truly interactive interpreter for Go.

The same question can also be raised about implementing a whole new VM:
couldn't we somehow reuse an already existing one?
`Python`, `Lua`, `JVM` and `Dis` come to mind.
`Dis` is LGPL and thus not easily integrated into the usual Go ecosystem.
`Python` and `Lua` have more permissive licenses, but their reference
implementations are written in `C`, which either brings performance issues to the
table (`cgo`) or throws `go-get`-ability out of the window.
There are, however, `Go` implementations (partial or complete) of these VMs:

- https://github.com/Shopify/go-lua/blob/master/vm.go
- https://github.com/flowlo/gothon/blob/master/frame.go

The remaining question is then whether their respective VM
instruction sets are adequate for the Go language.

Finally, why use the `Dis` VM instruction set instead of a more recent
or more in vogue one, such as [LLVM bitcode](http://llvm.org/docs/BitCodeFormat.html)
and its associated [LLVM assembly](http://llvm.org/docs/LangRef.html), or the
nascent [`wasm` bytecode](https://webassembly.github.io/) format?

The `LLVM` solution suffers (to a lesser extent) from the same issues as the `llgoi` approach.
We should note, though, that there exists a pure-Go project to interact with the `LLVM` `IR`:
[llir/llvm](https://github.com/llir/llvm).
This project is still a work in progress at the time of writing (August 2016).

`wasm` is probably a very strong and sensible option, and is poised to take over
the whole web industry.
Unfortunately, there is only a work-in-progress `C/C++` project at the moment (August 2016),
so it is probably a bit early to write code targeting it.
However, `wasm` is definitely a backend to monitor: `gopherjs`, a project transpiling
Go code into `JavaScript`, will probably target it at some point.

## Compatibility - Open issues

There are a few interesting issues when interpreting Go code in an interactive
fashion.

1. Should we allow mid-way imports of packages?
   ```
   go> slice := []string{"HELLO", "GO"}
   go> import "strings"
   go> println(strings.ToLower(slice[0]))
   ```

   What if `slice` was instead named `strings`?
   Should we allow shadowing of variables by package identifiers?
   Should we instead re-shadow the package identifier with the variable
   identifier?
   The latter seems like the more idiomatic Go behaviour, or at least the
   behaviour a gopher would expect if she were to write the program in
   a compiled environment (_i.e._ with `goimports` putting the `import`
   statement at the top).

2. Support for `cgo` and `import "C"`?
3. Support for packages with assembly (from the `stdlib` or otherwise)?
4. Calls to `syscalls`? Should they be somehow recognized and performed
   on a dedicated `goroutine`? What should `os.Exit` do, and how?
5. How to efficiently implement iteration over maps?
6. How to implement `unsafe`? Should we?
7. How to implement the definition of new types?
   Package `reflect` has some support for this (`StructOf`, `ArrayOf`, ...) but
   it currently has no support for defining new interface types nor any new
   named types.
8. In an interactive interpreter, how do we define methods for a named type?
   When, and how, do we tell the interpreter that the method set of a given
   named type is complete?

## Implementation

1. `dice.{VM,frame,instruction}` implementation leading to the execution
   of already decoded instructions,
2. implementation of the bytecode stream decoder,
3. implementation of the bytecode encoder,
4. implementation of the interactive prompt of the REPL (with limitations),
5. implementation of dynamically importing packages at the REPL level.
   This probably needs either a working `buildmode=plugin` from the `go` tool,
   or a complete handling of dynamically loading bytecode object files.
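
Steps 2 and 3 need a concrete serialization. The `Dis` paper specifies its own object file format. The sketch below uses a stand-in encoding instead, which is an assumption and not the `Dis` encoding. Each instruction is written as its opcode and address-mode bytes, followed by its three operands as signed varints via `encoding/binary`, and the stream is then decoded back. The `instruction` type here is a simplified, hypothetical variant of `dice.instruction` with the operands stored in separate fields.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// instruction is a simplified, hypothetical variant of dice.instruction
// with the three operands stored in separate fields.
type instruction struct {
	opcode, amode   byte
	src1, src2, dst int64
}

// encode appends ins to buf: opcode byte, address-mode byte, then the
// three operands as signed varints (an assumed layout, not the Dis one).
func encode(buf []byte, ins instruction) []byte {
	buf = append(buf, ins.opcode, ins.amode)
	var tmp [binary.MaxVarintLen64]byte
	for _, v := range [...]int64{ins.src1, ins.src2, ins.dst} {
		n := binary.PutVarint(tmp[:], v)
		buf = append(buf, tmp[:n]...)
	}
	return buf
}

// decode parses a whole bytecode stream produced by encode.
func decode(buf []byte) ([]instruction, error) {
	var out []instruction
	for len(buf) > 0 {
		if len(buf) < 2 {
			return nil, errors.New("truncated instruction header")
		}
		ins := instruction{opcode: buf[0], amode: buf[1]}
		buf = buf[2:]
		for _, p := range []*int64{&ins.src1, &ins.src2, &ins.dst} {
			v, n := binary.Varint(buf)
			if n <= 0 {
				return nil, errors.New("truncated or malformed operand")
			}
			*p, buf = v, buf[n:]
		}
		out = append(out, ins)
	}
	return out, nil
}

func main() {
	prog := []instruction{
		{opcode: 0x3A, src1: 0, src2: 1, dst: 2}, // addw
		{opcode: 0x0C},                           // ret
	}
	var buf []byte
	for _, ins := range prog {
		buf = encode(buf, ins)
	}
	got, err := decode(buf)
	fmt.Println(len(buf), err, got[0] == prog[0], got[1] == prog[1]) // 10 <nil> true true
}
```

Varints keep small frame offsets down to a single byte. The cost is that operands can no longer be located at fixed positions without decoding.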