.ksm file optimization (experiments and results)

Sorry for the long post, I wanted to brain dump this stuff in case anyone else is interested.
#1712 got me thinking about .ksm files in general and why they are so gosh-darn big. In my experience they are always larger than a minified text version, and often bigger than the text version with whitespace and comments. This has always seemed ridiculous to me so I decided to start investigating a couple days ago and I now have numbers and suggestions!

All my testing so far as been done with [boot.ks](http://pastebin.com/82UqzPXk) which I've heavily hand-optimized in the past. This may not be totally representative of scripts in general, but being able to compete with a hand-optimized and minified version seems like as good a place as any to start:
- as-is the file is 2403 bytes
- minified it's 1014 bytes
- compiled with KOS 1.0 it's 2593 bytes (yup, bigger than the source file)

I started by writing a disassembler for ksm files so I could see what was going on under the hood. The output below is from the summary section of this tool (I also dump the full symbol table and instruction listings).
## Starting Point

```
File Size: 2593 bytes

Arguments: 147 in 922 bytes (6.3 avg)
  Strings: 818 bytes
  Label: 26 in 182 bytes
  Magic: Args 33 Ops 33

Code: 365 ops w/ 324 args in 1013 bytes
  PUSH                      x150       450
  CALL                      x31        155
  PUSHSCOPE                 x14         70
  LABELRESET                x19         57
  RETURN                    x17         51
  PUSHDELEGATERELOCATELATER x8          40
  POPSCOPE                  x13         39
  BRANCHFALSE               x11         33

Debug: 600 bytes
```
## First Pass

The debug section takes a ton of space. It's like bundling a .pdb and functionally isn't required (this is no different than minimizing a text file to be all on one line). I also noticed that there were 33 separate copies of the "magic" stack marking value in the arguments table. Apparently `KOSArgMarkerType` doesn't have an Equals/GetHashCode override so a new value is added for pretty much every function call in the code.
- remove debug info: 590 bytes
- fix duplicate stack markers: 32 bytes

```
File Size: 1971 bytes

Arguments: 115 in 890 bytes (7.7 avg)
  Strings: 818 bytes
  Label: 26 in 182 bytes
  Magic: Args 1 Ops 33

Code: 365 ops w/ 324 args in 1013 bytes
  PUSH                      x150       450
  CALL                      x31        155
  PUSHSCOPE                 x14         70
  LABELRESET                x19         57
  RETURN                    x17         51
  PUSHDELEGATERELOCATELATER x8          40
  POPSCOPE                  x13         39
  BRANCHFALSE               x11         33

Debug: 10 bytes
```
## Optimizer

Looking at the contents of the arguments section, ~34 of the values were compiler-generated labels (mostly due to LABELRESET instructions where function entry points had been labeled with something other than the normal @NNNN labels). I went through and changed all the labels back to default in-order values which got rid of all but the first LABELRESET instruction. I also updated all jump instructions to use relative offsets rather than labels.
- re-label instructions: 209 bytes
- relative jumps: 62 bytes

```
File Size: 1700 bytes

Arguments: 98 in 673 bytes (6.8 avg)
  Strings: 551 bytes
  Label: 8 in 56 bytes
  Magic: Args 1 Ops 33

Code: 347 ops w/ 306 args in 959 bytes
  PUSH                      x150       450
  CALL                      x31        155
  PUSHSCOPE                 x14         70
  RETURN                    x17         51
  PUSHDELEGATERELOCATELATER x8          40
  POPSCOPE                  x13         39
  BRANCHFALSE               x11         33
  POP                       x20         20

Debug: 10 bytes
```

This was about the best I could do with an external tool. I'm toying with the idea of extending this to be an optimizing compiler with at least a list of peephole optimizations which should shave off a bit more space (and a few IPU when running the script). I hand-optimized the assembly for one of the functions and was able get a ~20% reduction in both instructions and bytes, but most of those gains were from eliminating the scope push/pop/local and managing the single parameter just on the stack with DUP and SWAP as needed. I suspect that's probably very near the upper bound of what's even theoretically possible with an advanced optimizer.
## Serialization changes

Much easier than instruction gymnastics was taking a look at tweaks to the kOS code. The biggest issue at this point was that I only had 98 values in the arguments section (down from 147 at the start), but references to these arg were all 2 bytes long since the section itself was 673 bytes long. Switching to ordinal indexes rather than byte-offsets was a simple change that saves a bunch of space. (I also fixed the stack marker properly).
- args by ordinal instead of offset: 306 bytes
- duplicate stack markers: 0 (already fixed above)

```
File Size: 1394 bytes

Arguments: 98 in 673 bytes (6.8 avg)
  Strings: 551 bytes
  Label: 8 in 56 bytes
  Magic: Args 1 Ops 33

Code: 347 ops w/ 306 args in 653 bytes
  PUSH                      x150       300
  CALL                      x31         93
  PUSHSCOPE                 x14         42
  RETURN                    x17         34
  POPSCOPE                  x13         26
  PUSHDELEGATERELOCATELATER x8          24
  BRANCHFALSE               x11         22
  POP                       x20         20

Debug: 10 bytes
```
## New Opcodes

At this point most of the low-hanging fruit had been addressed so gains slowed down. Still, there are plenty of common patterns that can benefit from smarter encoding in the instruction stream. One of the most obvious at this point was duplicate string values for essentially the same variable (`x` vs `$x` vs `$x*). I added`PushVar`and`PushFunc` opcodes to generate the proper version of a given identifier without having to pollute the arguments table. Adding a PushStackMark variant also seemed reasonable as that saves a byte per function call. Speaking of function calls, as far as I can tell the first MLField of the Call instruction isn't actually used anymore. Furthermore, the "<indirect>" calls don't really need a dummy marker value if they had their own dedicated instruction, so I added CallIndirect and CallDirect. I also stripped all decorators off the identifiers for call operations since they are re-added if missing by the Execute method anyway.
- PushVar, PushFunc, PushStackMark: 31 bytes
- CallDirect, CallIndirect: 36 bytes
- Combined: 85 more bytes (lots of decorated symbols don't go away until both changes are made)

```
File Size: 1242 bytes

Arguments: 88 in 593 bytes (6.7 avg)
  Strings: 472 bytes
  Label: 8 in 56 bytes
  Magic: Args 0 Ops 0

Code: 346 ops w/ 235 args in 581 bytes
  PUSH                      x71        142
  PushVar                   x38         76
  CallDirect                x25         50
  PUSHSCOPE                 x14         42
  RETURN                    x17         34
  PushStackMark             x33         33
  POPSCOPE                  x13         26
  PUSHDELEGATERELOCATELATER x8          24
  BRANCHFALSE               x11         22
  POP                       x20         20
  PushFunc                  x8          16
  ADD                       x13         13

Debug: 10 bytes
```
## Immediate Values

The next big thing cluttering the arguments table was constant values for various instructions. Most of these values are only used once and some are duplicated in the table since scope ids are shorts while branch offsets and return depth are integers which prevents them from de-duping. Branch offsets are typically small signed values so I added BranchIfFalseImm, BranchIfTrueImm, and JumpImm to use a signed-byte after the opcode to store the destination offset when possible. Scope IDs and return depth are usually small non-negative numbers so I did something similar with PushScopeImm, PopScopeImm, and ReturnImm using an unsigned byte in these cases.
- BranchIfFalseImm, BranchIfTrueImm, JumpImm: 55 bytes
- PushScopeImm, PopScopeImm, ReturnImm: 45

```
File Size: 1142 bytes

Arguments: 62 in 493 bytes (7.9 avg)
  Strings: 472 bytes
  Label: 8 in 56 bytes
  Magic: Args 0 Ops 0

Code: 346 ops w/ 158 args in 581 bytes
  PUSH                      x71        142
  PushVar                   x38         76
  CallDirect                x25         50
  PushScopeImm              x14         42
  ReturnImm                 x17         34
  PushStackMark             x33         33
  PopScopeImm               x13         26
  PUSHDELEGATERELOCATELATER x8          24
  BranchIfFalseImm          x11         22
  POP                       x20         20
  PushFunc                  x8          16
  ADD                       x13         13

Debug: 10 bytes
```
## Immediate Labels

At this point there were still a few labels left in PUSHDELEGATERELOCATELATER instructions. Really these aren't much different from relative jump offsets, but since they cross code block CodePart boundaries I decided to use an absolute instruction offset in the ksm file for the immediate version (using a UInt16). I also baked the WithClosure flag into the instructions themselves. Finally, I changed the starting "previousLabel" value to "@0000" to make the first "LABELRESET" opcode unnecessary in most cases.
- PushRelocateImm, PushRelocateDelCaptureImm, PushRelocateDelImm: 47 bytes
- implicit start label: 9 bytes

```
File Size: 1086 bytes

Arguments: 54 in 437 bytes (8.0 avg)
  Strings: 416 bytes
  Label: 0 in 0 bytes
  Magic: Args 0 Ops 0

Code: 346 ops w/ 142 args in 581 bytes
  PUSH                      x71        142
  PushVar                   x38         76
  CallDirect                x25         50
  PushScopeImm              x14         42
  ReturnImm                 x17         34
  PushArgBottom             x33         33
  PopScopeImm               x13         26
  PushRelocateDelCaptureImm x8          24
  BranchIfFalseImm          x11         22
  POP                       x20         20
  PushFunc                  x8          16
  ADD                       x13         13
  STORELOCAL                x11         11

Debug: 10 bytes
```

(These were the only instructions that I didn't fully implement since they require translation. The rest have proper Execute methods and should in theory just work, though I was focused on what's possible in a .ksm file and didn't try to run any of the modified versions)
## Ideas
- Debug info really bloats the files. Having that info is certainly valuable, but maybe that's something that could live in a separate file on the archive, especially now that we have folders? Compiling could generate a separate .kdb file with full line/column info and maybe even the source text and stick it in a 0:/debug/ folder. The .ksm file could include a file hash linking back to the .kdb for internal error message generation. I'd argue this is also more realistic since NASA isn't going to include debug symbols on it's probes. If something goes wrong we'll send a memory dump back to earth and decode the error there.
- As an alternate or interim option, adding a config setting to disable debug generation in .ksm files could be really nice to have for people who care more about file size than friendly error messages.
- Once I finish writing this up I'll submit a PR with the simpler library changes to fix the stack marker bug and switch to ordinal argument indexes. The latter makes a big difference for small to medium files and future efforts to reduce clutter in that table just increases the window before jumping to two arg bytes.
- Text labels are evil. Relative offsets and per-file instruction numbering should cover anything that labels are currently used for in the file and can be much more compact.
- Specialized instructions for common patterns don't make a huge difference on an individual basis but the 14 I added shaved an additional 22% off the file size combined.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

.ksm file optimization (experiments and results) #1732

Starting Point

First Pass

Optimizer

Serialization changes

New Opcodes

Immediate Values

Immediate Labels

Ideas

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

.ksm file optimization (experiments and results) #1732

Description

Starting Point

First Pass

Optimizer

Serialization changes

New Opcodes

Immediate Values

Immediate Labels

Ideas

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions