-
Notifications
You must be signed in to change notification settings - Fork 241
Description
Sorry for the long post, I wanted to brain dump this stuff in case anyone else is interested.
#1712 got me thinking about .ksm files in general and why they are so gosh-darn big. In my experience they are always larger than a minified text version, and often bigger than the text version with whitespace and comments. This has always seemed ridiculous to me so I decided to start investigating a couple days ago and I now have numbers and suggestions!
All my testing so far as been done with boot.ks which I've heavily hand-optimized in the past. This may not be totally representative of scripts in general, but being able to compete with a hand-optimized and minified version seems like as good a place as any to start:
- as-is the file is 2403 bytes
- minified it's 1014 bytes
- compiled with KOS 1.0 it's 2593 bytes (yup, bigger than the source file)
I started by writing a disassembler for ksm files so I could see what was going on under the hood. The output below is from the summary section of this tool (I also dump the full symbol table and instruction listings).
Starting Point
File Size: 2593 bytes
Arguments: 147 in 922 bytes (6.3 avg)
Strings: 818 bytes
Label: 26 in 182 bytes
Magic: Args 33 Ops 33
Code: 365 ops w/ 324 args in 1013 bytes
PUSH x150 450
CALL x31 155
PUSHSCOPE x14 70
LABELRESET x19 57
RETURN x17 51
PUSHDELEGATERELOCATELATER x8 40
POPSCOPE x13 39
BRANCHFALSE x11 33
Debug: 600 bytes
First Pass
The debug section takes a ton of space. It's like bundling a .pdb and functionally isn't required (this is no different than minimizing a text file to be all on one line). I also noticed that there were 33 separate copies of the "magic" stack marking value in the arguments table. Apparently KOSArgMarkerType doesn't have an Equals/GetHashCode override so a new value is added for pretty much every function call in the code.
- remove debug info: 590 bytes
- fix duplicate stack markers: 32 bytes
File Size: 1971 bytes
Arguments: 115 in 890 bytes (7.7 avg)
Strings: 818 bytes
Label: 26 in 182 bytes
Magic: Args 1 Ops 33
Code: 365 ops w/ 324 args in 1013 bytes
PUSH x150 450
CALL x31 155
PUSHSCOPE x14 70
LABELRESET x19 57
RETURN x17 51
PUSHDELEGATERELOCATELATER x8 40
POPSCOPE x13 39
BRANCHFALSE x11 33
Debug: 10 bytes
Optimizer
Looking at the contents of the arguments section, ~34 of the values were compiler-generated labels (mostly due to LABELRESET instructions where function entry points had been labeled with something other than the normal @nnnn labels). I went through and changed all the labels back to default in-order values which got rid of all but the first LABELRESET instruction. I also updated all jump instructions to use relative offsets rather than labels.
- re-label instructions: 209 bytes
- relative jumps: 62 bytes
File Size: 1700 bytes
Arguments: 98 in 673 bytes (6.8 avg)
Strings: 551 bytes
Label: 8 in 56 bytes
Magic: Args 1 Ops 33
Code: 347 ops w/ 306 args in 959 bytes
PUSH x150 450
CALL x31 155
PUSHSCOPE x14 70
RETURN x17 51
PUSHDELEGATERELOCATELATER x8 40
POPSCOPE x13 39
BRANCHFALSE x11 33
POP x20 20
Debug: 10 bytes
This was about the best I could do with an external tool. I'm toying with the idea of extending this to be an optimizing compiler with at least a list of peephole optimizations which should shave off a bit more space (and a few IPU when running the script). I hand-optimized the assembly for one of the functions and was able get a ~20% reduction in both instructions and bytes, but most of those gains were from eliminating the scope push/pop/local and managing the single parameter just on the stack with DUP and SWAP as needed. I suspect that's probably very near the upper bound of what's even theoretically possible with an advanced optimizer.
Serialization changes
Much easier than instruction gymnastics was taking a look at tweaks to the kOS code. The biggest issue at this point was that I only had 98 values in the arguments section (down from 147 at the start), but references to these arg were all 2 bytes long since the section itself was 673 bytes long. Switching to ordinal indexes rather than byte-offsets was a simple change that saves a bunch of space. (I also fixed the stack marker properly).
- args by ordinal instead of offset: 306 bytes
- duplicate stack markers: 0 (already fixed above)
File Size: 1394 bytes
Arguments: 98 in 673 bytes (6.8 avg)
Strings: 551 bytes
Label: 8 in 56 bytes
Magic: Args 1 Ops 33
Code: 347 ops w/ 306 args in 653 bytes
PUSH x150 300
CALL x31 93
PUSHSCOPE x14 42
RETURN x17 34
POPSCOPE x13 26
PUSHDELEGATERELOCATELATER x8 24
BRANCHFALSE x11 22
POP x20 20
Debug: 10 bytes
New Opcodes
At this point most of the low-hanging fruit had been addressed so gains slowed down. Still, there are plenty of common patterns that can benefit from smarter encoding in the instruction stream. One of the most obvious at this point was duplicate string values for essentially the same variable (x vs $x vs $x*). I addedPushVarandPushFunc` opcodes to generate the proper version of a given identifier without having to pollute the arguments table. Adding a PushStackMark variant also seemed reasonable as that saves a byte per function call. Speaking of function calls, as far as I can tell the first MLField of the Call instruction isn't actually used anymore. Furthermore, the "" calls don't really need a dummy marker value if they had their own dedicated instruction, so I added CallIndirect and CallDirect. I also stripped all decorators off the identifiers for call operations since they are re-added if missing by the Execute method anyway.
- PushVar, PushFunc, PushStackMark: 31 bytes
- CallDirect, CallIndirect: 36 bytes
- Combined: 85 more bytes (lots of decorated symbols don't go away until both changes are made)
File Size: 1242 bytes
Arguments: 88 in 593 bytes (6.7 avg)
Strings: 472 bytes
Label: 8 in 56 bytes
Magic: Args 0 Ops 0
Code: 346 ops w/ 235 args in 581 bytes
PUSH x71 142
PushVar x38 76
CallDirect x25 50
PUSHSCOPE x14 42
RETURN x17 34
PushStackMark x33 33
POPSCOPE x13 26
PUSHDELEGATERELOCATELATER x8 24
BRANCHFALSE x11 22
POP x20 20
PushFunc x8 16
ADD x13 13
Debug: 10 bytes
Immediate Values
The next big thing cluttering the arguments table was constant values for various instructions. Most of these values are only used once and some are duplicated in the table since scope ids are shorts while branch offsets and return depth are integers which prevents them from de-duping. Branch offsets are typically small signed values so I added BranchIfFalseImm, BranchIfTrueImm, and JumpImm to use a signed-byte after the opcode to store the destination offset when possible. Scope IDs and return depth are usually small non-negative numbers so I did something similar with PushScopeImm, PopScopeImm, and ReturnImm using an unsigned byte in these cases.
- BranchIfFalseImm, BranchIfTrueImm, JumpImm: 55 bytes
- PushScopeImm, PopScopeImm, ReturnImm: 45
File Size: 1142 bytes
Arguments: 62 in 493 bytes (7.9 avg)
Strings: 472 bytes
Label: 8 in 56 bytes
Magic: Args 0 Ops 0
Code: 346 ops w/ 158 args in 581 bytes
PUSH x71 142
PushVar x38 76
CallDirect x25 50
PushScopeImm x14 42
ReturnImm x17 34
PushStackMark x33 33
PopScopeImm x13 26
PUSHDELEGATERELOCATELATER x8 24
BranchIfFalseImm x11 22
POP x20 20
PushFunc x8 16
ADD x13 13
Debug: 10 bytes
Immediate Labels
At this point there were still a few labels left in PUSHDELEGATERELOCATELATER instructions. Really these aren't much different from relative jump offsets, but since they cross code block CodePart boundaries I decided to use an absolute instruction offset in the ksm file for the immediate version (using a UInt16). I also baked the WithClosure flag into the instructions themselves. Finally, I changed the starting "previousLabel" value to "@0000" to make the first "LABELRESET" opcode unnecessary in most cases.
- PushRelocateImm, PushRelocateDelCaptureImm, PushRelocateDelImm: 47 bytes
- implicit start label: 9 bytes
File Size: 1086 bytes
Arguments: 54 in 437 bytes (8.0 avg)
Strings: 416 bytes
Label: 0 in 0 bytes
Magic: Args 0 Ops 0
Code: 346 ops w/ 142 args in 581 bytes
PUSH x71 142
PushVar x38 76
CallDirect x25 50
PushScopeImm x14 42
ReturnImm x17 34
PushArgBottom x33 33
PopScopeImm x13 26
PushRelocateDelCaptureImm x8 24
BranchIfFalseImm x11 22
POP x20 20
PushFunc x8 16
ADD x13 13
STORELOCAL x11 11
Debug: 10 bytes
(These were the only instructions that I didn't fully implement since they require translation. The rest have proper Execute methods and should in theory just work, though I was focused on what's possible in a .ksm file and didn't try to run any of the modified versions)
Ideas
- Debug info really bloats the files. Having that info is certainly valuable, but maybe that's something that could live in a separate file on the archive, especially now that we have folders? Compiling could generate a separate .kdb file with full line/column info and maybe even the source text and stick it in a 0:/debug/ folder. The .ksm file could include a file hash linking back to the .kdb for internal error message generation. I'd argue this is also more realistic since NASA isn't going to include debug symbols on it's probes. If something goes wrong we'll send a memory dump back to earth and decode the error there.
- As an alternate or interim option, adding a config setting to disable debug generation in .ksm files could be really nice to have for people who care more about file size than friendly error messages.
- Once I finish writing this up I'll submit a PR with the simpler library changes to fix the stack marker bug and switch to ordinal argument indexes. The latter makes a big difference for small to medium files and future efforts to reduce clutter in that table just increases the window before jumping to two arg bytes.
- Text labels are evil. Relative offsets and per-file instruction numbering should cover anything that labels are currently used for in the file and can be much more compact.
- Specialized instructions for common patterns don't make a huge difference on an individual basis but the 14 I added shaved an additional 22% off the file size combined.