This repository was archived by the owner on Oct 3, 2025. It is now read-only.

Commit c50bae7

docs: update readme
Signed-off-by: Henry Gressmann <[email protected]>
1 parent f5a16aa commit c50bae7

3 files changed: +38 / -26 lines changed

ARCHITECTURE.md

Lines changed: 10 additions & 12 deletions
@@ -3,26 +3,24 @@
TinyWasm follows the general Runtime Structure described in the [WebAssembly Specification](https://webassembly.github.io/spec/core/exec/runtime.html).

Some key differences are:

- - Values are stored without their type, (as `u64`), and the type is inferred from the instruction that uses them. This is possible because the instructions are validated before execution and the type of each value can be inferred from the instruction.
- - TinyWasm has a explicit stack for values, labels and frames. This is mostly for simplicity in the implementation, but also allows for some optimizations.
- - Floats always use a canonical NaN representation, the spec allows for multiple NaN representations.
- - TinyWasm uses a custom bytecode format (see [Bytecode Format](#bytecode-format) for more details)
- - Global state in the `Store` can be addressed from module instances other than the owning module. This is to allow more efficient access to imports and exports. Ownership is still enforced implicitly by requiring a reference to the instance to access it which can not be changed using the WebAssembly instructions.
- - The `Store` is not thread-safe. This is to allow for more efficient access to the `Store` and its contents. When later adding support for threads, a `Mutex` can be used to make it thread-safe but the overhead of requiring a lock for every access is not necessary for single-threaded applications.
- - TinyWasm is architectured to allow for a JIT compiler to be added later. Functions are stored as FunctionInstances which can contain either a `WasmFunction` or a `HostFunction`. A third variant `JitFunction` could be added later to store a pointer to the compiled function. This would allow for the JIT to be used transparently without changing the rest of the runtime.
- - TinyWasm is designed to be used in `no_std` environments. The `std` feature is enabled by default, but can be disabled to remove the dependency on `std` and `std::io`. This is done by disabling the `std` and `parser` features. The `logging` feature can also be disabled to remove the dependency on `log`. This is not recommended, since `libm` is not as performant as the compiler's math intrinsics, especially on wasm32 targets, but can be useful for resource-constrained devices or other environments where `std` is not available such as OS kernels.
- - Call Frames are executed in a loop instead of recursively. This allows the use of a single stack for all frames and makes it easier to pause execution and resume it later, or to step through the code one instruction at a time.
- - While other interpreters convert `locals` to be register-based when parsing the function body, TinyWasm keeps them in a stack. This is mostly for simplicity in the implementation, but performance is still comparable or better than other interpreters.
+ - **Type Storage**: Types are inferred from usage context rather than stored explicitly, with all values held as `u64`.
+ - **Stack Design**: Implements a specific stack for values, labels, and frames to simplify the implementation and enable optimizations.
+ - **Bytecode Format**: Adopts a custom bytecode format to reduce memory usage and improve performance by allowing direct execution without the need for decoding.
+ - **Global State Access**: Allows cross-module access to the `Store`'s global state, optimizing imports and exports access. Access requires a module instance reference, maintaining implicit ownership through a reference count.
+ - **Non-thread-safe Store**: Designed for efficiency in single-threaded applications.
+ - **JIT Compilation Support**: Prepares for JIT compiler integration with function instances designed to accommodate `WasmFunction`, `HostFunction`, or future `JitFunction`.
+ - **`no_std` Environment Support**: Offers compatibility with `no_std` environments by allowing the `std` feature to be disabled.
+ - **Call Frame Execution**: Executes call frames in a single loop rather than recursively, using a single stack for all frames, facilitating easier pause, resume, and step-through.
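As a rough illustration of the untyped value storage described in the **Type Storage** bullet above, the pattern can be sketched as follows. The struct and method names are hypothetical, not TinyWasm's actual API; the sketch assumes validation has already guaranteed the type at each stack position.

```rust
// Hypothetical sketch of untyped value storage, not TinyWasm's actual API:
// every stack slot is a plain u64, and the instruction that pops a value
// decides how to reinterpret the bits. This is only sound because the module
// has already been validated, so the type at each stack position is known.
struct ValueStack(Vec<u64>);

impl ValueStack {
    fn push_i32(&mut self, v: i32) {
        self.0.push(v as u32 as u64);
    }
    fn pop_i32(&mut self) -> i32 {
        self.0.pop().expect("validated: stack is non-empty") as u32 as i32
    }
    fn push_f32(&mut self, v: f32) {
        self.0.push(v.to_bits() as u64);
    }
    fn pop_f32(&mut self) -> f32 {
        f32::from_bits(self.0.pop().expect("validated: stack is non-empty") as u32)
    }
}
```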
## Bytecode Format

To improve performance and reduce code size, instructions are encoded as enum variants instead of opcodes.
- This allows preprocessing the bytecode into a more compact format, which can be loaded directly into memory and executed without decoding later. This can skip the decoding step entirely on resource-constrained devices where memory is limited. See this [blog post](https://wasmer.io/posts/improving-with-zero-copy-deserialization) by Wasmer
+ This allows preprocessing the bytecode into a more memory aligned format, which can be loaded directly into memory and executed without decoding later. This can skip the decoding step entirely on resource-constrained devices where memory is limited. See this [blog post](https://wasmer.io/posts/improving-with-zero-copy-deserialization) by Wasmer
for more details which inspired this design.
Some instructions are split into multiple variants to reduce the size of the enum (e.g. `br_table` and `br_label`).
Additionally, label instructions contain offsets relative to the current instruction to make branching faster and easier to implement.
- Also, `End` instructions are split into `End` and `EndBlock`.
+ Also, `End` instructions are split into `End` and `EndBlock`. Others are also combined, especially in cases where the stack can be skipped.

See [instructions.rs](./crates/types/src/instructions.rs) for the full list of instructions.
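To make the enum-variant encoding concrete, a heavily simplified version could look like the sketch below. The variant names and payload types are assumptions made for this example, not the actual definitions in instructions.rs.

```rust
// Simplified sketch of enum-variant bytecode, not the actual TinyWasm types
// (see crates/types/src/instructions.rs for the real definitions).
enum Instruction {
    // Branches store an offset relative to the current instruction,
    // so taking a branch is plain pointer arithmetic with no label lookup.
    Br { offset: isize },
    BrIfFalse { offset: isize },
    // `End` is split by what it terminates: the function vs. an inner block.
    End,
    EndBlock,
    I32Add,
}

// A pre-processed function body is a flat slice of these variants that can be
// executed directly, without a separate decoding step at runtime.
fn execute(code: &[Instruction]) {
    let mut ip = 0usize;
    while ip < code.len() {
        match &code[ip] {
            Instruction::Br { offset } => {
                ip = (ip as isize + *offset) as usize;
                continue;
            }
            // ... the remaining instructions operate on the value stack ...
            _ => {}
        }
        ip += 1;
    }
}
```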

BENCHMARKS.md

Lines changed: 10 additions & 9 deletions
@@ -4,7 +4,8 @@ All benchmarks are run on a Ryzen 7 5800X with 32GB of RAM, running Linux 6.6.
WebAssembly files are optimized using [wasm-opt](https://github.com/WebAssembly/binaryen),
and the benchmark code is available in the `crates/benchmarks` folder.

- These are mainly preliminary benchmarks, and I will be adding more in the future that are also looking into memory usage and other metrics.
+ These are mainly preliminary benchmarks, and I will be rewriting the benchmarks to be more accurate and to test more features in the future.
+ In particular, I want to test and improve memory usage, as well as the performance of the parser.

## WebAssembly Settings
@@ -40,19 +41,19 @@ _\*\* essentially instant as it gets computed at compile time._
### Fib

- The first benchmark is a simple optimized Fibonacci function, which is a good way to show the overhead of calling functions and parsing the bytecode.
+ The first benchmark is a simple optimized Fibonacci function, a good way to show the overhead of calling functions and parsing the bytecode.
TinyWasm is slightly faster than Wasmi here, but that's probably because of the overhead of parsing the bytecode, as TinyWasm uses a custom bytecode to pre-process the WebAssembly bytecode.

### Fib-Rec

- This benchmark is a recursive Fibonacci function, which highlights some of the issues with the current implementation of TinyWasm's Call Stack.
- TinyWasm is a lot slower here, but that's because there's currently no way to reuse the same Call Frame for recursive calls, so a new Call Frame is allocated for every call. This is not a problem for most programs, and the upcoming `tail-call` proposal will make this a lot easier to implement.
+ This benchmark is a recursive Fibonacci function, highlighting some issues with the current implementation of TinyWasm's Call Stack.
+ TinyWasm is a lot slower here, but that's because there's currently no way to reuse the same Call Frame for recursive calls, so a new Call Frame is allocated for every call. This is not a problem for most programs; the upcoming `tail-call` proposal will make this much easier to implement.

### Argon2id

- This benchmark runs the Argon2id hashing algorithm, with 2 iterations, 1KB of memory, and 1 parallel lane.
- I had to decrease the memory usage from the default to 1KB, because especially the interpreters were struggling to finish in a reasonable amount of time.
- This is where `simd` instructions would be really useful, and it also highlights some of the issues with the current implementation of TinyWasm's Value Stack and Memory Instances. These spend a lot of time on `Vec` operations, so they might be a good place to start experimenting with Arena Allocation.
+ This benchmark runs the Argon2id hashing algorithm with 2 iterations, 1KB of memory, and 1 parallel lane.
+ I had to decrease the memory usage from the default to 1KB because the interpreters were struggling to finish in a reasonable amount of time.
+ This is where `simd` instructions would be really useful, and it also highlights some of the issues with the current implementation of TinyWasm's Value Stack and Memory Instances. These spend much time on stack operations, so they might be a good place to experiment with Arena Allocation.

### Selfhosted
@@ -63,9 +64,9 @@ Wasmer also offers a pre-parsed module format, so keep in mind that this number
### Conclusion

- After profiling and fixing some low-hanging fruits, I found the biggest bottleneck to be Vector operations, especially for the Value Stack, and having shared access to Memory Instances using RefCell. These are the two areas I will be focusing on improving in the future, trying out Arena Allocation and other data structures to improve performance. Additionally, typed FuncHandles have a significant overhead over the untyped ones, so I will be looking into improving that as well. Still, I'm quite happy with the results, especially considering the use of standard Rust data structures.
+ After profiling and fixing some low-hanging fruits, I found the biggest bottleneck to be Vector operations, especially for the Value Stack, and having shared access to Memory Instances using RefCell. These are the two areas I will focus on improving in the future, trying out Arena Allocation and other data structures to improve performance. Additionally, typed FuncHandles have a significant overhead over the untyped ones, so I will also look into improving that. Still, I'm pretty happy with the results, especially considering the focus on simplicity and portability over performance.

- Something that made a much bigger difference than I expected was to give compiler hints about cold paths, and to force inlining of some functions. This made the benchmarks 30%+ faster in some cases. A lot of places in the codebase have comments about what optimizations have been done.
+ Something that made a much more significant difference than I expected was to give compiler hints about cold paths and to force the inlining of some functions. This made the benchmarks 30%+ faster in some cases. Many places in the codebase have comments about what optimizations have been done.
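The cold-path and inlining hints mentioned above follow the standard Rust attribute pattern sketched below; this is a generic illustration of the technique under assumed names, not code taken from TinyWasm.

```rust
// Generic illustration of cold-path and inlining hints, not TinyWasm's code.

pub enum Error {
    Trap(&'static str),
}

// Error construction is kept off the hot path: `#[cold]` tells the compiler
// this function is rarely called, so surrounding branches are laid out
// in favor of the common (non-trapping) case.
#[cold]
#[inline(never)]
fn trap(message: &'static str) -> Error {
    Error::Trap(message)
}

// Small, hot helpers can be force-inlined into the interpreter loop.
#[inline(always)]
fn i32_add(a: i32, b: i32) -> i32 {
    a.wrapping_add(b)
}

fn i32_div(a: i32, b: i32) -> Result<i32, Error> {
    if b == 0 {
        // The unlikely branch contains only a call to the cold function.
        return Err(trap("integer divide by zero"));
    }
    Ok(a.wrapping_div(b))
}
```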
# Running benchmarks

README.md

Lines changed: 18 additions & 5 deletions
@@ -13,14 +13,16 @@
## Why TinyWasm?

- **Tiny**: TinyWasm is designed to be as small as possible without significantly compromising performance or functionality (< 6000 lines of code).
- - **Portable**: TinyWasm runs on any platform that Rust can target, including WebAssembly itself, with minimal external dependencies.
+ - **Portable**: TinyWasm runs on any platform that Rust can target, including other WebAssembly Runtimes, with minimal external dependencies.
- **Lightweight**: TinyWasm is easy to integrate and has a low call overhead, making it suitable for scripting and embedding.

## Status

- As of version `0.3.0`, TinyWasm successfully passes all the WebAssembly 1.0 tests in the [WebAssembly Test Suite](https://github.com/WebAssembly/testsuite). Work on the 2.0 tests is ongoing. This enables TinyWasm to run most WebAssembly programs, including versions of TinyWasm itself compiled to WebAssembly (see [examples/wasm-rust.rs](./examples/wasm-rust.rs)). The results of the testsuites are available [here](https://github.com/explodingcamera/tinywasm/tree/main/crates/tinywasm/tests/generated).
+ As of version `0.3.0`, TinyWasm successfully passes all the WebAssembly 1.0 tests in the [WebAssembly Test Suite](https://github.com/WebAssembly/testsuite). Work on the 2.0 tests is ongoing. This enables TinyWasm to run most WebAssembly programs, including executing TinyWasm itself compiled to WebAssembly (see [examples/wasm-rust.rs](./examples/wasm-rust.rs)). The results of the testsuites are available [here](https://github.com/explodingcamera/tinywasm/tree/main/crates/tinywasm/tests/generated).

- The API is still unstable and may change at any time, so you probably don't want to use it in production _yet_. Note that TinyWasm isn't primarily designed for high performance; its focus lies more on simplicity, size, and portability. More details on its performance aspects can be found in [BENCHMARKS.md](./BENCHMARKS.md).
+ The API is still unstable and may change at any time, so you probably don't want to use it in production _yet_. TinyWasm isn't primarily designed for high performance; it focuses more on simplicity, size, and portability. More details on its performance can be found in [BENCHMARKS.md](./BENCHMARKS.md).
+
+ **Future Development**: The first major version will focus on improving the API and adding support for [WASI](https://wasi.dev/). While doing so, I also want to further simplify and reduce the codebase's size and improve the parser's performance.

## Supported Proposals
@@ -64,11 +66,22 @@ $ tinywasm-cli --help
- **`archive`**\
Enables pre-parsing of archives. This is enabled by default.
- **`unsafe`**\
- Uses `unsafe` code to improve performance, particularly in Memory access
+ Uses `unsafe` code to improve performance, particularly in Memory access.

- With all these features disabled, TinyWasm only depends on `core`, `alloc` and `libm` and can be used in `no_std` environments.
+ With all these features disabled, TinyWasm only depends on `core`, `alloc`, and `libm` and can be used in `no_std` environments.
Since `libm` is not as performant as the compiler's math intrinsics, it is recommended to use the `std` feature if possible (at least [for now](https://github.com/rust-lang/rfcs/issues/2505)), especially on wasm32 targets.

+ ## Inspiration
+
+ Big thanks to the authors of the following projects, which have inspired and influenced TinyWasm:
+
+ - [wasmi](https://github.com/wasmi-labs/wasmi) - an efficient and lightweight WebAssembly interpreter that also runs on `no_std` environments
+ - [wasm3](https://github.com/wasm3/wasm3) - a high-performance WebAssembly interpreter written in C
+ - [wazero](https://wazero.io/) - a zero-dependency WebAssembly interpreter written in Go
+ - [wain](https://github.com/rhysd/wain) - a zero-dependency WebAssembly interpreter written in Rust
+
+ I encourage you to check these projects out if you're looking for a more mature and feature-complete WebAssembly interpreter.

## License

Licensed under either of [Apache License, Version 2.0](./LICENSE-APACHE) or [MIT license](./LICENSE-MIT) at your option.
