This repository was archived by the owner on Oct 3, 2025. It is now read-only.

Commit c50bae7

docs: update readme
Signed-off-by: Henry Gressmann <[email protected]>
1 parent f5a16aa commit c50bae7

3 files changed: +38 / -26 lines changed

ARCHITECTURE.md

Lines changed: 10 additions & 12 deletions
@@ -3,26 +3,24 @@
TinyWasm follows the general Runtime Structure described in the [WebAssembly Specification](https://webassembly.github.io/spec/core/exec/runtime.html).

Some key differences are:

- - Values are stored without their type, (as `u64`), and the type is inferred from the instruction that uses them. This is possible because the instructions are validated before execution and the type of each value can be inferred from the instruction.
- - TinyWasm has a explicit stack for values, labels and frames. This is mostly for simplicity in the implementation, but also allows for some optimizations.
- - Floats always use a canonical NaN representation, the spec allows for multiple NaN representations.
- - TinyWasm uses a custom bytecode format (see [Bytecode Format](#bytecode-format) for more details)
- - Global state in the `Store` can be addressed from module instances other than the owning module. This is to allow more efficient access to imports and exports. Ownership is still enforced implicitly by requiring a reference to the instance to access it which can not be changed using the WebAssembly instructions.
- - The `Store` is not thread-safe. This is to allow for more efficient access to the `Store` and its contents. When later adding support for threads, a `Mutex` can be used to make it thread-safe but the overhead of requiring a lock for every access is not necessary for single-threaded applications.
- - TinyWasm is architectured to allow for a JIT compiler to be added later. Functions are stored as FunctionInstances which can contain either a `WasmFunction` or a `HostFunction`. A third variant `JitFunction` could be added later to store a pointer to the compiled function. This would allow for the JIT to be used transparently without changing the rest of the runtime.
- - TinyWasm is designed to be used in `no_std` environments. The `std` feature is enabled by default, but can be disabled to remove the dependency on `std` and `std::io`. This is done by disabling the `std` and `parser` features. The `logging` feature can also be disabled to remove the dependency on `log`. This is not recommended, since `libm` is not as performant as the compiler's math intrinsics, especially on wasm32 targets, but can be useful for resource-constrained devices or other environments where `std` is not available such as OS kernels.
- - Call Frames are executed in a loop instead of recursively. This allows the use of a single stack for all frames and makes it easier to pause execution and resume it later, or to step through the code one instruction at a time.
- - While other interpreters convert `locals` to be register-based when parsing the function body, TinyWasm keeps them in a stack. This is mostly for simplicity in the implementation, but performance is still comparable or better than other interpreters.
+ - **Type Storage**: Types are inferred from usage context rather than stored explicitly, with all values held as `u64`.
+ - **Stack Design**: Implements a specific stack for values, labels, and frames to simplify the implementation and enable optimizations.
+ - **Bytecode Format**: Adopts a custom bytecode format to reduce memory usage and improve performance by allowing direct execution without the need for decoding.
+ - **Global State Access**: Allows cross-module access to the `Store`'s global state, optimizing imports and exports access. Access requires a module instance reference, maintaining implicit ownership through a reference count.
+ - **Non-thread-safe Store**: Designed for efficiency in single-threaded applications.
+ - **JIT Compilation Support**: Prepares for JIT compiler integration with function instances designed to accommodate `WasmFunction`, `HostFunction`, or future `JitFunction`.
+ - **`no_std` Environment Support**: Offers compatibility with `no_std` environments by allowing the `std` feature to be disabled.
+ - **Call Frame Execution**: Executes call frames in a single loop rather than recursively, using a single stack for all frames, facilitating easier pause, resume, and step-through.
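As a rough illustration of the untyped value storage described in the **Type Storage** bullet above, the pattern can be sketched as follows. The struct and method names are hypothetical, not TinyWasm's actual API; the sketch assumes validation has already guaranteed the type at each stack position.

```rust
// Hypothetical sketch of untyped value storage, not TinyWasm's actual API:
// every stack slot is a plain u64, and the instruction that pops a value
// decides how to reinterpret the bits. This is only sound because the module
// has already been validated, so the type at each stack position is known.
struct ValueStack(Vec<u64>);

impl ValueStack {
    fn push_i32(&mut self, v: i32) {
        self.0.push(v as u32 as u64);
    }
    fn pop_i32(&mut self) -> i32 {
        self.0.pop().expect("validated: stack is non-empty") as u32 as i32
    }
    fn push_f32(&mut self, v: f32) {
        self.0.push(v.to_bits() as u64);
    }
    fn pop_f32(&mut self) -> f32 {
        f32::from_bits(self.0.pop().expect("validated: stack is non-empty") as u32)
    }
}
```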
## Bytecode Format

To improve performance and reduce code size, instructions are encoded as enum variants instead of opcodes.
- This allows preprocessing the bytecode into a more compact format, which can be loaded directly into memory and executed without decoding later. This can skip the decoding step entirely on resource-constrained devices where memory is limited. See this [blog post](https://wasmer.io/posts/improving-with-zero-copy-deserialization) by Wasmer
+ This allows preprocessing the bytecode into a more memory aligned format, which can be loaded directly into memory and executed without decoding later. This can skip the decoding step entirely on resource-constrained devices where memory is limited. See this [blog post](https://wasmer.io/posts/improving-with-zero-copy-deserialization) by Wasmer
for more details which inspired this design.
Some instructions are split into multiple variants to reduce the size of the enum (e.g. `br_table` and `br_label`).
Additionally, label instructions contain offsets relative to the current instruction to make branching faster and easier to implement.
- Also, `End` instructions are split into `End` and `EndBlock`.
+ Also, `End` instructions are split into `End` and `EndBlock`. Others are also combined, especially in cases where the stack can be skipped.

See [instructions.rs](./crates/types/src/instructions.rs) for the full list of instructions.
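To make the enum-variant encoding concrete, a heavily simplified version could look like the sketch below. The variant names and payload types are assumptions made for this example, not the actual definitions in instructions.rs.

```rust
// Simplified sketch of enum-variant bytecode, not the actual TinyWasm types
// (see crates/types/src/instructions.rs for the real definitions).
enum Instruction {
    // Branches store an offset relative to the current instruction,
    // so taking a branch is plain pointer arithmetic with no label lookup.
    Br { offset: isize },
    BrIfFalse { offset: isize },
    // `End` is split by what it terminates: the function vs. an inner block.
    End,
    EndBlock,
    I32Add,
}

// A pre-processed function body is a flat slice of these variants that can be
// executed directly, without a separate decoding step at runtime.
fn execute(code: &[Instruction]) {
    let mut ip = 0usize;
    while ip < code.len() {
        match &code[ip] {
            Instruction::Br { offset } => {
                ip = (ip as isize + *offset) as usize;
                continue;
            }
            // ... the remaining instructions operate on the value stack ...
            _ => {}
        }
        ip += 1;
    }
}
```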

BENCHMARKS.md

Lines changed: 10 additions & 9 deletions
@@ -4,7 +4,8 @@ All benchmarks are run on a Ryzen 7 5800X with 32GB of RAM, running Linux 6.6.
WebAssembly files are optimized using [wasm-opt](https://github.com/WebAssembly/binaryen),
and the benchmark code is available in the `crates/benchmarks` folder.

- These are mainly preliminary benchmarks, and I will be adding more in the future that are also looking into memory usage and other metrics.
+ These are mainly preliminary benchmarks, and I will be rewriting the benchmarks to be more accurate and to test more features in the future.
+ In particular, I want to test and improve memory usage, as well as the performance of the parser.

## WebAssembly Settings
@@ -40,19 +41,19 @@ _\*\* essentially instant as it gets computed at compile time._
### Fib

- The first benchmark is a simple optimized Fibonacci function, which is a good way to show the overhead of calling functions and parsing the bytecode.
+ The first benchmark is a simple optimized Fibonacci function, a good way to show the overhead of calling functions and parsing the bytecode.
TinyWasm is slightly faster than Wasmi here, but that's probably because of the overhead of parsing the bytecode, as TinyWasm uses a custom bytecode to pre-process the WebAssembly bytecode.

### Fib-Rec

- This benchmark is a recursive Fibonacci function, which highlights some of the issues with the current implementation of TinyWasm's Call Stack.
- TinyWasm is a lot slower here, but that's because there's currently no way to reuse the same Call Frame for recursive calls, so a new Call Frame is allocated for every call. This is not a problem for most programs, and the upcoming `tail-call` proposal will make this a lot easier to implement.
+ This benchmark is a recursive Fibonacci function, highlighting some issues with the current implementation of TinyWasm's Call Stack.
+ TinyWasm is a lot slower here, but that's because there's currently no way to reuse the same Call Frame for recursive calls, so a new Call Frame is allocated for every call. This is not a problem for most programs; the upcoming `tail-call` proposal will make this much easier to implement.

### Argon2id

- This benchmark runs the Argon2id hashing algorithm, with 2 iterations, 1KB of memory, and 1 parallel lane.
- I had to decrease the memory usage from the default to 1KB, because especially the interpreters were struggling to finish in a reasonable amount of time.
- This is where `simd` instructions would be really useful, and it also highlights some of the issues with the current implementation of TinyWasm's Value Stack and Memory Instances. These spend a lot of time on `Vec` operations, so they might be a good place to start experimenting with Arena Allocation.
+ This benchmark runs the Argon2id hashing algorithm with 2 iterations, 1KB of memory, and 1 parallel lane.
+ I had to decrease the memory usage from the default to 1KB because the interpreters were struggling to finish in a reasonable amount of time.
+ This is where `simd` instructions would be really useful, and it also highlights some of the issues with the current implementation of TinyWasm's Value Stack and Memory Instances. These spend much time on stack operations, so they might be a good place to experiment with Arena Allocation.

### Selfhosted
@@ -63,9 +64,9 @@ Wasmer also offers a pre-parsed module format, so keep in mind that this number
### Conclusion

- After profiling and fixing some low-hanging fruits, I found the biggest bottleneck to be Vector operations, especially for the Value Stack, and having shared access to Memory Instances using RefCell. These are the two areas I will be focusing on improving in the future, trying out Arena Allocation and other data structures to improve performance. Additionally, typed FuncHandles have a significant overhead over the untyped ones, so I will be looking into improving that as well. Still, I'm quite happy with the results, especially considering the use of standard Rust data structures.
+ After profiling and fixing some low-hanging fruits, I found the biggest bottleneck to be Vector operations, especially for the Value Stack, and having shared access to Memory Instances using RefCell. These are the two areas I will focus on improving in the future, trying out Arena Allocation and other data structures to improve performance. Additionally, typed FuncHandles have a significant overhead over the untyped ones, so I will also look into improving that. Still, I'm pretty happy with the results, especially considering the focus on simplicity and portability over performance.

- Something that made a much bigger difference than I expected was to give compiler hints about cold paths, and to force inlining of some functions. This made the benchmarks 30%+ faster in some cases. A lot of places in the codebase have comments about what optimizations have been done.
+ Something that made a much more significant difference than I expected was to give compiler hints about cold paths and to force the inlining of some functions. This made the benchmarks 30%+ faster in some cases. Many places in the codebase have comments about what optimizations have been done.
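The cold-path and inlining hints mentioned above follow the standard Rust attribute pattern sketched below; this is a generic illustration of the technique under assumed names, not code taken from TinyWasm.

```rust
// Generic illustration of cold-path and inlining hints, not TinyWasm's code.

pub enum Error {
    Trap(&'static str),
}

// Error construction is kept off the hot path: `#[cold]` tells the compiler
// this function is rarely called, so surrounding branches are laid out
// in favor of the common (non-trapping) case.
#[cold]
#[inline(never)]
fn trap(message: &'static str) -> Error {
    Error::Trap(message)
}

// Small, hot helpers can be force-inlined into the interpreter loop.
#[inline(always)]
fn i32_add(a: i32, b: i32) -> i32 {
    a.wrapping_add(b)
}

fn i32_div(a: i32, b: i32) -> Result<i32, Error> {
    if b == 0 {
        // The unlikely branch contains only a call to the cold function.
        return Err(trap("integer divide by zero"));
    }
    Ok(a.wrapping_div(b))
}
```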
# Running benchmarks

README.md

Lines changed: 18 additions & 5 deletions
@@ -13,14 +13,16 @@
## Why TinyWasm?

- **Tiny**: TinyWasm is designed to be as small as possible without significantly compromising performance or functionality (< 6000 lines of code).
- - **Portable**: TinyWasm runs on any platform that Rust can target, including WebAssembly itself, with minimal external dependencies.
+ - **Portable**: TinyWasm runs on any platform that Rust can target, including other WebAssembly Runtimes, with minimal external dependencies.
- **Lightweight**: TinyWasm is easy to integrate and has a low call overhead, making it suitable for scripting and embedding.

## Status

- As of version `0.3.0`, TinyWasm successfully passes all the WebAssembly 1.0 tests in the [WebAssembly Test Suite](https://github.com/WebAssembly/testsuite). Work on the 2.0 tests is ongoing. This enables TinyWasm to run most WebAssembly programs, including versions of TinyWasm itself compiled to WebAssembly (see [examples/wasm-rust.rs](./examples/wasm-rust.rs)). The results of the testsuites are available [here](https://github.com/explodingcamera/tinywasm/tree/main/crates/tinywasm/tests/generated).
+ As of version `0.3.0`, TinyWasm successfully passes all the WebAssembly 1.0 tests in the [WebAssembly Test Suite](https://github.com/WebAssembly/testsuite). Work on the 2.0 tests is ongoing. This enables TinyWasm to run most WebAssembly programs, including executing TinyWasm itself compiled to WebAssembly (see [examples/wasm-rust.rs](./examples/wasm-rust.rs)). The results of the testsuites are available [here](https://github.com/explodingcamera/tinywasm/tree/main/crates/tinywasm/tests/generated).

- The API is still unstable and may change at any time, so you probably don't want to use it in production _yet_. Note that TinyWasm isn't primarily designed for high performance; its focus lies more on simplicity, size, and portability. More details on its performance aspects can be found in [BENCHMARKS.md](./BENCHMARKS.md).
+ The API is still unstable and may change at any time, so you probably don't want to use it in production _yet_. TinyWasm isn't primarily designed for high performance; it focuses more on simplicity, size, and portability. More details on its performance can be found in [BENCHMARKS.md](./BENCHMARKS.md).
+
+ **Future Development**: The first major version will focus on improving the API and adding support for [WASI](https://wasi.dev/). While doing so, I also want to further simplify and reduce the codebase's size and improve the parser's performance.

## Supported Proposals
@@ -64,11 +66,22 @@ $ tinywasm-cli --help
- **`archive`**\
Enables pre-parsing of archives. This is enabled by default.
- **`unsafe`**\
- Uses `unsafe` code to improve performance, particularly in Memory access
+ Uses `unsafe` code to improve performance, particularly in Memory access.

- With all these features disabled, TinyWasm only depends on `core`, `alloc` and `libm` and can be used in `no_std` environments.
+ With all these features disabled, TinyWasm only depends on `core`, `alloc`, and `libm` and can be used in `no_std` environments.
Since `libm` is not as performant as the compiler's math intrinsics, it is recommended to use the `std` feature if possible (at least [for now](https://github.com/rust-lang/rfcs/issues/2505)), especially on wasm32 targets.

+ ## Inspiration
+
+ Big thanks to the authors of the following projects, which have inspired and influenced TinyWasm:
+
+ - [wasmi](https://github.com/wasmi-labs/wasmi) - an efficient and lightweight WebAssembly interpreter that also runs on `no_std` environments
+ - [wasm3](https://github.com/wasm3/wasm3) - a high-performance WebAssembly interpreter written in C
+ - [wazero](https://wazero.io/) - a zero-dependency WebAssembly interpreter written in Go
+ - [wain](https://github.com/rhysd/wain) - a zero-dependency WebAssembly interpreter written in Rust
+
+ I encourage you to check these projects out if you're looking for a more mature and feature-complete WebAssembly interpreter.

## License

Licensed under either of [Apache License, Version 2.0](./LICENSE-APACHE) or [MIT license](./LICENSE-MIT) at your option.
