Sounds great!
Is there a particular reason why the version component should always come at the end? In the Wasm CM, if the function name is specified in the path, the version is specified before the function name, i.e. …
This write-up is the result of trying to pin down the details of how components must be represented in the IR, and how that representation will be lowered into Miden Assembly. This representation must not only be suitable for translation from Wasm, but it must preserve enough of the original component structure (as described in WIT) so that the resulting Miden package can be processed as a Wasm component. Additionally, it must preserve all of the necessary metadata so that we can correctly emit lifting/lowering glue code at call sites which cross component (and Miden context) boundaries.
In this write-up, I document some assumptions and design considerations, as well as how I believe component translation should proceed (from Wasm to IR primarily, but to MASM as well).
Note that this write-up will at times combine: actual details of how components are represented in Wasm; conceptual details of how the concrete representation of components corresponds to components described in WIT; and my personal opinion on how we should handle translating them into the IR and ultimately to MASM packages. As a result, keep in mind that some statements below may be stated matter-of-factly while resting on implied assumptions or design considerations that go beyond what exists in the Wasm Component Model or in the current frontend. If you have any specific questions about a particular point, just leave a comment and I will follow up with more detail.
First, I would like to start by laying out some background/design considerations that will be relevant when we get into implementation details.
Producers
There are three classes of producer (i.e. compiler or other tool) of Wasm components that we need to consider in any design decisions we make:
- First-party frontends, i.e. those languages/compilers for which we are maintaining the tooling. This is essentially just `rustc` for now. We can make assumptions about what is produced by these frontends, and in some cases may even be able to dictate how components are produced. I'd describe this as the second-most important producer class of interest at the moment.
- Miden packages. These are not, strictly speaking, Wasm components; however, the format in which a Miden package will ultimately be described is in Component Model terms, via WIT. The actual code of a package is of course MAST, and metadata is provided to map the raw MAST to Miden Assembly concepts, i.e. modules and procedures, but at present there is no direct connection between a WIT component description and the underlying MASM modules/procedures. Furthermore, tooling does not yet exist for creating packages directly from a Miden Assembly source project, and such tooling will need to exist. What this means for this discussion is that we must have a convention for structuring Miden Assembly modules/procedures that corresponds to the component structure we expect to be described in WIT. For obvious reasons, this is the most important "producer" of components, in that it is the substrate for all distribution of code in the Miden ecosystem, so any constraints imposed on components due to requirements of packaging necessarily limit what we can accept from other producers.
- Third-party frontends, i.e. languages/compilers/tools maintained by someone else, using Wasm as their output format, and leveraging `midenc` to compile to a Miden package. There may also be frontends that directly emit their own Miden packages, but we can assume for this discussion that we interact with those the same as any Miden package we emit. This class is primarily interesting from the perspective of what we're allowed to assume about the Wasm we're given. We are allowed to dictate constraints on what we'll accept, but we should not do so unless there is a fundamental reason why the constraint is necessary. As an example, we probably should not assume anything too specific about the exact way in which a component is instantiated. On the flip side, we should be able to dictate certain things, such as requiring a single top-level component, requiring that dependencies on other Miden packages be represented as component-level imports, and restricting what we allow to be exported/imported at the component level.

Packaging
As a practical matter, it is essential that the lowering of Wasm components to Miden Assembly preserve enough component structure so that the assembler is able to reason about components in terms of Miden Assembly primitives (i.e. libraries, namespaces, modules, procedures). There are a few reasons for this:
The bottom line, though, is that packaging dictates the structure of the Miden Assembly we produce, which in turn puts significant pressure on us to structure the IR in such a way that the lowering to MASM is straightforward and preserves all the metadata needed for packaging. The more distance there is between the IR and the resulting MASM, the more complicated compilation will be. Conversely, the more distance there is between Wasm components and IR-level components, the more work must be done in the frontend to translate them, and the stricter we will need to be about the Wasm we accept.
Miden Assembly Components
So given the above, here's my take on the correspondence we should aim for between Wasm components and Miden Assembly primitives (what I will generally refer to from now on as a "Miden component"). I will tie them together using WIT terminology to make things a little clearer, and also provide a way to correlate this to how a user will describe their component in WIT.
Worlds and Interfaces
A WIT world corresponds to a top-level Wasm component. While a world can directly export functions, in practice worlds primarily export interfaces. A world corresponds to a MASM `Library` (or `Program`, in cases where the component represents something executable). The component Wasm currently emitted by `rustc` already maps cleanly to this model, i.e. we get a single top-level component which exports component instances corresponding to each interface of the world described in WIT.

In terms of Miden Assembly source projects, we can represent a world quite easily. Let's use the example of a WIT package called `miden:base`, version `1.0.0`, which exports two interfaces, `core` and `tx`, and a top-level function called `init`.

Note
There are some issues with the current implementation of `LibraryPath` in Miden Assembly that I'm going to gloss over a bit here. I think we should change the syntax of paths, and importantly, I think we should allow for an optional version component to the path, which always comes at the end when present, to allow for disambiguating modules/procedures when different versions of the same component are present in the compilation graph. The new syntax would support these variations:

Where `item` refers to anything exportable from a module, primarily procedures, but other items could be supported in the future. An additional assumption here is that `package@version`, as a path, implicitly refers to the root module of the package. Thus `package#item@version` refers to an export `item` defined in `mod.masm` of the project.

Examples:
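The path forms named in this note can be made concrete with a small parser. This is a minimal Rust sketch, assuming a grammar of `namespace[/module]*[#item][@version]` inferred only from the forms mentioned in this post (`package@version`, `package#item@version`, and module paths such as `miden:base/tx`); it is not the authoritative syntax, and `ItemPath` and `parse_path` are hypothetical names, not part of the Miden Assembly API.

```rust
/// Hypothetical model of the proposed path syntax (an assumption):
/// `namespace[/module]*[#item][@version]`
#[derive(Debug, PartialEq)]
struct ItemPath {
    namespace: String,
    modules: Vec<String>,
    item: Option<String>,
    version: Option<String>,
}

fn parse_path(input: &str) -> Result<ItemPath, String> {
    // The version component, when present, always comes last.
    let (rest, version) = match input.rsplit_once('@') {
        Some((_, "")) => return Err(format!("empty version in `{input}`")),
        Some((rest, v)) => (rest, Some(v.to_string())),
        None => (input, None),
    };
    // An exported item is separated from the module path by `#`.
    let (module_path, item) = match rest.split_once('#') {
        Some((_, "")) => return Err(format!("empty item in `{input}`")),
        Some((m, i)) => (m, Some(i.to_string())),
        None => (rest, None),
    };
    // The first `/`-separated component is the namespace (e.g. `miden:base`);
    // further components name modules. No further components = root module.
    let mut parts = module_path.split('/');
    let namespace = match parts.next() {
        Some(ns) if !ns.is_empty() => ns.to_string(),
        _ => return Err(format!("missing namespace in `{input}`")),
    };
    let modules: Vec<String> = parts.map(str::to_string).collect();
    if modules.iter().any(|m| m.is_empty()) {
        return Err(format!("empty module component in `{input}`"));
    }
    Ok(ItemPath { namespace, modules, item, version })
}

fn main() {
    // `package@version`: the root module (`mod.masm`) of the package.
    println!("{:?}", parse_path("miden:base@1.0.0"));
    // `package#item@version`: `item` exported from `mod.masm`.
    println!("{:?}", parse_path("miden:base#init@1.0.0"));
    // A module path without a version.
    println!("{:?}", parse_path("miden:base/tx"));
}
```

Parsing the version first (from the right) is what makes "version always last" cheap to enforce: everything before the final `@` can be treated uniformly.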
With that out of the way, let's get back to how our example Miden Assembly component, called `miden:base`, would be structured in source form:

- The `miden:base` namespace, which is derived from the package name (and so will be used as the resulting `LibraryNamespace` in MASM terms)
- A `mod.masm` in the project root; this is where the `init` function will be exported. This module is not explicitly named, i.e. its `LibraryPath` will consist only of the `LibraryNamespace` (and version, in my proposed changes) with no additional path components.
- `core.masm` and `tx.masm` source files, defining modules of the same name. Their `LibraryPath` would be composed of the `miden:base` namespace and their module name, e.g. `miden:base/tx`. These modules would be presumed to export procedures corresponding to the functions exported from their respective WIT interface. Note, however, that these modules do not have to contain the actual definitions of those procedures; they could be re-exported from elsewhere in the project. The only requirement is that the interface is fully accounted for in that module.
- … `LibraryPath`. We'll ignore it for now.

When this project is assembled to MAST, we'll end up with the following metadata about that MAST:

- Module paths `miden:base`, `miden:base/core`, and `miden:base/tx`
- The `init` function would be exported with a `LibraryPath` where the namespace is `miden:base` and with a single path component `init`, plus the version component when fully qualified.

With that metadata, and given the WIT for the package, we can use nothing other than the conventions described above to look up the MAST root corresponding to some function declared in the WIT. All in all, this is pretty straightforward, and gives us the clean mapping from WIT to MAST that …
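As a hedged illustration of that convention-only lookup, here is a minimal Rust sketch that builds a fully-qualified path from a WIT function and looks it up in export metadata. It assumes the proposed `#item@version` path form, that world-level functions live in the root module while interface functions live in a module named after their interface, and it uses a `u64` as a stand-in for a real MAST root digest. `WitFunction`, `masm_path`, `lookup`, and the `execute` function are hypothetical.

```rust
use std::collections::HashMap;

/// Stand-in for a MAST root; a real MAST root is a digest, not a u64.
type MastRoot = u64;

/// A function declared in WIT, either directly on the world or on an interface.
enum WitFunction<'a> {
    World { func: &'a str },
    Interface { interface: &'a str, func: &'a str },
}

/// Builds a path in the assumed `namespace[/module]#item@version` form:
/// world-level functions live in the root module (`mod.masm`),
/// interface functions in a module named after the interface.
fn masm_path(package: &str, version: &str, f: &WitFunction) -> String {
    match f {
        WitFunction::World { func } => format!("{package}#{func}@{version}"),
        WitFunction::Interface { interface, func } => {
            format!("{package}/{interface}#{func}@{version}")
        }
    }
}

/// Looks up the MAST root for a WIT function purely by naming convention.
fn lookup(
    exports: &HashMap<String, MastRoot>,
    package: &str,
    version: &str,
    f: &WitFunction,
) -> Option<MastRoot> {
    exports.get(&masm_path(package, version, f)).copied()
}

fn main() {
    // Hypothetical export metadata produced when assembling `miden:base@1.0.0`.
    let mut exports: HashMap<String, MastRoot> = HashMap::new();
    exports.insert("miden:base#init@1.0.0".to_string(), 0x1111);
    exports.insert("miden:base/tx#execute@1.0.0".to_string(), 0x2222);

    let init = WitFunction::World { func: "init" };
    let execute = WitFunction::Interface { interface: "tx", func: "execute" };
    println!("{:?}", lookup(&exports, "miden:base", "1.0.0", &init));
    println!("{:?}", lookup(&exports, "miden:base", "1.0.0", &execute));
}
```

Because the version is part of the key, two versions of the same package can coexist in the metadata without colliding.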
Wasm -> IR Translation
So now that we know what kind of MASM structure we want to end up with, we need to figure out how, in the compiler, we can receive a Wasm component as input, recover the information necessary to reason about that component in terms of WIT worlds and interfaces, and lower that to HIR. We must additionally work out what dependencies are required by the component, and how to resolve them. The former is what we're interested in here. Luckily for us, recovering the basic WIT structure is fairly straightforward:
Worlds and Interfaces
The Wasm component we are given by definition consists of a top-level Wasm component that corresponds to the WIT world from which the component was derived. Thus, any exports of the top-level component must correspond to items exported from the WIT world. In practice, these exports will almost always be of kind `instance` (a component instance, corresponding to a WIT interface), `func` (a top-level component function using the Component Model ABI), or `type` (which we can ignore for purposes of this discussion).

As a prerequisite step, we must evaluate the instantiation of items in the top-level component, in order to work out not only what specifically is being exported (and with what names), but also the metadata for call sites which require ABI lifting/lowering, and the details of that lifting/lowering. This is of particular importance because lifting/lowering is completely implicit: in order to determine when and what is needed, we must preserve not only the actual target of a function call (after linking the component and resolving the real callee), but also the fact that the call passes through a `canon lift` or `canon lower` declaration.

I've been able to validate the above assumptions (so far) by looking at the Wasm generated by various tests in our test suite, and the sources from which it was derived. It appears that we should be able to safely rely on the fact that a top-level component instantiates and exports one or more component instances that correspond to WIT interfaces.
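To illustrate why both pieces of information must be kept, here is a minimal Rust sketch over a hypothetical, already-flattened table of function definitions, as might be produced by evaluating the component's instantiation. Walking from a call site to the real core callee, it records every `canon lift`/`canon lower` crossed along the way. The `FuncDef`, `Canon`, `ResolvedCall`, and `resolve` names, and the core module name `upstream`, are assumptions for illustration only.

```rust
/// An implicit ABI adapter crossed on the way to the callee.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Canon {
    Lift,
    Lower,
}

/// Hypothetical flattened view of function definitions; indices refer
/// back into the same table.
#[derive(Debug)]
enum FuncDef {
    /// A core Wasm function exported from a core module.
    Core { module: &'static str, name: &'static str },
    /// An alias/import/export that simply forwards to another definition.
    Alias(usize),
    /// `(canon lift (core func N))`
    Lift(usize),
    /// `(canon lower N)`
    Lower(usize),
}

#[derive(Debug, PartialEq)]
struct ResolvedCall {
    module: &'static str,
    name: &'static str,
    /// Adapters crossed, in order from the call site to the callee.
    adapters: Vec<Canon>,
}

/// Follows the chain from `start` to the real core callee, recording each
/// adapter crossed. Returns `None` on a dangling index or a cycle.
fn resolve(defs: &[FuncDef], start: usize) -> Option<ResolvedCall> {
    let mut idx = start;
    let mut adapters = Vec::new();
    // An acyclic chain reaches a `Core` entry within `defs.len()` hops.
    for _ in 0..=defs.len() {
        match defs.get(idx)? {
            FuncDef::Core { module, name } => {
                return Some(ResolvedCall { module: *module, name: *name, adapters });
            }
            FuncDef::Alias(next) => idx = *next,
            FuncDef::Lift(next) => {
                adapters.push(Canon::Lift);
                idx = *next;
            }
            FuncDef::Lower(next) => {
                adapters.push(Canon::Lower);
                idx = *next;
            }
        }
    }
    None
}

fn main() {
    let defs = [
        // 0: core export from the upstream core module
        FuncDef::Core { module: "upstream", name: "miden:base/tx@1.0.0#foo" },
        // 1: upstream component function `(canon lift (core func 0))`
        FuncDef::Lift(0),
        // 2: downstream `(import "foo" ..)` of that component function
        FuncDef::Alias(1),
        // 3: downstream core function `(canon lower 2)`
        FuncDef::Lower(2),
    ];
    println!("{:?}", resolve(&defs, 3));
}
```

Resolving only to the core callee would throw away the `[Lower, Lift]` chain, which is exactly what tells us glue code is needed at that call site.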
Functions
Component-level function exports, so far as I have been able to substantiate, are always the product of aliasing an export from a core module, lifting it into the Canonical ABI using `canon lift`, and then exporting the synthetic function defined by the `canon lift` declaration. In other words, they do not simply represent an alias of some core Wasm function definition, but rather an actual function definition that is intended to be synthesized or provided by the Wasm runtime, and when called, implements the lowering/lifting of function arguments and results, respectively, to adapt a core Wasm function to the high-level Canonical ABI. As a result, these synthetic functions must be preserved in the IR in some form.

It should also be noted that function exports from a component have a different name than the actual underlying core function definition. We must preserve both names, since the core function could be directly referenced by other functions in the same module (or in a sibling module), as well as via the component-level export name. I believe this is another reason why preserving the synthetic functions may be valuable.
Lastly, it seems to me that we should use the synthetic functions to hold any actual lowering/lifting code needed. This means that, in cases where ABI lowering/lifting code is required, the synthetic function declaration corresponds to an actual function definition that is generated by the compiler. In cases where the ABI does not require any glue code (e.g. because the arguments are all scalar integral values), the synthetic function declaration would instead correspond to a re-export of the referenced core function definition once lowered to MASM.
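The choice between a generated adapter and a plain re-export, while keeping both the export name and the core function name, could be sketched as follows. This is a minimal Rust sketch with a hypothetical, simplified `ValType`; the rule that only primitive scalars avoid glue, combined with the Canonical ABI limits of 16 flattened parameters and 1 flattened result, is a conservative approximation and not the full Canonical ABI flattening logic. `SyntheticFunc`, `SyntheticBody`, and `plan` are hypothetical names.

```rust
/// A simplified subset of component-level value types (hypothetical).
#[allow(dead_code)]
#[derive(Debug)]
enum ValType {
    Bool, U32, S32, U64, S64, F32, F64,
    String,
    List(Box<ValType>),
    Record(Vec<ValType>),
}

impl ValType {
    /// Scalars assumed here to map directly onto core values with no glue.
    fn is_scalar(&self) -> bool {
        matches!(
            self,
            ValType::Bool | ValType::U32 | ValType::S32 | ValType::U64
                | ValType::S64 | ValType::F32 | ValType::F64
        )
    }
}

// Canonical ABI limits on flattened values passed directly (beyond these,
// values go through linear memory).
const MAX_FLAT_PARAMS: usize = 16;
const MAX_FLAT_RESULTS: usize = 1;

#[derive(Debug, PartialEq)]
enum SyntheticBody {
    /// No glue needed: lowers to a re-export of the core function.
    ReexportCore,
    /// Glue needed: the compiler generates a lifting/lowering adapter.
    GeneratedAdapter,
}

/// Keeps both the component-level export name and the core function name.
#[derive(Debug, PartialEq)]
struct SyntheticFunc {
    export_name: String,
    core_name: String,
    body: SyntheticBody,
}

fn plan(export_name: &str, core_name: &str, params: &[ValType], results: &[ValType]) -> SyntheticFunc {
    // Each scalar flattens to one core value, so for scalar-only signatures
    // the list lengths equal the flattened counts.
    let needs_glue = params.len() > MAX_FLAT_PARAMS
        || results.len() > MAX_FLAT_RESULTS
        || params.iter().chain(results).any(|t| !t.is_scalar());
    SyntheticFunc {
        export_name: export_name.to_string(),
        core_name: core_name.to_string(),
        body: if needs_glue { SyntheticBody::GeneratedAdapter } else { SyntheticBody::ReexportCore },
    }
}

fn main() {
    // Scalar-only signature: re-export the core function.
    println!("{:?}", plan("foo", "miden:base/tx@1.0.0#foo", &[ValType::U32, ValType::U64], &[ValType::Bool]));
    // A string parameter requires lifting/lowering glue.
    println!("{:?}", plan("greet", "greet_core", &[ValType::String], &[]));
}
```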
To elaborate on how this is all expressed in Wasm component terms:
- A component-level function export looks like `(export "foo" (func <function_index>) (func (type <type_index>)))`, where:
  - `<function_index>` is a component function declared with `(func (type <type_index>) (canon lift (core func <core_function_index>)))`
  - `<core_function_index>` is a core function, exported from a core module, brought into scope with `(alias core export <core_module_index> "interface@version#foo" (core func <core_function_index>))`
- In a downstream component, these component-level exports are consumed by:
  - `(import "foo" ..)`
  - `(core func (canon lower <function_index>))`
  - `(import "package/module@version" "foo" (func <core_function_index>))`
As you can see, the component-level function declarations via `canon lift` and `canon lower` are important, and represent something more than just an alias of the underlying core function definition.

IR Components
Fundamentally, I think we want the IR to represent things in a more WIT-like manner, i.e.:
- `Component` represents the top-level component/world, and ultimately corresponds to a Miden package. A `Component` consists of one or more `Interface` or `Module` items.
- `Interface` represents component instances exported from the top-level component, and corresponds to the original WIT interfaces. Functions in an `Interface` can be declarations or definitions, but in both cases always use the Canonical ABI.
- `Module` represents a core module within a component. Functions in a `Module` always use the core Wasm ABI (or Miden ABI).

Interfaces are lowered to MASM either by re-exporting a procedure from a sibling `Module`, or by lowering the function definition in the `Interface` itself (which presumably internally references a procedure in a sibling `Module`). Interfaces consisting solely of declarations can be used to represent information about external dependencies in the IR.

Modules are lowered to MASM 1:1, as you'd expect.
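A minimal Rust sketch of these IR shapes and lowering rules follows. All type names (`Component`, `Interface`, `Module`, `InterfaceFunc`, `MasmItem`) and the `lower` function are hypothetical and mirror the proposal rather than any existing `midenc` API. Item paths reuse the proposed `namespace/module#item` form, and interface functions are simplified to three cases: declaration, re-export, or compiler-generated definition.

```rust
/// An interface function; always uses the Canonical ABI.
#[derive(Debug)]
enum InterfaceFunc {
    /// Declaration only, e.g. describing an external dependency.
    Declaration { name: String },
    /// No glue needed: re-export `procedure` from sibling module `module`.
    Reexport { name: String, module: String, procedure: String },
    /// Glue needed: a definition in the interface itself.
    Definition { name: String },
}

#[derive(Debug)]
struct Interface { name: String, functions: Vec<InterfaceFunc> }

/// A core module; its procedures use the core Wasm (or Miden) ABI.
#[derive(Debug)]
struct Module { name: String, procedures: Vec<String> }

#[derive(Debug)]
struct Component { namespace: String, interfaces: Vec<Interface>, modules: Vec<Module> }

/// What each IR item becomes in MASM terms.
#[derive(Debug, PartialEq)]
enum MasmItem {
    Procedure { module: String, name: String },
    Reexport { module: String, name: String, target: String },
    External { interface: String, name: String },
}

fn lower(c: &Component) -> Vec<MasmItem> {
    let mut out = Vec::new();
    // Modules are lowered 1:1.
    for m in &c.modules {
        let path = format!("{}/{}", c.namespace, m.name);
        for p in &m.procedures {
            out.push(MasmItem::Procedure { module: path.clone(), name: p.clone() });
        }
    }
    // Each interface becomes a module that re-exports or defines its functions;
    // declarations are only recorded as external dependencies.
    for i in &c.interfaces {
        let path = format!("{}/{}", c.namespace, i.name);
        for f in &i.functions {
            out.push(match f {
                InterfaceFunc::Declaration { name } => {
                    MasmItem::External { interface: i.name.clone(), name: name.clone() }
                }
                InterfaceFunc::Reexport { name, module, procedure } => MasmItem::Reexport {
                    module: path.clone(),
                    name: name.clone(),
                    target: format!("{}/{}#{}", c.namespace, module, procedure),
                },
                InterfaceFunc::Definition { name } => {
                    MasmItem::Procedure { module: path.clone(), name: name.clone() }
                }
            });
        }
    }
    out
}

fn main() {
    let component = Component {
        namespace: "miden:base".to_string(),
        modules: vec![Module { name: "lib".to_string(), procedures: vec!["foo".to_string()] }],
        interfaces: vec![
            Interface {
                name: "tx".to_string(),
                functions: vec![
                    InterfaceFunc::Reexport { name: "foo".to_string(), module: "lib".to_string(), procedure: "foo".to_string() },
                    InterfaceFunc::Definition { name: "bar".to_string() },
                ],
            },
            Interface { name: "dep".to_string(), functions: vec![InterfaceFunc::Declaration { name: "ext".to_string() }] },
        ],
    };
    for item in lower(&component) {
        println!("{item:?}");
    }
}
```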
This does not get into how data segments and global variables are handled; suffice it to say that those are less problematic than the items mentioned here, and we have more freedom in how to handle them.
--
NOTE: I have a few follow-up thoughts/notes as I've explored things further, but the above represents more or less the direction I'm planning to take this in, depending on the outcome of this discussion.