# Path to the Screen
In this document, we'll describe the path WR (short for WebRender) takes to get actual pages on screen.
## Renderer::new()

`Renderer::new()` creates the WR instance with the following components:

- `RenderBackend` (RB) - not returned from the function, but instead put to work in a separate thread, communicating via messages with the following objects.
- `RenderApiSender` - needed to produce `RenderApi` instances, each with a unique namespace.
- `Renderer` - owns the graphics context and does the actual rendering.
## RenderApi

When commands are issued through `RenderApi`, they get serialized and sent to RB over a channel (not necessarily across the IPC boundary - this is controlled by a cargo feature).
`RenderApi` attempts to be consistent with regards to the resource API:

- `generate_something_key()` - get a new unique key that is not yet associated with any data.
- `add_something(key, ...)` - associate the key with actual data (e.g. texels of an image).
- `push_something_else` - there can be several methods that use our `key` in one way or another. For example, an image can be provided as a part of a `ClipRegion` for any primitive.
- `update_something(key, ...)` - update a part of the resource associated with the `key`.
- `delete_*(key)` - destroy the associated resource with all its contents.
A frame is considered complete when it has a root pipeline, set by `set_root_pipeline`, and a display list, set by `set_display_list`. The frame doesn't get rendered until `generate_frame` is called, or any sort of scrolling is requested (`scroll`, `scroll_layer_with_id`, etc).
When calling `generate_frame`, the user can provide a list of new values for the animated properties of a frame. This doesn't force the `RenderBackend` to re-generate the frame; instead, the last frame that `Renderer` received is re-used with the new values.
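A minimal sketch of how such animated values can be applied without rebuilding a frame: a property is either a static value or a binding that is resolved against the latest values sent along with `generate_frame`. The names and types below are simplified assumptions, not the real API:

```rust
use std::collections::HashMap;

// Hypothetical identifier for an animated property.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct PropertyBindingId(u32);

/// A value baked into the frame: either static, or bound to an
/// animated property that can change between frames.
enum PropertyBinding {
    Value(f32),
    Binding(PropertyBindingId),
}

/// Resolve a binding against the latest dynamic values. Static values
/// pass through untouched; bindings look up the newest value.
fn resolve(binding: &PropertyBinding, dynamic: &HashMap<PropertyBindingId, f32>) -> f32 {
    match *binding {
        PropertyBinding::Value(v) => v,
        PropertyBinding::Binding(id) => *dynamic.get(&id).unwrap_or(&0.0),
    }
}
```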
## Render Backend

The backend listens to the user commands (issued via `RenderApi`) over the channel. Its task is to process the data from the representation the user provides into one the GPU can consume directly in order to draw it on screen. The result of this process is a `Frame` object that gets sent to the `Renderer`.
The threading of the backend, or even its existence, is opaque to the user. It currently runs in a separate thread, but we may transition towards a thread pool with some sort of a job system in the future.
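The command flow can be sketched with a plain `std::sync::mpsc` channel and a backend thread. The message and frame types below are invented for illustration and are far simpler than the real ones:

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical command set sent from the API side to the backend.
enum ApiMsg {
    SetDisplayList(Vec<u8>), // serialized display list payload
    GenerateFrame,
    ShutDown,
}

// Stand-in for the Frame object the backend hands to the Renderer.
struct Frame {
    display_list_len: usize,
}

/// Spawn the backend thread: it receives commands, and on
/// GenerateFrame sends a built Frame back over `frame_tx`.
fn spawn_backend(
    rx: mpsc::Receiver<ApiMsg>,
    frame_tx: mpsc::Sender<Frame>,
) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        let mut current_list = Vec::new();
        while let Ok(msg) = rx.recv() {
            match msg {
                ApiMsg::SetDisplayList(list) => current_list = list,
                ApiMsg::GenerateFrame => {
                    // Process the scene into something the GPU can consume.
                    let frame = Frame { display_list_len: current_list.len() };
                    frame_tx.send(frame).unwrap();
                }
                ApiMsg::ShutDown => break,
            }
        }
    })
}
```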
## Flattening

The data is provided in the form of a tree, where the nodes are stacking contexts and iframes, and the leaves are primitives. Flattening is the first thing RB does when processing a scene: it records layers and primitives, saving all the associated CPU/GPU data into a number of containers:

- `stacking_context_store` - records all stacking context information for CPU access.
- `packed_layers` - all GPU information about transformations and bounds of document layers.
- `prim_store` (PS) - contains all the information about actual primitives.

On the CPU side, PS knows about bounding rectangles, common meta-data, as well as specific bits of information for text runs, images, gradients, etc. The meta-data hooks up each primitive with the relevant clip information, CPU & GPU primitive indices, GPU data addresses, and required render tasks.
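A toy version of the flattening walk, assuming a heavily simplified tree and store layout (the real containers carry far more data):

```rust
// Simplified scene tree: stacking contexts with primitive leaves.
struct StackingContext {
    children: Vec<StackingContext>,
    primitives: Vec<&'static str>, // e.g. "text_run", "image"
}

// Flat containers filled by the walk, loosely mirroring
// stacking_context_store and prim_store.
#[derive(Default)]
struct Stores {
    stacking_context_store: Vec<usize>,     // parent context index
    prim_store: Vec<(usize, &'static str)>, // (owning context index, kind)
}

/// Depth-first flattening: record each stacking context, then its
/// primitives, then recurse into its children.
fn flatten(sc: &StackingContext, parent: usize, out: &mut Stores) {
    let index = out.stacking_context_store.len();
    out.stacking_context_store.push(parent);
    for &prim in &sc.primitives {
        out.prim_store.push((index, prim));
    }
    for child in &sc.children {
        flatten(child, index, out);
    }
}
```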
On the GPU side, the data is split into:

- geometry - contains the actual rectangle shapes and clipping bounds.
- resource_rects - stores rectangle coordinates for various primitives. This is required for late updates of those coordinates, for example when an external image gets updated before being shown on screen.
- generic blocks of data of 16, 32, 64, and 128 bytes.
Each of these GPU containers is associated with a texture on the shader side. These textures are typically accessed from the vertex shader, in order to read the data about the primitive and place it properly.
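The generic blocks can be pictured as a bump allocator over fixed-size slots; a primitive then refers to its data by a block address that the vertex shader uses to fetch from the corresponding data texture. This is a sketch under assumed names, not the actual GPU store code:

```rust
// Hypothetical CPU-side store backing one of the GPU data textures.
struct GpuStore {
    block_size: usize, // 16, 32, 64, or 128 bytes
    data: Vec<u8>,
}

impl GpuStore {
    fn new(block_size: usize) -> Self {
        GpuStore { block_size, data: Vec::new() }
    }

    /// Append one block and return its address (the block index),
    /// padding the payload up to the fixed block size.
    fn push(&mut self, payload: &[u8]) -> usize {
        assert!(payload.len() <= self.block_size);
        let address = self.data.len() / self.block_size;
        self.data.extend_from_slice(payload);
        self.data.resize((address + 1) * self.block_size, 0);
        address
    }
}
```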
## Frame building

When all the data is nicely laid out in arrays, we can start converting it into an actual task tree. This is what `FrameBuilder::build` does.
We recalculate the clip/scroll groups and nodes, compute the visibility of stacking contexts, and then derive the following information for all visible items:
- clipping mask that needs to be computed beforehand and applied when rendering
- box shadow tasks
- bounding rectangles
- actual text glyphs for visible fragments of text
- gradient stops
All the missing data, like the contents of images being loaded, or rasterized glyphs, is requested simultaneously and then waited upon during the `build` procedure. Some primitives need a separate update pass in order to patch the bits of data that depended on unknowns during the previous phases (e.g. texture IDs and GPU rectangles); this is done in `resolve_primitives`.
The result of `build()` is a single root render task. You can read more about those on the Life of a Task page.
## Task tree

The task tree represents all the work the GPU needs to do in the shape of a tree, where child nodes are dependencies. The depth of this tree determines the number of passes that need to occur in order to execute all the tasks. Each pass is a chunk of work that doesn't have inner dependencies, and only depends on the result of the previous pass (if there is any).

For example, consider this task tree:

```
A -> B -> C
  -> D -> E -> F
```

The number of passes will be 4, with the following tasks: `[[F], [C, E], [B, D], [A]]`. This is what the `assign_to_passes` method does - flattening the task tree into passes.
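The pass assignment can be sketched as follows: each task is recorded at its depth in the tree, and reversing the depth lists gives the execution order, since the deepest dependencies must be rendered first. The `Task` type and the function body are simplified assumptions:

```rust
// A task whose children are its dependencies.
struct Task {
    name: char,
    children: Vec<Task>,
}

/// Record each task at its depth; after reversing, passes[0] holds the
/// deepest tasks, which have no dependencies and can run first.
fn assign_to_passes(task: &Task, depth: usize, passes: &mut Vec<Vec<char>>) {
    if passes.len() <= depth {
        passes.push(Vec::new());
    }
    passes[depth].push(task.name);
    for child in &task.children {
        assign_to_passes(child, depth + 1, passes);
    }
}
```

Running this on the example tree yields `[[F], [C, E], [B, D], [A]]`, matching the pass list described above.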
Each pass is associated with a multi-layer texture that stores its results, except for the last pass, which always draws to the screen. The texture needs to be multi-layered, since we don't statically know how much texel space we'll need to assign to it. When traversing the task tree, we gather all the required render targets, and from there call `RenderPass::allocate_target` for each.
## Texture cache

The texture cache is a part of the Render Backend. It stores a set of texture pages that are split depending on the format: A8, RGB8, and RGBA8. There may be more in the future.
When RB receives a request to add an image, for example, the set of texture pages with the relevant format is considered by `TextureCache::allocate`. If we aren't able to fit the data into one of the existing pages, we allocate a new one.
Each page serves as a texture atlas holding multiple textures. If a requested texture is too big for even an empty page, it gets split into multiple tiles.
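The page-selection logic can be sketched as follows. The per-page allocator here is reduced to a naive vertical shelf for brevity, which is far simpler than the real `TextureCache::allocate`:

```rust
const PAGE_SIZE: u32 = 1024;

// One texture page; allocation state is just a running height here.
struct Page {
    used_height: u32,
}

// Hypothetical cache holding the pages of a single format.
struct TextureCache {
    pages: Vec<Page>,
}

impl TextureCache {
    /// Return (page index, y offset) for a request of the given height:
    /// reuse an existing page if the data fits, otherwise open a new one.
    fn allocate(&mut self, height: u32) -> (usize, u32) {
        for (i, page) in self.pages.iter_mut().enumerate() {
            if page.used_height + height <= PAGE_SIZE {
                let y = page.used_height;
                page.used_height += height;
                return (i, y);
            }
        }
        self.pages.push(Page { used_height: height });
        (self.pages.len() - 1, 0)
    }
}
```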
Texture data is stored within the cache as-is. There are no safe borders added around the sides, so any sampling from the texture data needs to take the actual bounds into account and force the pixel shader to avoid sampling outside of the allocated rectangle. Failure to do so will result in artefacts across the image borders.
One way to prevent sampling outside the bounds is to clamp the texture coordinates to a rectangle half a texel inside the allocated region, while making sure that only lod[0] is sampled (e.g. with `textureLod`). Here is an extract from the image rendering code showing this:
```glsl
// in VS, having `st0` and `st1` as the original image bounds
vStRect = vec4(min(st0, st1) + half_texel, max(st0, st1) - half_texel);
// in FS, having `st` as the texture coordinate
st = clamp(st, vStRect.xy, vStRect.zw);
```
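To make the math concrete, the same half-texel clamping can be written on the CPU side; this Rust sketch simply mirrors the shader arithmetic:

```rust
// Scalar clamp, matching GLSL's clamp(v, lo, hi).
fn clamp(v: f32, lo: f32, hi: f32) -> f32 {
    v.max(lo).min(hi)
}

/// Clamp a texture coordinate to the rect (st0, st1) inset by half a
/// texel on each side, so bilinear filtering at the edge never blends
/// with texels from a neighbouring atlas entry.
fn clamp_to_rect(
    st: (f32, f32),
    st0: (f32, f32),
    st1: (f32, f32),
    half_texel: f32,
) -> (f32, f32) {
    let min = (st0.0.min(st1.0) + half_texel, st0.1.min(st1.1) + half_texel);
    let max = (st0.0.max(st1.0) - half_texel, st0.1.max(st1.1) - half_texel);
    (clamp(st.0, min.0, max.0), clamp(st.1, min.1, max.1))
}
```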
Our layers have a fixed size of 1024x1024 (or the screen size, if it's smaller), and after traversing the task tree we can safely allocate the required depth of this layered target.