In previous posts I wrote about how we deal with batching and how we use the depth buffer both as a culling mechanism and as a way to save memory bandwidth. As a result, pixels rendered in the opaque pass are much cheaper than pixels rendered in the blend pass. This works great with rectangular opaque primitives that are axis-aligned so they don’t need anti-aliasing. Anti-aliasing, however, requires us to do some blending to smoothen the edges and rounded corners have some transparent pixels. We could tessellate a mesh that covers exactly the rounded primitive but we’d still need blending for the anti-aliasing of the border. What a shame, rounded corners are so common on the web, and they are often quite big.
Well, we don’t really need to render whole primitives at a time. for a transformed primitive we can always extract out the opaque part of the primitive and render the anti-aliased edges separately. Likewise, we can break rounded rectangles up into smaller opaque rectangles and the rectangles that contain the corners. We call this primitive segmentation and it helps at several levels: opaque segments can move to the opaque pass which means we get good memory bandwidth savings and better batching since batching complexity is mostly affected by the amount of work to perform during the blend pass. This also opens the door to interesting optimizations. For example we can break a primitive into segments, not only depending on the shapes of the primitive itself, but also on the shape of masks that are applied to it. This lets us create large rounded rectangle masks where only the rounded parts of the masks occupy significant amounts of space in the mask. More generally, there are a lot of complicated elements that can be reduced to simpler or more compact segments by applying the same family of tricks and render them as nine-patches or some more elaborate patchwork of segments (for example the box-shadow of a rectangle).
The way we represent this on the GPU is to pack all of the primitive descriptions in a large float texture. For each primitive we first pack the per-primitive data followed by the per-segment data. We dispatch instanced draw calls where each instance corresponds to a segment’s quad. The vertex shader finds all of the information it needs from the primitive offset and segment id of the quad it is working on.
The idea of breaking complicated primitives up into simpler segments isn’t new nor ground breaking, but I think that it is worth mentioning in the context of WebRender because of how well it integrates with the rest of our rendering architecture.