WebRender: Culling

Culling refers to discarding invisible content, and it is performed at several stages of the rendering pipeline. During frame building on the CPU, we go through all primitives and discard the ones that are off-screen using simple rectangle intersection tests. As a result we avoid transferring a lot of data to the GPU, and we also skip processing those primitives entirely.
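
To give an idea of what this looks like, here is a rough sketch in Rust of rectangle-based culling. The types and names are made up for the example, they are not WebRender’s actual API:

```rust
// A minimal sketch of rectangle-based culling. The types and names are
// made up for this example; they are not WebRender's actual API.
#[derive(Clone, Copy)]
struct Rect {
    x: f32,
    y: f32,
    w: f32,
    h: f32,
}

impl Rect {
    // True if the two axis-aligned rectangles overlap.
    fn intersects(&self, other: &Rect) -> bool {
        self.x < other.x + other.w
            && other.x < self.x + self.w
            && self.y < other.y + other.h
            && other.y < self.y + self.h
    }
}

struct Primitive {
    bounds: Rect,
    // ... plus whatever data is needed to draw the primitive.
}

// Keep only the primitives whose bounds intersect the viewport, so that
// off-screen primitives are neither uploaded to the GPU nor processed
// any further.
fn cull_offscreen<'a>(primitives: &'a [Primitive], viewport: &Rect) -> Vec<&'a Primitive> {
    primitives
        .iter()
        .filter(|prim| prim.bounds.intersects(viewport))
        .collect()
}
```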

Unfortunately this isn’t enough. Web pages are typically built upon layers and layers of elements stacked on top of one another. The traditional way to render web pages is to draw each element in back-to-front order, which means that for a given pixel on the screen we may render many primitives. This is frustrating because opaque primitives often completely cover the work we did on that pixel for the elements beneath them, so a lot of shading work and memory bandwidth goes to waste, and memory bandwidth is a very common bottleneck, even on high-end hardware.

Drawing on the same pixels multiple times is called overdraw, and overdraw is not our friend, so a lot of effort goes into reducing it.
In its early days, WebRender mitigated overdraw by dividing the screen into tiles and assigning each primitive to the tiles it covered (a primitive overlapping several tiles was split into one primitive per tile). When an opaque primitive covered an entire tile, we could simply discard everything below it. This tiling approach was good at reducing overdraw with large occluders and also made batching blended primitives easier (I’ll talk about batching in another episode). It worked quite well for axis-aligned rectangles, which make up the vast majority of what web pages are made of, but splitting transformed primitives was hard.
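
Here is a simplified sketch of the idea, reusing the Rect type from the previous sketch (again with made-up names, not WebRender’s real code):

```rust
// A simplified sketch of the old tiling scheme (made-up names, reusing
// the Rect type from the previous sketch; not WebRender's real code).
struct TileFragment {
    bounds: Rect, // the primitive's bounds clipped to this tile
    opaque: bool,
}

struct Tile {
    bounds: Rect,
    // Fragments assigned to this tile, kept in back-to-front order.
    fragments: Vec<TileFragment>,
}

impl Tile {
    fn assign(&mut self, prim_bounds: &Rect, opaque: bool) {
        // Clip the primitive to this tile; skip it if they don't overlap.
        let Some(clipped) = intersection(prim_bounds, &self.bounds) else {
            return;
        };
        // An opaque fragment covering the whole tile occludes everything
        // assigned before it, so all earlier fragments can be discarded.
        if opaque && rects_equal(&clipped, &self.bounds) {
            self.fragments.clear();
        }
        self.fragments.push(TileFragment { bounds: clipped, opaque });
    }
}

// Intersection of two axis-aligned rectangles, if they overlap.
fn intersection(a: &Rect, b: &Rect) -> Option<Rect> {
    let x0 = a.x.max(b.x);
    let y0 = a.y.max(b.y);
    let x1 = (a.x + a.w).min(b.x + b.w);
    let y1 = (a.y + a.h).min(b.y + b.h);
    if x0 < x1 && y0 < y1 {
        Some(Rect { x: x0, y: y0, w: x1 - x0, h: y1 - y0 })
    } else {
        None
    }
}

fn rects_equal(a: &Rect, b: &Rect) -> bool {
    a.x == b.x && a.y == b.y && a.w == b.w && a.h == b.h
}
```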

Eventually we decided to try a different approach, inspired by how video games tackle the same problem. GPUs have a special feature called the z-buffer (or depth buffer), which stores the depth of each pixel during rendering. This makes it possible to render opaque objects in any order and still have the ones closest to the camera end up visible.
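
Conceptually, the depth test boils down to a per-pixel comparison. Here is a small software model of it (on real GPUs this is fixed-function hardware, not code like this):

```rust
// A small software model of the per-pixel depth test. On real GPUs this
// is fixed-function hardware.
struct DepthBuffer {
    width: usize,
    depths: Vec<f32>, // one depth value per pixel
}

impl DepthBuffer {
    fn new(width: usize, height: usize) -> Self {
        // Initialize to the far plane so that the first write always passes.
        DepthBuffer {
            width,
            depths: vec![f32::INFINITY; width * height],
        }
    }

    // Accept the fragment only if it is closer than what is already stored
    // (here, smaller depth means closer to the camera). With early depth
    // testing, fragments that fail this test are rejected before any
    // shading or color write happens.
    fn test_and_write(&mut self, x: usize, y: usize, depth: f32) -> bool {
        let stored = &mut self.depths[y * self.width + x];
        if depth < *stored {
            *stored = depth;
            true
        } else {
            false
        }
    }
}
```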

A common way to render 3D games is to sort opaque objects front-to-back, to maximize the chance that the front-most pixels are written first and that the depth test rejects as much shading work and as many memory writes as possible. Transparent objects are then rendered back-to-front in a second pass, since they can’t count as occluders.
This is exactly what WebRender does now. Moving from the tiling scheme to using the depth buffer to reduce overdraw brought great performance improvements (certainly more than I expected), and also made a number of other things simpler (I’ll come back to these another day).
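
Here is what the two passes look like in a sketch, with a made-up Gpu trait standing in for the real graphics API calls:

```rust
// A sketch of the two-pass scheme, with a made-up Gpu trait standing in
// for the real graphics API calls; not WebRender's actual code.
struct Prim {
    z: f32, // smaller z = closer to the viewer (assumed convention)
    opaque: bool,
}

trait Gpu {
    fn set_depth_test(&mut self, on: bool);
    fn set_depth_write(&mut self, on: bool);
    fn set_blending(&mut self, on: bool);
    fn draw(&mut self, prim: &Prim);
}

fn render_frame(primitives: &[Prim], gpu: &mut dyn Gpu) {
    let (mut opaque, mut blended): (Vec<&Prim>, Vec<&Prim>) =
        primitives.iter().partition(|prim| prim.opaque);

    // Pass 1: opaque primitives, front-to-back, with depth test and
    // depth writes enabled so occluded pixels are rejected early.
    opaque.sort_by(|a, b| a.z.total_cmp(&b.z));
    gpu.set_depth_test(true);
    gpu.set_depth_write(true);
    gpu.set_blending(false);
    for prim in opaque {
        gpu.draw(prim);
    }

    // Pass 2: blended primitives, back-to-front for correct compositing,
    // still testing against the opaque depth values but not writing depth.
    blended.sort_by(|a, b| b.z.total_cmp(&a.z));
    gpu.set_depth_write(false);
    gpu.set_blending(true);
    for prim in blended {
        gpu.draw(prim);
    }
}
```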

This concludes today’s little piece of WebRender history. It is very unusual for 2D rendering engines to use the z-buffer this way, so I think this implementation detail is worth highlighting.