To introduce this week’s newsletter I’ll write about culling. Culling refers to discarding invisible content and is performed at several stages of the rendering pipeline. During frame building on the CPU we go through all primitives and discard the ones that are off-screen by computing simple rectangle intersections. As a result we avoid transferring a lot of data to the GPU and we can skip processing them as well.
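The rectangle test described above can be sketched in a few lines of Rust. This is only an illustration under simplifying assumptions, not WebRender's actual code: the `Rect` type and the `cull_offscreen` function are hypothetical names, and primitives are reduced to their axis-aligned screen-space bounds.

```rust
/// Axis-aligned rectangle in screen space (hypothetical type for illustration).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Rect {
    pub x: f32,
    pub y: f32,
    pub w: f32,
    pub h: f32,
}

impl Rect {
    /// True if the two rectangles share a non-empty area.
    pub fn intersects(&self, other: &Rect) -> bool {
        self.x < other.x + other.w
            && other.x < self.x + self.w
            && self.y < other.y + other.h
            && other.y < self.y + self.h
    }
}

/// Keep only the primitives whose bounds overlap the viewport.
/// Everything else is discarded on the CPU and never reaches the GPU.
pub fn cull_offscreen(primitives: &[Rect], viewport: &Rect) -> Vec<Rect> {
    primitives
        .iter()
        .copied()
        .filter(|bounds| bounds.intersects(viewport))
        .collect()
}
```

Calling `cull_offscreen` with a list of primitive bounds and the viewport rectangle returns the primitives that are at least partially on screen, in their original order.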
Unfortunately this isn’t enough. Web pages are typically built upon layers and layers of elements stacked on top of one another. The traditional way to render web pages is to draw each element in back-to-front order, which means that for a given pixel on the screen we may have rendered many primitives. This is frustrating because there are a lot of opaque primitives that completely cover the work we did on that pixel for the elements beneath them, so a lot of shading work and memory bandwidth goes to waste. Memory bandwidth is a very common bottleneck, even on high-end hardware.
Drawing on the same pixels multiple times is called overdraw, and overdraw is not our friend, so a lot of effort goes into reducing it.
In its early days, to mitigate overdraw, WebRender divided the screen into tiles, and all primitives were assigned to the tiles they covered (primitives that overlapped several tiles were split into one primitive per tile). When an opaque primitive covered an entire tile, we could simply discard everything that was below it. This tiling approach was good at reducing overdraw with large occluders and also made batching blended primitives easier (I’ll talk about batching in another episode). It worked quite well for axis-aligned rectangles, which are the vast majority of what web pages are made of, but it was hard to split transformed primitives.
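To make the per-tile discard step concrete, here is a minimal sketch. It assumes each tile already holds its primitives in back-to-front order and that coverage has been precomputed; the `TilePrimitive` type and the `discard_occluded` function are hypothetical names, not the old WebRender implementation.

```rust
/// A primitive after it has been assigned (and possibly split) to one tile.
/// Hypothetical type for illustration.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct TilePrimitive {
    pub id: u32,
    /// True if the primitive has no transparency.
    pub opaque: bool,
    /// True if the primitive covers the whole tile.
    pub covers_tile: bool,
}

/// Given a tile's primitives in back-to-front order, drop everything that
/// sits below the top-most opaque primitive covering the entire tile.
pub fn discard_occluded(back_to_front: &[TilePrimitive]) -> &[TilePrimitive] {
    // Search from the front (end of the slice) for the first full occluder.
    match back_to_front
        .iter()
        .rposition(|p| p.opaque && p.covers_tile)
    {
        Some(index) => &back_to_front[index..],
        None => back_to_front,
    }
}
```

The occluder itself is kept, along with everything drawn on top of it. A tile without any full opaque cover is returned unchanged.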
Eventually we decided to try a different approach inspired by how video games tackle the same problem. GPUs have a special feature called the z-buffer (or depth buffer), which stores the depth of each pixel during rendering. This allows rendering opaque objects in any order and still correctly have the ones closest to the camera visible.
A common way to render 3D games is to sort opaque objects front-to-back to maximize the chance that front-most pixels are written first, which in turn maximizes the amount of shading and memory writes that are discarded by the depth test. Transparent objects are then rendered back-to-front in a second pass since they can’t count as occluders.
This is exactly what WebRender does now, and moving from the tiling scheme to using the depth buffer to reduce overdraw brought great performance improvements (certainly more than I expected), and also made a number of other things simpler (I’ll come back to these another day).
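The two-pass ordering can be simulated on the CPU to see how much shading the depth test saves. The sketch below is a toy model, not WebRender's renderer: `Prim`, `build_passes` and `count_shaded_pixels` are hypothetical names, the "screen" is a single scanline, and I assume the blended pass tests against the depth buffer without writing to it.

```rust
/// A primitive on a one-dimensional "screen" (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Prim {
    /// Stacking order: larger values are closer to the viewer. Must be >= 1.
    pub z: u32,
    pub opaque: bool,
    /// Covered pixel range [start, end) on the scanline.
    pub start: usize,
    pub end: usize,
}

/// Split primitives into an opaque pass sorted front-to-back and a
/// blended pass sorted back-to-front.
pub fn build_passes(prims: &[Prim]) -> (Vec<Prim>, Vec<Prim>) {
    let (mut opaque, mut blended): (Vec<Prim>, Vec<Prim>) =
        prims.iter().copied().partition(|p| p.opaque);
    opaque.sort_by(|a, b| b.z.cmp(&a.z)); // front-to-back
    blended.sort_by(|a, b| a.z.cmp(&b.z)); // back-to-front
    (opaque, blended)
}

/// Simulate both passes against a depth buffer and count how many pixels
/// pass the depth test, i.e. how many actually get shaded.
pub fn count_shaded_pixels(prims: &[Prim], width: usize) -> usize {
    let (opaque, blended) = build_passes(prims);
    let mut depth = vec![0u32; width]; // 0 means nothing drawn yet
    let mut shaded = 0;

    // Opaque pass: depth test and depth write.
    for p in &opaque {
        for x in p.start..p.end.min(width) {
            if p.z > depth[x] {
                depth[x] = p.z;
                shaded += 1;
            }
        }
    }

    // Blended pass: depth test only, so hidden transparent pixels are
    // skipped but transparent primitives never occlude anything.
    for p in &blended {
        for x in p.start..p.end.min(width) {
            if p.z > depth[x] {
                shaded += 1;
            }
        }
    }
    shaded
}
```

For example, with two full-width opaque primitives on a 10-pixel scanline (z = 1 and z = 3), a transparent primitive at z = 2 covering 3 pixels, and another transparent primitive at z = 4 covering 3 pixels, this model shades 13 pixels, where plain back-to-front drawing would shade all 26.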
This concludes today’s little piece of WebRender history. It is very unusual for 2D rendering engines to use the z-buffer this way so I think this implementation detail is worth the highlight.
Notable WebRender and Gecko changes
- Bobby implemented dynamically growing the shared texture cache, cutting by half the remaining regression compared to Firefox without WebRender on the AWSY test.
- Dan did some profiling of the talos test, and identified large data structures being copied a lot on the stack, which led to some of Glenn’s optimizations for this week.
- Dan fixed a crash with huge box shadows.
- Kats fixed an issue with scrollbar dragging on some sites.
- Matt landed his tiled blob image work yielding nice performance improvements on some of the talos tests (45% on tsvg_static and 31% on tsvgr_opacity).
- Matt investigated the telemetry results.
- Andrew fixed a crash.
- Andrew improved animated image frame recycling (will land soon, improves performance).
- Lee fixed an issue related to missing font descriptors.
- Glenn optimized out segment builder no-ops.
- Glenn stored text run outside primitive instances to work around a recent performance regression.
- Glenn moved opacity from per primitive instance to per template to reduce the size of primitive instances.
- Glenn moved the resolution of opacity bindings to the shaders in order to simplify primitive interning.
- Glenn used primitive interning for text runs.
- Glenn used primitive interning for clears.
- Glenn refactored render task chaining.
- Nical made opacity, gaussian blur, and drop shadow SVG filters use WebRender’s filter infrastructure under some conditions (instead of running on the CPU fallback).
- Nical followed up with making a subset of SVG color matrix filters (the ones that aren’t affected by opacity) use WebRender as well.
- Nical investigated a color conversion issue when WebRender is embedded in an application that uses OpenGL in certain ways.
- Sotaro switched presentation to triple buffering.
- Bobby is further improving memory usage by tweaking cache growth and eviction heuristics.
- Kats continues working on asynchronous zoom for Android and on evaluating the remaining work before WebRender can be used on Android.
- Kvark is making progress on simplifying the clipping and scrolling APIs.
- Matt keeps investigating performance in general.
- Glenn continues incrementally landing patches towards picture caching.
- Nical is working on blob recoordination.
- Doug is making progress on document splitting.
Enabling WebRender in Firefox Nightly
- In about:config, set “gfx.webrender.all” to true,
- restart Firefox.