WebRender newsletter #31

Greetings! I’ll introduce WebRender’s 31st newsletter with a few words about batching.

Efficiently submitting work to GPUs isn’t as straightforward as one might think. It is not unusual for a CPU renderer to go through each graphic primitive (a blue filled circle, a purple stroked path, an image, etc.) in z-order to produce the final rendered image. While this isn’t the most efficient approach, on the CPU it matters far more to optimize the inner loop of the algorithm that renders each individual object than to reduce the overhead of alternating between various types of primitives. GPUs, however, work quite differently, and the cost of submitting small workloads is often higher than the time spent executing them.

I won’t go into the details of why GPUs work this way here, but the big takeaway is that it is best to not think of a GPU API draw call as a way to draw one thing, but rather as a way to submit as many items of the same type as possible. If we implement a shader to draw images, we get much better performance out of drawing many images in a single draw call than submitting a draw call for each image. I’ll call a “batch” any group of items that is rendered with a single drawing command.
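To make the idea concrete, here is a minimal sketch of grouping a z-ordered list of primitives into batches. The `Kind` and `Batch` types are hypothetical stand-ins, not WebRender’s actual API; consecutive primitives of the same type end up in the same draw call.

```rust
// Hypothetical primitive and batch types; not WebRender's actual API.
#[derive(Clone, Copy, PartialEq, Eq)]
enum Kind { Image, Text, Gradient }

struct Batch {
    kind: Kind,
    instances: Vec<u32>, // indices of the primitives in this draw call
}

// Greedy batching over a z-ordered list: extend the current batch while
// consecutive primitives share a kind, otherwise start a new draw call.
fn batch(prims: &[Kind]) -> Vec<Batch> {
    let mut batches: Vec<Batch> = Vec::new();
    for (i, &kind) in prims.iter().enumerate() {
        match batches.last_mut() {
            Some(b) if b.kind == kind => b.instances.push(i as u32),
            _ => batches.push(Batch { kind, instances: vec![i as u32] }),
        }
    }
    batches
}
```

With this scheme, a thousand images drawn back to back cost one draw call instead of a thousand, which is exactly the property we are after.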

So the solution is simply to render all images in a draw call, and then all of the text, then all gradients, right? Well, it’s a tad more complicated because the submission order affects the result. We don’t want a gradient to overwrite text that is supposed to be rendered on top of it, so we have to maintain some guarantees about the order of the submissions for overlapping items.

In the 29th newsletter intro I talked about culling and the way we used to split the screen into tiles to accelerate discarding hidden primitives. This tiling system was also good at simplifying the problem of batching. In order to batch two primitives together we need to make sure that there is no primitive of a different type in between. Comparing all primitives on screen against every other primitive would be too expensive but the tiling scheme reduced this complexity a lot (we then only needed to compare primitives assigned to the same tile).
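The tiling trick boils down to mapping each primitive’s bounding rectangle to the screen tiles it touches, so that overlap tests only have to compare primitives sharing a tile. A minimal sketch, assuming non-negative screen coordinates and a hypothetical `Rect` type:

```rust
// Hypothetical tiling sketch: map an axis-aligned rect to the screen
// tiles it touches, so overlap tests only compare primitives that were
// assigned to the same tile. Assumes non-empty rects with non-negative
// integer coordinates.
const TILE_SIZE: i32 = 256;

struct Rect { x0: i32, y0: i32, x1: i32, y1: i32 }

fn touched_tiles(r: Rect) -> Vec<(i32, i32)> {
    let mut tiles = Vec::new();
    for ty in (r.y0 / TILE_SIZE)..=((r.y1 - 1) / TILE_SIZE) {
        for tx in (r.x0 / TILE_SIZE)..=((r.x1 - 1) / TILE_SIZE) {
            tiles.push((tx, ty));
        }
    }
    tiles
}
```

Two primitives can only overlap if they share at least one tile, so the quadratic all-pairs comparison shrinks to a handful of comparisons per tile.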

In the culling episode I also wrote that we removed the screen space tiling in favor of using the depth buffer for culling. This might sound like a regression for the performance of the batching code, but the depth buffer also introduced a very nice property: opaque elements can be drawn in any order without affecting correctness! This is because we store the z-index of each pixel in the depth buffer, so if some text is hidden by an opaque image we can still render the image before the text and the GPU will be configured to automatically discard the pixels of the text that are covered by the image.
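The depth test can be sketched in software to show why submission order stops mattering for opaque items. This is an illustrative model, not WebRender code; the real test runs in fixed-function GPU hardware:

```rust
// A software sketch of the depth test that lets opaque primitives draw
// in any order: a fragment only lands if its z is in front of what is
// already stored at that pixel. (Smaller z = closer to the viewer here.)
struct DepthBuffer { z: Vec<f32>, width: usize }

impl DepthBuffer {
    fn new(width: usize, height: usize) -> Self {
        DepthBuffer { z: vec![f32::MAX; width * height], width }
    }

    // Returns true if the fragment passes, and records its depth.
    fn test_and_write(&mut self, x: usize, y: usize, z: f32) -> bool {
        let stored = &mut self.z[y * self.width + x];
        if z < *stored { *stored = z; true } else { false }
    }
}
```

If an opaque image in front is drawn first, the text pixels behind it fail the depth test and are discarded, so the final picture is correct regardless of submission order.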

In WebRender this means we were able to separate primitives into two groups: the opaque ones, and the ones that need to perform some blending. Batching opaque items is trivial since we are free to put all opaque items of the same type in their own batch regardless of their painting order. For blended primitives we still need to check for overlaps, but we have fewer primitives to consider. Currently WebRender simply iterates over the last 10 blended primitives to see if there is a suitable batch with no other type of primitive overlapping in between, and defaults to starting a new batch otherwise. We could go for a more elaborate strategy, but this has worked well so far since we put a lot more effort into moving as many primitives as possible into the opaque passes.
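The lookback strategy can be sketched as follows. The types and the `find_batch` helper are hypothetical (the real implementation differs); the sketch walks backwards over recent batches looking for one of the right kind, and gives up as soon as a batch of a different kind contains an overlapping item, since merging past it would reorder overlapping pixels:

```rust
// Hypothetical sketch of lookback batching for blended primitives; not
// WebRender's actual code.
#[derive(Clone, Copy, PartialEq, Eq)]
enum Kind { Image, Text, Gradient }

#[derive(Clone, Copy)]
struct Rect { x0: f32, y0: f32, x1: f32, y1: f32 }

fn overlaps(a: &Rect, b: &Rect) -> bool {
    a.x0 < b.x1 && b.x0 < a.x1 && a.y0 < b.y1 && b.y0 < a.y1
}

struct Batch { kind: Kind, rects: Vec<Rect> }

const MAX_LOOKBACK: usize = 10;

// Returns the index of a batch the primitive can join, creating a new
// one when no recent batch is suitable.
fn find_batch(batches: &mut Vec<Batch>, kind: Kind, rect: &Rect) -> usize {
    let start = batches.len().saturating_sub(MAX_LOOKBACK);
    for i in (start..batches.len()).rev() {
        if batches[i].kind == kind {
            return i; // same kind, nothing overlapping in between
        }
        // A different kind in between: merging past it would change the
        // result wherever pixels overlap, so stop searching.
        if batches[i].rects.iter().any(|r| overlaps(r, rect)) {
            break;
        }
    }
    batches.push(Batch { kind, rects: Vec::new() });
    batches.len() - 1
}
```

Non-overlapping primitives of different kinds can leapfrog each other, which is what lets, say, two groups of images separated by unrelated text still share a draw call.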

In another episode I’ll describe how we pushed this one step further and made it possible to segment primitives into opaque and non-opaque parts, further reducing the amount of blended pixels.

Notable WebRender and Gecko changes

  • Henrik added reftests for the ImageRendering property: (1), (2) and (3).
  • Bobby changed the way pref changes propagate through WebRender.
  • Bobby improved the texture cache debug view.
  • Bobby improved the texture cache eviction heuristics.
  • Chris fixed the way WebRender activation interacts with the progressive feature rollout system.
  • Chris added a marionette test running on a VM with a GPU.
  • Kats and Jamie experimented with various solutions to a driver bug on some Adreno GPUs.
  • Kvark removed some smuggling of clipIds for clip and reference frame items in Gecko’s displaylist building code. This fixed a few existing bugs: (1), (2) and (3).
  • Matt and Jeff added some new telemetry and analyzed the results.
  • Matt added a debugging indicator that moves when a frame takes too long to help with catching performance issues.
  • Andrew landed his work on surface sharing for animated images which fixed most of the outstanding performance issues related to animated images.
  • Andrew completed animated image frame recycling.
  • Lee fixed a bug with sub-pixel glyph positioning.
  • Glenn fixed a crash.
  • Glenn fixed another crash.
  • Glenn reduced the need for full world rect calculation during culling to make it easier to do clustered culling.
  • Nical switched device coordinates to signed integers in order to meet some of the needs of blob image recoordination and to simplify some code.
  • Nical added some debug checks to avoid global OpenGL states from the embedder causing issues in WebRender.
  • Sotaro fixed an intermittent timeout.
  • Sotaro fixed a crash on Android related to SurfaceTexture.
  • Sotaro improved the frame synchronization code.
  • Sotaro cleaned up the frame synchronization code some more.
  • Timothy ported img AltFeedback items to WebRender.

Ongoing work

  • Glenn is about to land a major piece of his tiled picture caching work which should solve a lot of the remaining issues with pages that generate too much GPU work.
  • Matt, Dan and Jeff keep investigating CPU performance, a lot of which revolves around the many memory copies generated by rustc when moving structures on the stack.
  • Doug is investigating talos performance issues with document splitting.
  • Nical is making progress on improving the invalidation of tiled blob images during scrolling.
  • Kvark keeps catching smugglers in Gecko’s displaylist building code.
  • Kats and Jamie are hunting driver bugs on Android.

Enabling WebRender in Firefox Nightly

In about:config, set the pref “gfx.webrender.all” to true and restart the browser.

Reporting bugs

The best place to report bugs related to WebRender in Firefox is the Graphics :: WebRender component in bugzilla.
Note that it is possible to log in with a github account.

2 thoughts on “WebRender newsletter #31”

  1. Here are some WR questions I have that would maybe make good future topics for your posts:

    1. Is the interning work Glenn is doing related to picture caching?

    How does picture caching across display lists compare to retaining the display list? Does the WR path even use the retained dl stuff?

    2. How do the strategies for OMTP and WebRender relate? Would OMTP have benefits for expensive blob rasterization since that used Skia?

    3. What is blob tiling and what does it offer above normal blob rendering?

    4. How do APZ and async scene building tie together?

    5. Is there a bug to watch some of the document splitting work going on? My understanding is that document splitting will make the chrome more resilient against slow scene builds in the content frame? Is this right? How does this compare to push_iframe in the DL.

    6. OMTA for color, gradients, etc? How much more of CSS can be feasibly calculated off thread and fed to WR using its Property Binding infra?
