## Posts Tagged ‘chrome’

### A look at 2D vs WebGL canvas performance

I did some quick benchmarking with canvas-image-transformer, looking at the performance between directly manipulating pixels on a 2D canvas versus using a fragment shader on a WebGL canvas. For testing, I used a grayscale transformation as it can be done with a simple weighted sum (`R*0.2126 + G*0.7152 + B*0.0722`) and there’s a high degree of parity between the fragment shader code and the code for pixel operations on a 2D canvas.

#### Converting to grayscale

Pixel operations on the 2D canvas are as follows:

```for(var i=0; i<pixels.data.length; i+=4) { var grayPixel = parseInt(((0.2126*(pixels.data[i]/255.0)) + (0.7152*(pixels.data[i+1]/255.0)) + (0.0722*(pixels.data[i+2]/255.0))) * 255.0); pixels.data[i] = grayPixel; pixels.data[i + 1] = grayPixel; pixels.data[i + 2] = grayPixel; } ```

The corresponding fragment shader for the WebGL canvas is as follows:

```precision mediump float; uniform sampler2D uSampler; varying vec2 vTextureCoord; void main(void) { vec4 src = texture2D( uSampler, ( vTextureCoord ) ); float grayPx = src.r*0.2126 + src.g*0.7152 + src.b*0.0722; gl_FragColor = vec4(grayPx, grayPx, grayPx, 1); }```

#### Performance comparisons in Chrome

Here’s the setup for comparing performance of the 2 method:

• Input was a 3864×3864 image of the Crab Nebula, rendered onto a 2D canvas (note that time to render onto the 2D canvas is not considered in the data points below)
• Output is the 2D canvas that the input image was render on
• CPU was an AMD Ryzen 7 5700X
• GPU was a RTX 2060
• OS is Windows 10 Build 19044
• Browser is Chrome 108.0.5359.125
• Hard refresh on page load to bypass any browser-level caching
• Transformation via WebGL approach for 25 iterations
• Transformation via 2D canvas approach for 25 iterations

Visually, this is what’s being done:

I tried to eliminate as much background noise as possible from the result; that is, eliminating anything that may have a impact on CPU or GPU usage: closing other applications that may have significant usage, not having any other tabs open in the browser, and not having DevTools open when image processing was being done. That said, I was not rigorous about this and the numbers presented are to show overall/high-level behavior and performance; they’re not necessarily representative of what peak performance would be on the machine or browser.

It’s also worth noting that canvas-image-transformer doesn’t attempt to do any sort of caching in the first iteration (i.e. textures are re-created, shaders are re-compiled, etc. on each iteration), so we shouldn’t expect large variances in performance from one iteration to the next.

Graphing the data points for each approach, for each iteration, I got the following (note that what’s presented is just the data for 1 test run; I did test multiple times and consistently saw the same behavior but, for simplicity, I just graphed the values from 1 test run):

So, the data points for the first iteration are interesting.

• On the 2d canvas, the transformation initially takes 371.8ms
• On the webgl2, the transformation initially takes 506.5ms

That’s a massive gap in performance between the 2 methods, with the 2d canvas method being significantly faster. I would have expected the WebGL approach to be faster here as, generally, graphics-related things would be faster with a lower-level GPU interface, but that’s clearly not the case here.

For subsequent iterations, we can see that performance improves and normalizes for both approaches, with significantly better performance using the WebGL approach; however, why don’t we see this sort of performance during the first iteration? Profiling the code, I noticed I was consistently seeing the majority of execution time spent on texImage2D() during the first iteration:

`gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA8, gl.RGBA, gl.UNSIGNED_BYTE, srcCanvas);`

Looking at the execution time of `texImage2D()` across iterations, we get the following:

We see 486.9ms spent in `texImage2D()` during the first iteration but then execution time drops to only ~0.1ms in subsequent iterations. It’s clear that loading data into a texture is the most costly operation on the first iteration, however it looks like there’s some sort of caching mechanism, likely in Chrome’s GPU component, that essentially eliminates this cost on subsequent iterations.

In an attempt to optimize the call in the first iteration, I briefly looked into potential optimizations to the `texImage2D()` call but didn’t find much. There’s no mipmap creation or doing any sort of format conversion here, so we’re just bound by how quickly we can get the pixels into VRAM.

#### Normal refresh (after previous page load)

There’s a bit more nuance here that’s worth touching on. Looking at just the first iteration in Chrome, after normal/soft refreshes, we see some interesting behavior:

• For the 2d canvas, the first iteration transformation times look the same as when doing a hard refresh
• For the WebGL canvas, we’re getting the transformation times we saw after the first iteration when doing a soft refresh!

It looks like Chrome’s texture caching mechanism is in play and preserves cache entries across soft page refreshes.

#### What about Firefox and other browsers?

I would expect most Webkit-based browsers would have similar behavior to what’s in Chrome and some quick testing in Edge confirms this.

Firefox is a different beast. Testing in Firefox 108.0.2, we see the following transformation times:

Performance, overall, is much more consistent than in Chrome, but not always better.

• For the 2d canvas method, performance is simply worse; on the first iteration we see transformations take 150+ milliseconds more than in Chrome, and on subsequent iterations the performance gap is even wider.
• For the WebGL method, our first iteration performance is significantly better than Chrome, reduced by more than 175 milliseconds. However, on subsequent iterations we don’t see the drastic performance improvement we see in Chrome.

For the 2d canvas method, it’s hard to say why it performs so differently than Chrome. However, for the WebGL method, a bit of profiling led to some interesting insights. In Firefox, the execution time of `texImage2D()` is consistent across iterations, hovering ~40ms; this means it performs significantly better than Chrome’s worst case (first iteration) and significantly worse than Chrome’s best case (non-first iteration where execution time is below 0.1ms), as shown below.

The other significant aspect to Firefox’s performance is in the performance of the Canvas drawImage() call, in drawing from a WebGL canvas to a 2D canvas. At the tail end of the transformation process, canvas-image-transformer does the following:

```const srcCtx = srcCanvas.getContext('2d'); srcCtx.drawImage(glCanvas, 0, 0, srcCanvas.width, srcCanvas.height);```

Basically, it’s taking what’s on the WebGL canvas and writing it out to the input/source canvas, which is a 2D canvas. In Chrome this is a very fast operation, typically less that 2ms, in Firefox I see this typically going above 200ms.

#### Firefox consistency

Finally, looking at transformation times across soft refreshes, we see Firefox performance is very consistent for both the 2D canvas and WebGL method:

However, I did encounter a case where WebGL performance was more erratic. This was testing when I had a lot of tabs open and I suspect there was some contention for GPU resources.

#### Takeaways

There’s perhaps a number of small insights here depending on use-case and audience, but there’s 2 significant high-level takeaways for me:

• GPUs are very fast at parallel processing but loading data to be processed and retrieving the processed data can be expensive operations
• It’s worthwhile to measure things; I was fairly surprised by the different performance profiles between Firefox and Chrome

### Performance visibility with HTTP Server-Timing

Visibility into the performance of backend components can be invaluable when it comes to spotting and understanding service degradation, debugging failures, and knowing if and where optimization is needed. There’s a host of collection agents, aggregators, and visualization tools to handle metrics, but just breaking down and looking at what happens during an HTTP request can offer a lot of insight into how components are performing. This is why I’m pretty excited about the the HTTP Server-Timing header, it works well as a lightweight mechanism to surface performance metrics, especially now that it’s read and graphed by Chrome Devtools (and, perhaps sometime soon, by Firefox Devtools as well).

#### An HTTP response with the Server-Timing header

The following code snippet shows an Illuminate/Http/Response from a controller that PUTs an image into an Amazon S3 bucket.

```return response() ->json( [], StatusCode::STATUS_OK, [ 'Server-Timing' => 's3-io;desc="Image upload to S3";dur=' . calculateTimeToPut(), ] ); ```

Let’s assume the calculateTimeToPut() function returns 5500 (i.e. 5500 milliseconds to PUT the image onto S3), and the response header looks something like this:

Each metric is a group composed of 3 pieces, with each piece delimited by a semicolon:

• Metric Name (required)
• Metric Description
• Metric Value

Multiple metrics can be surfaced by separating each group with a comma.

```return response() ->json( [], StatusCode::STATUS_OK, [ 'Server-Timing' => 's3-io;desc="Image upload to S3";dur=' . calculateTimeToPut() . ',' . 'db-io;desc="DB update of entity";dur=' . calculateTimeToUpdate() ] ); ```

(The above code is a bit simplistic, you’d likely want to better way to store and group metrics, then do a final transformation to construct the Server-Timing string when it’s time to send the HTTP response)

#### Surfacing in DevTools

Surfacing metrics in an HTTP response is not something terribly complex and I’m sure most could devise other ways to do it, but one reason Server-Timing is a bit more attractive vs a custom solution is the out-of-the-box support within Chrome DevTools.

Firefox Devtools will likely follow suit (hopefully?) in the near future.

#### The PerformanceServerTiming interface

Server-Timing metrics can also be surfaced via the PerformanceServerTiming interface, from MDN:

In addition to having Server-Timing header metrics appear in the developer tools of the browser, the PerformanceServerTiming interface enables tools to automatically collect and process metrics from JavaScript.

This opens up some interesting possibilities as it enables collecting metrics via a frontend script (as is already done for a lot of product metrics via services like Google Analytics), rather than a backend collector mechanism. While not ground-breaking, the standardization around PerformanceServerTiming may allow for greater adoption and acceptance of this collection pattern.

Client-side timeouts on XHR requests isn’t something I’ve ever thought a whole lot about. The default is no timeout and in most cases, where you’re kicking off an XHR request in response to a user interaction, you probably won’t ever notice an issue. That said, I ran into a case with ScratchGraph on Chrome where not having a timeout specified, along with some client-side network errors, left the application in a state where it was unable to send any more XHR requests.

ScratchGraph continuously polls its server for new data and every so often I would notice that the XHR calls would stop, with the application left in a broken state, unable to make any AJAX calls. This typically (but not always) occurred when the machine woke up from being put to sleep and in the console there would be a few error messages, typically a number of ERR_NETWORK_IO_SUSPENDED and ERR_INTERNET_DISCONNECTED errors. Testing within my development environment, it was impossible to reproduce. Finally, I came across this StackOverflow post that pointed out that not having a timeout specified on the XHR calls would result in these errors.

I’m still not exactly sure of the interplay between Chrome, the XHR requests, and the network state that results in this situation, but since adding a timeout, I’ve yet to notice this behavior again. It’s also worth noting that it’s very simple to add a timeout on an XHR request:

`var xhr = new XMLHttpRequest();xhr.open('GET', '/hello', true);xhr.timeout = 500; // time in milliseconds `

### Fixed (non-resizable) windows with XULRunner

I’ve been working a bit with XULRunner lately and wanted to create a fixed, non-resizable application window. After some searching, I eventually stumbled upon some code which led to a solution – adding a function call to the application’s prefs.js file:

`pref("toolkit.defaultChromeFeatures", "chrome,resizable=no,dialog=no"); `

resizable=no prevents the window from being resized, dialog=no makes it a non-dialog window so that you can still minimize it.

Simple stuff, but this was difficult to find. Discovered thanks to this post on glazman.org

Note that after some testing with XULRunner on Ubuntu, it appears that this may be a Windows-only setting.