Posts Tagged ‘video’

Pushing computation to the front: video snapshots

Video and the Canvas API

The Canvas API is surprisingly versatile. The image parameter of the CanvasRenderingContext2D.drawImage() method will accept images from a number of different sources, including an HTMLVideoElement. I touched on this a bit in a previous post about processing the data from video streams; however, an HTMLVideoElement can also handle loading and rendering video files, with all modern browsers capable of tackling the non-trivial tasks of decoding and rendering H.264 MP4 or VP8/VP9 WebM content (and, of course, you get all the benefits of the client’s GPU hardware that the browser takes advantage of). This opens up the possibility of capturing frames from video files, which can be used for preview images, poster images, or for substituting in an image when video playback isn’t possible (e.g. for a print layout, which is the issue I’ve run into with ScratchGraph).

Setting up the HTMLVideoElement

This is fairly standard; here we’ll load an H.264 MP4 with the filename “test.mp4”:

const video = document.createElement('video');

const videoSource = document.createElement('source');
videoSource.setAttribute('type', 'video/mp4');
videoSource.setAttribute('src', 'test.mp4');

video.appendChild(videoSource);

For reference, here’s the test video:

Next, we want to seek to a point in the video where we want to capture the frame and also bind to an event that’ll tell us when we’re able to read the frame data from the HTMLVideoElement. The seeked event works well. The other potentially viable option is the loadeddata event, but I ran into some issues here, which I’ll describe later.

video.addEventListener('seeked', function(e) {
    // capture the video frame at the point seeked to...
});

// seek to 2s
video.currentTime = 2;

Render the frame onto a canvas

The Canvas API makes this really easy and the process mirrors what’s described in the post on thumbnail generation:

/**
 *
 * @param {HTMLVideoElement} video
 * @param {Number} newWidth
 * @param {Number} newHeight
 * @param {Boolean} proportionalScale
 * @returns {HTMLCanvasElement}
 */
videoFrameToCanvas: function(video, newWidth, newHeight, proportionalScale) {
    if(proportionalScale) {
        if(video.videoWidth > video.videoHeight) {
            newHeight = newHeight * (video.videoHeight / video.videoWidth);
        } else if(video.videoHeight > video.videoWidth) {
            newWidth = newWidth * (video.videoWidth / video.videoHeight);
        }
    }

    const canvas = document.createElement('canvas');
    canvas.width = newWidth;
    canvas.height = newHeight;

    const canvasCtx = canvas.getContext('2d');
    canvasCtx.drawImage(video, 0, 0, newWidth, newHeight);

    return canvas;
}

I added this method to the canvas-image-transformer library; referencing the method we can now flesh out the seeked event handler. For this test, we’ll also render out what’s on the canvas to an <img> element in the document to see what’s been captured.

video.addEventListener('seeked', function(e) {
    // capture the video frame at the point seeked to
    const frameOnCanvas = CanvasImageTransformer.videoFrameToCanvas(video, 500, 500, true);
    document.getElementById('testImage').src = frameOnCanvas.toDataURL();
});

frameOnCanvas is a canvas with the captured frame, and here’s what it looks like transformed & rendered into an <img> element:

[Image: canvas-image-transformer-test-video-frame-capture]

Issues

  • Something not immediately obvious is that the seeked event is not fired if you set video.currentTime = 0 (i.e. when you want to seek to the first frame of a video). However, you can use a very small time value (e.g. video.currentTime = 0.000000001), which will typically seek to the first frame. That said, it’s a hacky, inelegant solution; a minimal sketch of the workaround is shown after this list.
  • There are cross-browser issues with the loadeddata event. In Firefox, you will only get a frame capture if you don’t seek; if you do attempt to seek, you’ll get an empty frame and the canvas will have a transparent image. Conversely, in Chrome (and other Chromium-based browsers), you will only get a frame if you do seek. The standard states that the event should be fired when “the user agent can render the media data at the current playback position for the first time”, which seems to indicate an implementation flaw in both browsers.
  • The test video was taken on my phone and the frames themselves are upside-down; this is typical with smartphone videos, as it’s expected that playback will take into account metadata indicating orientation. In Firefox, this metadata isn’t taken into account when using CanvasRenderingContext2D.drawImage() with an HTMLVideoElement, so you get an upside-down image on the canvas.
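Here’s a minimal sketch of the currentTime workaround mentioned above (the captureFirstFrame() helper is hypothetical, built on the videoFrameToCanvas() method shown earlier):

function captureFirstFrame(video, onFrame) {
    video.addEventListener('seeked', function() {
        onFrame(CanvasImageTransformer.videoFrameToCanvas(video, 500, 500, true));
    }, { once: true });

    // video.currentTime = 0 won't fire 'seeked', so seek to a tiny non-zero time instead
    video.currentTime = 0.000000001;
}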

Alternatives & limitations

I couldn’t think of a ton of options for decoding H.264 or VP8/VP9 outside the browser. If you’re looking to build something yourself, a server-side service invoking FFmpeg seems like the best option. I played around with Puppeteer, but Puppeteer comes with Chromium, which lacks the audio and video support you get out-of-the-box with Chrome. That said, installing and using Chrome server-side with Puppeteer has potential.
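As a rough sketch of the server-side FFmpeg route, a small Node.js handler could shell out to ffmpeg to grab a poster frame; the file names and the 2-second seek point below are just placeholders, and it assumes ffmpeg is available on the PATH:

const { execFile } = require('child_process');

// Extract a single frame at the 2s mark and write it out as a JPEG
execFile('ffmpeg', ['-ss', '2', '-i', 'test.mp4', '-frames:v', '1', 'poster.jpg'], function(err) {
    if (err) {
        console.error('ffmpeg failed:', err);
        return;
    }
    console.log('poster.jpg written');
});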

There are also third-party services which can handle video decoding and transcoding, and those are solid server-side options.

As with thumbnail generation, here again we’re looking at workloads that have the potential to be moved to the frontend, where you have hardware better suited for graphics work and the possibility of reducing backend complexity. On the other hand, the same limitations come into play, as you have less control over the execution environment and no clear path for backfill or migration needs.

Encoding MP4s in the browser

Is this possible?

Given that it’s relatively easy to access a camera and capture frames within a browser, I began wondering if there was a way to encode frames and create a video within the browser as well. I can see a few benefits to doing this, perhaps the biggest being that you can move some very computationally expensive work to the front-end, avoiding the need to set up and scale a process to do this server-side.

I searched a bit and first came across Whammy as a potential solution, which takes a number of WebP images and creates a WebM video. However, only Chrome will let you easily get data from a canvas element as image/webp (see the HTMLCanvasElement.toDataURL docs). The non-easy way is to read the pixel values from the canvas element and encode them as WebP yourself. However, I couldn’t find any existing JS modules that did this (only a few NodeJS wrappers for the server-side cwebp application), and writing an encoder was a much bigger project than I wanted to undertake.
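As a quick check for that limitation, you can test whether a browser actually honors the image/webp request, since HTMLCanvasElement.toDataURL() silently falls back to PNG when the requested type isn’t supported (a small sketch):

function canExportWebP() {
    const canvas = document.createElement('canvas');
    canvas.width = canvas.height = 1;

    // Browsers that can't encode WebP return a 'data:image/png...' URI instead
    return canvas.toDataURL('image/webp').indexOf('data:image/webp') === 0;
}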

The other option I came across, and used, was ffmpeg.js. This is a really interesting project: it’s a port of ffmpeg to JS via Emscripten, which can be run in browsers that support WebAssembly.

Grabbing frames

My previous post on real-time image processing covers how to set up the video stream, take a snapshot, and render it to a canvas element. To work with ffmpeg.js, you’ll additionally need the frame’s pixels from the canvas element as a JPEG image, represented as bytes in a Uint8Array. This can be done as follows:

var dataUri = canvas.toDataURL("image/jpeg", 1);
var jpegBytes = convertDataURIToBinary(dataUri);

convertDataURIToBinary() is the following method, which will take the data-uri representation of the JPEG data and transform it into a Uint8Array:

function convertDataURIToBinary(dataURI) {
    // strip the "data:image/jpeg;base64," prefix (23 characters)
    var base64 = dataURI.substring(23);
    var raw = window.atob(base64);
    var rawLength = raw.length;

    var array = new Uint8Array(new ArrayBuffer(rawLength));
    for (var i = 0; i < rawLength; i++) {
        array[i] = raw.charCodeAt(i);
    }
    return array;
}

FYI, this is just a slight modification of a method I found in this gist.

Note that I did not use PNG images due to an issue in the current version of ffmpeg.js (v3.1.9001).

Working with ffmpeg.js

ffmpeg.js comes with a Web Worker wrapper (ffmpeg-worker-mp4.js), which is really nice, as you can run “ffmpeg --whatever” just by posting a message to the worker, and get the status/result via messages posted back to the caller via Worker.onmessage.

var worker = new Worker("node_modules/ffmpeg.js/ffmpeg-worker-mp4.js");

worker.onmessage = function (e) {
    var msg = e.data;

    switch (msg.type) {
        case "ready":
            console.log('mp4 worker ready');
            break;

        case "stdout":
            console.log(msg.data);
            break;

        case "stderr":
            console.log(msg.data);
            break;

        case "done":
            var blob = new Blob([msg.data.MEMFS[0].data], {
                type: "video/mp4"
            });

            // ...
            break;

        case "exit":
            console.log("Process exited with code " + msg.data);
            break;
    }
};

Input and output of files is handled by MEMFS (one of the virtual file systems supported by Emscripten). On the “done” message from ffmpeg.js, you can access the output files via the msg.data.MEMFS array (shown above). Input files are specified via an array in the call to worker.postMessage (shown below).

worker.postMessage({
    type: "run",
    TOTAL_MEMORY: 268435456,
    MEMFS: [
        {
            name: "input.jpeg",
            data: jpegBytes
        }
    ],
    arguments: [
        "-r", "60",
        "-i", "input.jpeg",
        "-aspect", "16/9",
        "-c:v", "libx264",
        "-crf", "1",
        "-vf", "scale=1280:720",
        "-pix_fmt", "yuv420p",
        "-vb", "20M",
        "out.mp4"
    ]
});

Limitations

With a bunch of frames captured from the video stream, I began pushing them through ffmpeg.js to encode an H.264 MP4 at 720p, and things started to blow up. There were two big issues:

  • Video encoding is no doubt a memory-intensive operation, but even for a few dozen frames I could never give ffmpeg.js enough memory. I tried playing around with the TOTAL_MEMORY prop in the worker.postMessage call, but if it’s too low ffmpeg.js runs out of memory, and if it’s too high ffmpeg.js fails to allocate memory.
  • Browser support issues. Support issues aren’t surprising here given that WebAssembly is still experimental. The short of it is: things work well in Chrome and Firefox on desktop. For Edge or Chrome on a mobile device, things work for a while before the browser crashes. For iOS there is no support.

Hacking something together

The browser issues were intractable, but support on Chrome and Firefox was good enough for me, and I felt I could work around the memory limitations. Lowering the memory footprint was a matter of either:

  • Reducing the resolution of each frame
  • Reducing the number of frames

I opted for the latter. My plan was to make a small web application to allow someone to easily capture and create time-lapse videos, so I had ffmpeg.js encode just 1 frame to an H.264 MP4, send that MP4 to the server, and then use ffmpeg’s concat demuxer on the server-side to progressively concatenate each individual MP4 file into a single MP4 video. What this enables is for the more costly encoding work to be done client-side and the cheaper concatenation work to be done server-side.
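For reference, here’s a rough Node.js sketch of that server-side concatenation step, assuming the uploaded single-frame MP4s are named seg-0.mp4, seg-1.mp4, … (the names and paths are placeholders) and ffmpeg is installed on the server:

const fs = require('fs');
const { execFile } = require('child_process');

function concatSegments(segmentFiles, outputFile, callback) {
    // The concat demuxer reads a list file with one "file '...'" entry per segment
    const listFile = 'segments.txt';
    fs.writeFileSync(listFile, segmentFiles.map(f => `file '${f}'`).join('\n'));

    // Stream-copy the segments into a single MP4 (no re-encoding on the server)
    execFile('ffmpeg', ['-f', 'concat', '-safe', '0', '-i', listFile, '-c', 'copy', outputFile], callback);
}

// e.g. concatSegments(['seg-0.mp4', 'seg-1.mp4'], 'timelapse.mp4', function(err) { /* ... */ });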

Time Stream was the end result.

Here’s a time-lapse video created using an old laptop and a webcam taped onto my balcony:

This sort of hybrid solution works well. Overall, I’m happy with the results, but I would love to eliminate the server-side ffmpeg dependency outright, so I’m looking forward to seeing WebAssembly support expand and improve across browsers.

More generally, it’s interesting to push these types of computationally intensive tasks to the front-end, and I think it presents some interesting possibilities for architecting and scaling web applications.

Reel

I wrote a little desktop application to capture short videos and turn them into GIFs. I call it Reel. It’s still rough around the edges but you can grab an early version of it below.

Reel 0.1 (Windows Install)

I’ll have a Linux/Ubuntu version soon. Maybe an OS X version… I have to jump through a few extra hoops here as Apple still refuses to allow OS X to be virtualized.

[Image: Reel - Drinking Bird]

Aside from its utility, this was also an experiment piecing together some technologies I’ve written about here before: XUL + XPCOM + SocketBridge, video capture using web tech and, in general, using web technologies for desktop applications.

Real-time image processing on the web

A while ago I began playing around with grabbing a video stream from a webcam and seeing what I could do with the captured data. Capturing the video stream using the navigator.mediaDevices.getUserMedia() method was straightforward, but directly reading and writing the image data of the video stream isn’t possible. That said, the stream data can be put onto a canvas using CanvasRenderingContext2D.drawImage(), giving you the ability to manipulate the pixel data.

const videoElem = document.querySelector('video');
const canvas = document.querySelector('canvas');
const ctx = canvas.getContext('2d');

// Request video stream
navigator.mediaDevices.getUserMedia({video: true, audio: false})
    .then(function(_videoStream) {
        // Render video stream on <video> element
        videoElem.srcObject = _videoStream;
    })
    .catch(function(err) {
        console.log(`getUserMedia error: ${err}`);
    });

// ... once the stream is playing, put a snapshot from the video stream into the canvas
ctx.drawImage(videoElem, 0, 0);

You can read from and write to the <canvas> element, so hiding the <video> element with the source data and just showing the <canvas> element seems logical, but the CanvasRenderingContext2D.drawImage() call is expensive; looking at the copied stream on the <canvas> element, there is very noticeable visual lag. Another reason to avoid this option is that the frequency at which you render (e.g. 30 FPS) isn’t necessarily the frequency at which you’d want to grab and process image data (e.g. 10 FPS). The disassociation allows you to keep the video playback smooth, for a better user experience, while more effectively utilizing CPU cycles for image processing. At least in my experience so far, a small delay in visual feedback from image processing is acceptable and looks perfectly fine intermixed with the higher-frequency video stream.
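A minimal sketch of that decoupling, reusing the videoElem, canvas, and ctx references from the snippet above (the ~10 FPS sampling rate is an arbitrary choice):

// The <video> element plays back at its native rate; frames are sampled
// for processing on an independent, slower timer (~10 FPS here)
setInterval(function() {
    ctx.drawImage(videoElem, 0, 0);
    const imageData = ctx.getImageData(0, 0, canvas.width, canvas.height);
    // ... process imageData and update any visual feedback ...
}, 100);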

So the best options all seem to involve showing the <video> element with the webcam stream and placing visual feedback on top of the video in some way. A few ideas:

  • Write pixel data to another canvas and render it on top of the <video> element
  • Render SVG elements on top of the <video> element
  • Render DOM elements (absolutely positioned) on top of the <video> element

The third option is an ugly solution, but it’s fast to code and thus allows for quick prototyping. The demo and code below show a quick prototype I slapped together, using <div> elements as markers for hotspots (in this case, bright spots) within the video.

Here’s the code for the above demo:

<!DOCTYPE html>
<html>
<head>
    <title>Webcam Cap</title>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <style type="text/css">
        * { margin:0; padding:0; border:none; overflow:hidden; }
    </style>
</head>
<body>
    <div>
        <video style="width:640px; height:480px;" width="640" height="480" autoplay></video>
        <canvas style="display:none; width:640px; height:480px;" width="640" height="480"></canvas>
    </div>
    <div class="ia-markers"></div>

    <script type="text/javascript">
        const videoElem = document.querySelector('video');
        const canvas = document.querySelector('canvas');
        const ctx = canvas.getContext('2d');

        var snapshotIntv = null;

        const width = 640;
        const height = 480;

        // Request video stream
        navigator.mediaDevices.getUserMedia({video: true, audio: false})
            .then(function(_videoStream) {
                // Render video stream on <video> element
                videoElem.srcObject = _videoStream;

                // Take a snapshot of the video stream every 100ms
                snapshotIntv = setInterval(function() {
                    processSnapshot(_videoStream);
                }, 100);
            })
            .catch(function(err) {
                console.log(`getUserMedia error: ${err}`);
            });

        // Take a snapshot from the video stream
        function processSnapshot() {
            // put snapshot from video stream into canvas
            ctx.drawImage(videoElem, 0, 0);

            // Clear old snapshot markers
            var markerSetParent = (document.getElementsByClassName('ia-markers'))[0];
            markerSetParent.innerHTML = '';

            // Array to store hotzone points
            var hotzones = [];

            // Process pixels
            var imageData = ctx.getImageData(0, 0, width, height);
            for (var y = 0; y < height; y += 16) {
                for (var x = 0; x < width; x += 16) {
                    var index = (x + y * imageData.width) << 2;

                    var r = imageData.data[index + 0];
                    var g = imageData.data[index + 1];
                    var b = imageData.data[index + 2];

                    if (r > 200 && g > 200 && b > 200) {
                        hotzones.push([x, y]);
                    }
                }
            }

            // Add new hotzone elements to DOM
            for (var i = 0; i < hotzones.length; i++) {
                var x = hotzones[i][0];
                var y = hotzones[i][1];

                var markerDivElem = document.createElement("div");
                markerDivElem.setAttribute('style', 'position:absolute; width:16px; height:16px; border-radius:8px; background:#0f0; opacity:0.25; left:' + x + 'px; top:' + y + 'px');
                markerDivElem.className = 'ia-hotzone-marker';

                markerSetParent.appendChild(markerDivElem);
            }
        }
    </script>
</body>
</html>

Edit (8/1/2020): The code has been updated to reflect changes in the MediaDevices API. This includes:

  • navigator.getUserMedia → navigator.mediaDevices.getUserMedia. The code structure is slightly different given that the latter returns a promise.
  • Assigning the media stream to a video element directly via the srcObject attribute. This is now required in most modern browsers as the old way of using createObjectURL on the stream and assigning the returned URL to the video element’s src attribute is no longer supported.

In addition, there’s also just some general code cleanup to modernize the code and make it a little easier to read. Some of the language in the post has also been tweaked to make things clearer.