Posts Tagged ‘video’

Encoding MP4s in the browser

Is this possible?

Given that it’s relatively easy to access a camera and capture frames within a browser, I began wondering if there was a way to encode those frames and create a video within the browser as well. I can see a few benefits to doing this, perhaps the biggest being that you can move some very computationally expensive work to the front end, avoiding the need to set up and scale a process to do this server-side.

I searched a bit and first came across Whammy as a potential solution, which takes a number of WebP images and creates a WebM video. However, only Chrome will let you easily get data from a canvas element as image/webp (see the HTMLCanvasElement.toDataURL docs). The non-easy way is to read the pixel values from the canvas element and encode them as WebP yourself, but I couldn’t find any existing JS modules that do this (only a few NodeJS wrappers for the server-side cwebp application), and writing an encoder was a much bigger project than I wanted to undertake.
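As an aside, it’s possible to feature-detect WebP export, since toDataURL() silently falls back to image/png when the requested type isn’t supported. A minimal sketch (the canExportWebP name is mine):

function canExportWebP() {
    var canvas = document.createElement('canvas');
    canvas.width = canvas.height = 1;
    // Browsers that can't encode WebP fall back to PNG, so checking
    // the prefix of the returned data URI is enough
    return canvas.toDataURL('image/webp').indexOf('data:image/webp') === 0;
}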

The other option I came across, and used, was ffmpeg.js. This is a really interesting project: it’s a port of ffmpeg, via Emscripten, to JS code that can be run in browsers which support WebAssembly.

Grabbing frames

My previous post on real-time image processing covers how to set up the video stream, take a snapshot, and render it to a canvas element. To work with ffmpeg.js, you’ll additionally need the frame’s pixels from the canvas element as a JPEG image, represented as bytes in a Uint8Array. This can be done as follows:

var dataUri = canvas.toDataURL("image/jpeg", 1);
var jpegBytes = convertDataURIToBinary(dataUri);

convertDataURIToBinary() is the following method, which will take the data-uri representation of the JPEG data and transform it into a Uint8Array:

function convertDataURIToBinary(dataURI) {
    // Strip the "data:image/jpeg;base64," prefix (23 characters)
    var base64 = dataURI.substring(23);
    var raw = window.atob(base64);
    var rawLength = raw.length;

    var array = new Uint8Array(new ArrayBuffer(rawLength));
    for (var i = 0; i < rawLength; i++) {
        array[i] = raw.charCodeAt(i);
    }
    return array;
}

FYI, this is just a slight modification of a method I found in this gist.
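As another aside, if you’d rather avoid the base64 round trip entirely, canvas.toBlob() should get you the same bytes more directly. A rough sketch (I haven’t verified this path against ffmpeg.js):

canvas.toBlob(function (blob) {
    var reader = new FileReader();
    reader.onload = function () {
        // reader.result is an ArrayBuffer containing the JPEG bytes
        var jpegBytes = new Uint8Array(reader.result);
        // ... hand jpegBytes to ffmpeg.js as before
    };
    reader.readAsArrayBuffer(blob);
}, 'image/jpeg', 1);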

Note that I did not use PNG images due to an issue in the current version of ffmpeg.js (v3.1.9001).

Working with ffmpeg.js

ffmpeg.js comes with a Web Worker wrapper (ffmpeg-worker-mp4.js), which is really nice, as you can run “ffmpeg --whatever” by just posting a message to the worker and get the status/result via messages posted back to the caller via Worker.onmessage.

var worker = new Worker("node_modules/ffmpeg.js/ffmpeg-worker-mp4.js");
worker.onmessage = function (e) {
    var msg = e.data;

    switch (msg.type) {
        case "ready":
            console.log('mp4 worker ready');
            break;
        case "stdout":
            console.log(msg.data);
            break;
        case "stderr":
            console.log(msg.data);
            break;
        case "done":
            var blob = new Blob([msg.data.MEMFS[0].data], {
                type: "video/mp4"
            });
            // ...
            break;
        case "exit":
            console.log("Process exited with code " + msg.data);
            break;
    }
};

Input and output files are handled by MEMFS (one of the virtual file systems supported by Emscripten). On the “done” message from ffmpeg.js, you can access the output files via the msg.data.MEMFS array (shown above). Input files are specified via an array in the call to worker.postMessage (shown below).

worker.postMessage({
    type: "run",
    TOTAL_MEMORY: 268435456,
    MEMFS: [
        {
            name: "input.jpeg",
            data: jpegBytes
        }
    ],
    arguments: [
        "-r", "60",
        "-i", "input.jpeg",
        "-aspect", "16/9",
        "-c:v", "libx264",
        "-crf", "1",
        "-vf", "scale=1280:720",
        "-pix_fmt", "yuv420p",
        "-vb", "20M",
        "out.mp4"
    ]
});
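A few of the flags are worth calling out: -r sets the frame rate, -crf 1 asks libx264 for near-lossless quality, and -pix_fmt yuv420p matters for compatibility, as many players won’t handle H.264 video in less common pixel formats.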

Limitations

With a bunch of frames captured from the video stream, I began pushing them through ffmpeg.js to encode an H.264 MP4 at 720p, and things started to blow up. There were two big issues:

  • Video encoding is no doubt a memory-intensive operation, but even for a few dozen frames I could never give ffmpeg.js enough memory. I tried playing around with the TOTAL_MEMORY prop in the worker.postMessage call, but if it’s too low ffmpeg.js runs out of memory, and if it’s too high ffmpeg.js fails to allocate memory.
  • Browser support issues. These aren’t surprising given that WebAssembly is still experimental. The short of it is: things work well in Chrome and Firefox on desktop. For Edge or Chrome on a mobile device, things work for a while before the browser crashes. For iOS there is no support.

Hacking something together

The browser issues were intractable, but support on Chrome and Firefox was good enough for me, and I felt I could work around the memory limitations. Lowering the memory footprint was a matter of either:

  • Reducing the resolution of each frame
  • Reducing the number of frames

I opted for the latter. My plan was to make a small web application to allow someone to easily capture and create time-lapse videos, so I had ffmpeg.js encode just 1 frame to an H.264 MP4, sent that MP4 to the server, and then used ffmpeg’s concat demuxer on the server side to progressively concatenate each individual MP4 file into a single MP4 video. What this enables is for the more costly encoding work to be done client-side and the cheaper concatenation work to be done server-side.
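To give a sense of the server-side piece, here’s a rough sketch of driving the concat demuxer from Node.js (the file names and helper are made up for illustration; this isn’t the exact code behind Time Stream):

// Concatenate a list of single-frame MP4s into one video using
// ffmpeg's concat demuxer (assumes ffmpeg is installed on the server)
var fs = require('fs');
var execFile = require('child_process').execFile;

function concatClips(clipPaths, outPath, callback) {
    // The concat demuxer reads a text file with one "file '...'" line per clip
    var listFile = 'clips.txt';
    var lines = clipPaths.map(function (p) {
        return "file '" + p + "'";
    });
    fs.writeFileSync(listFile, lines.join('\n'));

    // -c copy stitches the clips together without re-encoding,
    // which is what keeps the server-side work cheap
    execFile('ffmpeg',
        ['-f', 'concat', '-safe', '0', '-i', listFile, '-c', 'copy', outPath],
        callback);
}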

Time Stream was the end result.

Here’s a time-lapse video created using an old laptop and a webcam taped onto my balcony:

This sort of hybrid solution works well. Overall, I’m happy with the results, but I would love to eliminate the server-side ffmpeg dependency outright, so I’m looking forward to seeing WebAssembly support expand and improve across browsers.

More generally, it’s interesting to push these types of computationally intensive tasks to the front-end, and I think it presents some interesting possibilities for architecting and scaling web applications.

Reel

I wrote a little desktop application to capture short videos and turn them into GIFs. I call it Reel. It’s still rough around the edges but you can grab an early version of it below.

Reel 0.1 (Windows Install)

I’ll have a Linux/Ubuntu version soon. Maybe an OS X version… I have to jump through a few extra hoops here as Apple still refuses to allow OS X to be virtualized.

Reel - Drinking Bird

Aside from its utility, this was also an experiment piecing together some technologies I’ve written about here before: XUL + XPCOM + SocketBridge, video capture using web tech and, in general, using web technologies for desktop applications.

Real-time image processing on the web

A while ago I began playing around with grabbing a video stream from a webcam and seeing what I could do with the captured data. Capturing the video stream using the navigator.getUserMedia() method was straightforward, but directly reading and writing the image data of the video stream isn’t possible. That said, the stream data can be put onto a canvas using CanvasRenderingContext2D.drawImage(), giving you the ability to read the pixel data. When it comes to writing visual data, a few options are available.

var videoElem = document.querySelector('video');

// Request video stream
navigator.getUserMedia({video: true, audio: false},

    function(_localMediaStream) {
        videoStream = _localMediaStream;
        videoElem.src = window.URL.createObjectURL(_localMediaStream);
    },

    function(err) {
        console.log('navigator.getUserMedia error: ' + err);
    }

);
var videoElem = document.querySelector('video');
var canvas = document.querySelector('canvas');
var ctx = canvas.getContext('2d');

...

// put snapshot from video stream into canvas
ctx.drawImage(videoElem, 0, 0);

You can read and write to the <canvas> element, so hiding the <video> element with the source data and just showing the <canvas> element is an option, but the CanvasRenderingContext2D.drawImage() call is expensive; looking at the copied stream on the <canvas> element, there is very noticeable visual lag. Another reason to avoid this option is that the frequency at which you render (e.g. 30 FPS) isn’t necessarily the frequency at which you’d want to grab and process image data (e.g. 10 FPS). The disassociation allows you to keep the video playback smooth, for a better user experience, while utilizing CPU cycles more effectively for the image processing. At least in my experience so far, a small delay in the visual feedback from the image processing is acceptable and looks perfectly fine intermixed with the higher-frequency video stream.
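In practice, the decoupling can be as simple as letting the <video> element handle playback on its own while a timer samples and processes frames at the lower rate (this mirrors what the full demo below does):

// Process frames at ~10 FPS while the <video> element plays at its native rate
setInterval(function () {
    ctx.drawImage(videoElem, 0, 0);
    var frame = ctx.getImageData(0, 0, canvas.width, canvas.height);
    // ... analyze frame.data here
}, 100);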

Throwing aside reading and writing to just the <canvas> element, alternative options all involve showing the <video> element with the webcam stream and placing visual feedback on top of the video pixels. A few ideas:

  • Write pixel data to another canvas and render it on top of the <video> element (see the sketch after this list)
  • Render SVG elements on top of the <video> element
  • Render DOM elements (absolutely positioned) on top of the <video> element
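For the first option, the idea would be roughly as follows; a minimal sketch, assuming an overlay <canvas> absolutely positioned over the <video> element (the canvas.overlay selector is mine):

var overlay = document.querySelector('canvas.overlay');
var overlayCtx = overlay.getContext('2d');

function drawMarker(x, y) {
    // Clear the previous feedback and draw a translucent marker on the
    // overlay canvas, leaving the underlying <video> untouched
    overlayCtx.clearRect(0, 0, overlay.width, overlay.height);
    overlayCtx.fillStyle = 'rgba(0, 255, 0, 0.25)';
    overlayCtx.fillRect(x, y, 16, 16);
}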

The third option is an ugly solution, but it’s fast to code and thus allows for quick prototyping. Below is a quick demo I slapped together using <div> elements as markers for hot spots, in this case bright spots, within the video.

<!DOCTYPE html>
<html>
<head>
    <title>Webcam Cap</title>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">

    <style type="text/css">
        * { margin:0; padding:0; border:none; }
    </style>
</head>

<body>
    <div>
        <video style="width:640px; height:480px;" width="640" height="480" autoplay></video>
        <canvas style="display:none; width:640px; height:480px;" width="640" height="480"></canvas>
    </div>

    <div class="ia-markers"></div>

    <script type="text/javascript">

    navigator.getUserMedia = (navigator.getUserMedia || navigator.webkitGetUserMedia || navigator.mozGetUserMedia || navigator.msGetUserMedia);

    if (typeof navigator.getUserMedia !== 'undefined') {

        var videoElem = document.querySelector('video');
        var canvas = document.querySelector('canvas');
        var ctx = canvas.getContext('2d');
        var videoStream = null;
        var snapshotIntv = null;

        var width = 640;
        var height = 480;

        // Request video stream
        navigator.getUserMedia({video: true, audio: false},

            function(_localMediaStream) {
                videoStream = _localMediaStream;
                videoElem.src = window.URL.createObjectURL(_localMediaStream);

                // Take a snapshot of the video stream every 100ms
                snapshotIntv = setInterval(function() {
                    processSnapshot(videoStream);
                }, 100);
            },

            function(err) {
                console.log('navigator.getUserMedia error: ' + err);
            }

        );

        // Take a snapshot from the video stream
        function processSnapshot() {

            // Put snapshot from video stream into canvas
            ctx.drawImage(videoElem, 0, 0);

            // Clear old snapshot markers
            var markerSetParent = (document.getElementsByClassName('ia-markers'))[0];
            markerSetParent.innerHTML = '';

            // Array to store hotzone points
            var hotzones = [];

            // Process pixels, sampling every 16th pixel in each direction
            var imageData = ctx.getImageData(0, 0, width, height);
            for (var y = 0; y < height; y += 16) {
                for (var x = 0; x < width; x += 16) {
                    var index = (x + y * imageData.width) << 2;

                    var r = imageData.data[index + 0];
                    var g = imageData.data[index + 1];
                    var b = imageData.data[index + 2];

                    if (r > 200 && g > 200 && b > 200) {
                        hotzones.push([x, y]);
                    }
                }
            }

            // Add new hotzone elements to DOM
            for (var i = 0; i < hotzones.length; i++) {
                var x = hotzones[i][0];
                var y = hotzones[i][1];

                var markerDivElem = document.createElement("div");
                markerDivElem.setAttribute('style', 'position:absolute; width:16px; height:16px; border-radius:8px; background:#0f0; opacity:0.25; left:' + x + 'px; top:' + y + 'px');
                markerDivElem.className = 'ia-hotzone-marker';

                markerSetParent.appendChild(markerDivElem);
            }
        }

    }
    else {
        console.log('getUserMedia() is not supported in your browser');
    }

    </script>

</body>
</html>

Collateral Murder

The WikiLeaks video showing the unprovoked killing of over a dozen people in the Iraqi suburb of New Baghdad. Among the dead were two Reuters news staff members, Namir Noor-Eldeen and Saeed Chmagh.

The corresponding NYTimes article on the incident.

From the Times article, it’s clear that the military tried to cover this up:

The American military said in a statement late Thursday that 11 people had been killed: nine insurgents and two civilians. According to the statement, American troops were conducting a raid when they were hit by small-arms fire and rocket-propelled grenades. The American troops called in reinforcements and attack helicopters. In the ensuing fight, the statement said, the two Reuters employees and nine insurgents were killed.

“There is no question that coalition forces were clearly engaged in combat operations against a hostile force,” said Lt. Col. Scott Bleichwehl, a spokesman for the multinational forces in Baghdad.

There was no raid, no small-arms fire, no RPG. In fact, there was no fight, as all the bullets came from the Apache.

This also very much brings into question what the rules of engagement are in Iraq and to what degree they’re being followed. The debate always seems to center on how restrictive the ROE are, whereas, in this case, simply having an AK-47 (not necessarily uncommon in this part of the world) was sufficient cause to engage.

h/t to reddit for being the only site I’ve seen that mentioned the video. CNN only recently picked it up.