Encoding MP4s in the browser

Is this possible?

Given that it’s relatively easy to access a camera and capture frames within a browser, I began wondering if there was a way to encode frames and create a video within the browser as well. I can see a few benefits to doing this, perhaps the biggest being that you can move some very computationally expensive work to the front-end, avoiding the need to set up and scale a process to do this server-side.

I searched a bit and first came across Whammy as a potential solution, which takes a number of WebP images and creates a WebM video. However, only Chrome will let you easily get data from a canvas element as image/webp (see the HTMLCanvasElement.toDataURL docs). The non-easy way is to read the pixel values from the canvas element and encode them as WebP, but I couldn’t find any existing JS modules that did this (only a few NodeJS wrappers for the server-side cwebp application), and writing an encoder was a much bigger project than I wanted to undertake.
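As an aside, you can at least detect whether a browser can encode WebP from a canvas, since toDataURL() silently falls back to PNG when the requested type isn’t supported. A quick illustrative check of my own (not something from the original approach):

var testCanvas = document.createElement("canvas");
// Browsers that can't encode WebP will return "data:image/png..." instead
var supportsWebP = testCanvas.toDataURL("image/webp").indexOf("data:image/webp") === 0;
console.log("Canvas WebP encoding supported: " + supportsWebP);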

The other option I came across, and used, was ffmpeg.js. This is a really interesting project: it’s a port of ffmpeg, via Emscripten, to JS code that can run in browsers that support WebAssembly.

Grabbing frames

My previous post on real-time image processing covers how to set up the video stream, take a snapshot, and render it to a canvas element. To work with ffmpeg.js, you’ll additionally need the frame’s pixels from the canvas element as a JPEG image, represented as bytes in a Uint8Array. This can be done as follows:

var dataUri = canvas.toDataURL("image/jpeg", 1);
var jpegBytes = convertDataURIToBinary(dataUri);

convertDataURIToBinary() is the following method, which will take the data-uri representation of the JPEG data and transform it into a Uint8Array:

function convertDataURIToBinary(dataURI) {
    // Strip the 23-character "data:image/jpeg;base64," prefix and decode the base64 payload
    var base64 = dataURI.substring(23);
    var raw = window.atob(base64);
    var rawLength = raw.length;

    var array = new Uint8Array(new ArrayBuffer(rawLength));
    for (var i = 0; i < rawLength; i++) {
        array[i] = raw.charCodeAt(i);
    }

    return array;
}

FYI, this is just a slight modification of a method I found in this gist.

Note that I did not use PNG images due to an issue in the current version of ffmpeg.js (v3.1.9001).

Working with ffmpeg.js

ffmpeg.js comes with a Web Worker wrapper (ffmpeg-worker-mp4.js), which is really nice as you can run “ffmpeg --whatever” by just posting a message to the worker, and get the status/result via messages posted back to the caller via Worker.onmessage.

var worker = new Worker("node_modules/ffmpeg.js/ffmpeg-worker-mp4.js");
worker.onmessage = function (e) {
    var msg = e.data;

    switch (msg.type) {
        case "ready":
            console.log('mp4 worker ready');
            break;
        case "stdout":
            console.log(msg.data);
            break;
        case "stderr":
            console.log(msg.data);
            break;
        case "done":
            var blob = new Blob([msg.data.MEMFS[0].data], { type: "video/mp4" });
            // ...
            break;
        case "exit":
            console.log("Process exited with code " + msg.data);
            break;
    }
};

Input and output of files is handled by MEMFS (one of the virtual file systems supported by Emscripten). On the “done” message from ffmpeg.js, you can access the output files via the msg.data.MEMFS array (shown above). Input files are specified via an array in the call to worker.postMessage (shown below).

worker.postMessage({
    type: "run",
    TOTAL_MEMORY: 268435456,
    MEMFS: [
        {
            name: "input.jpeg",
            data: jpegBytes
        }
    ],
    arguments: [
        "-r", "60", "-i", "input.jpeg", "-aspect", "16/9",
        "-c:v", "libx264", "-crf", "1", "-vf", "scale=1280:720",
        "-pix_fmt", "yuv420p", "-vb", "20M", "out.mp4"
    ]
});

Limitations

With a bunch of frames captured from the video stream, I began pushing them through ffmpeg.js to encode an H.264 MP4 at 720p, and things started to blow up. There were two big issues:

  • Video encoding is no doubt a memory-intensive operation, but even for a few dozen frames I could never give ffmpeg.js enough memory. I tried playing around with the TOTAL_MEMORY prop in the worker.postMessage call, but if it’s too low ffmpeg.js runs out of memory, and if it’s too high ffmpeg.js fails to allocate memory.
  • Browser support issues. These aren’t surprising given that WebAssembly is still experimental. The short of it is: things work well in Chrome and Firefox on desktop. For Edge or Chrome on a mobile device, things work for a while before the browser crashes. For iOS there is no support.
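Since everything hinges on WebAssembly being available, a trivial capability check can at least let you fail gracefully before spinning up the worker. This is just an illustrative snippet of my own, not something provided by ffmpeg.js:

// Bail out early if the browser can't run the WebAssembly build at all
if (typeof WebAssembly !== "object" || typeof WebAssembly.instantiate !== "function") {
    console.log("WebAssembly not supported; ffmpeg.js won't be able to run in this browser");
}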

Hacking something together

The browser issues were intractable, but support on Chrome and Firefox was good enough for me, and I felt I could work around the memory limitations. Lowering the memory footprint was a matter of either:

  • Reducing the resolution of each frame
  • Reducing the number of frames

I opted for the latter. My plan was to make a small web application to allow someone to easily capture and create time-lapse videos, so I had ffmpeg.js encode just 1 frame to an H.264 MP4, send that MP4 to the server, and then use ffmpeg’s concat demuxer on the server-side to progressively concatenate each individual MP4 file into a single MP4 video. This way, the more costly encoding work is done client-side and the cheaper concatenation work is done server-side.
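To give a rough idea of the server-side piece, here’s a minimal Node.js sketch of concatenating the uploaded single-frame MP4s with ffmpeg’s concat demuxer. The file names and paths are made up for illustration; the actual Time Stream code differs:

// Assumes ffmpeg is installed on the server and clip-0.mp4, clip-1.mp4, ... have been uploaded
var fs = require('fs');
var execFile = require('child_process').execFile;

var clips = ['clip-0.mp4', 'clip-1.mp4', 'clip-2.mp4'];

// The concat demuxer reads a text file listing the input files, one per line
fs.writeFileSync('list.txt', clips.map(function (c) { return "file '" + c + "'"; }).join('\n'));

// "-c copy" avoids re-encoding, which is what keeps the server-side work cheap
execFile('ffmpeg', ['-f', 'concat', '-safe', '0', '-i', 'list.txt', '-c', 'copy', 'out.mp4'], function (err) {
    if (err) { throw err; }
    console.log('wrote out.mp4');
});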

Time Stream was the end result.

Here’s a time-lapse video created using an old laptop and a webcam taped onto my balcony:

This sort of hybrid solution works well. Overall, I’m happy with the results, but I would love to eliminate the server-side ffmpeg dependency outright, so I’m looking forward to seeing WebAssembly support expand and improve across browsers.

More generally, it’s interesting to push these types of computationally intensive tasks to the front-end, and I think it presents some interesting possibilities for architecting and scaling web applications.

Null

One of my favorite videos is Null Island from Minute Earth. I frequently link to it when I get into a discussion about whether null is an acceptable value for a certain use case.

What I really like is the focus on giving null a concrete definition, that is: “we don’t know”. When used in this way, we have a clear understanding of what null is, and the context in which it’s used (whether some field in a relational table, JSON object, value object, etc.) inherits this definition (i.e. “it’s either this value or we don’t know”), yielding something that’s fairly easy to reason about.

When nulls are ill-defined or have multiple definitions, complexity and confusion grow. Null is not:

  • Zero
  • Empty set
  • Empty string
  • Invalid value
  • A flag value for an error

Equating null to any of the above means that if you come across a null, you need to dig deeper into your code or database to figure out what that null actually means.

The flip side of this is avoiding nulls altogether, and there are really 2 cases here:

  • There is no need for null (i.e. we do know what the value is, in every use case)
  • The system is architected such that a null isn’t surfaced

In the first case, null doesn’t fit the use case, so there’s no need for it. When possible, this is ideal, and you avoid the necessity for null checks.

For the second case, architecting this way always seems to involve adding more complexity, to the point where it’s questionable if there’s a net benefit.

Brute-force convex hull construction

I’ve been experimenting a bit with convex hull constructions and below I’ll explain how to do a brute-force construction of a hull.

It’s worth noting up-front that the brute-force method is slow, with O(n³) worst-case complexity. So why bother? I think there are a few compelling reasons:

  • The brute-force method expresses the fundamental solution, which gives you the basic building blocks and understanding to approach more complex solutions
  • It’s faster to implement
  • It’s still a viable solution when n is small, and n is usually small.

What is a convex hull?

You can find a formal definition on Wikipedia. Informally, and specific to computational geometry, the convex hull is a convex polygon in which all points are either vertices of said polygon or enclosed within the polygon.

Brute-force construction

  • Iterate over every pair of points (p,q)
  • If all the other points are to the right (or left, depending on implementation) of the line formed by (p,q), the segment (p,q) is part of our result set (i.e. it’s part of the convex hull)

Here’s the top-level code that handles the iteration and construction of resulting line segments:

/**
 * Compute convex hull
 */
var computeConvexHull = function() {
    console.log("--- ");

    for (var i = 0; i < points.length; i++) {
        for (var j = 0; j < points.length; j++) {
            if (i === j) {
                continue;
            }

            var ptI = points[i];
            var ptJ = points[j];

            // Do all other points lie within the half-plane to the right?
            var allPointsOnTheRight = true;
            for (var k = 0; k < points.length; k++) {
                if (k === i || k === j) {
                    continue;
                }

                var d = whichSideOfLine(ptI, ptJ, points[k]);
                if (d < 0) {
                    allPointsOnTheRight = false;
                    break;
                }
            }

            if (allPointsOnTheRight) {
                console.log("segment " + i + " to " + j);
                var pointAScreen = cartToScreen(ptI, getDocumentWidth(), getDocumentHeight());
                var pointBScreen = cartToScreen(ptJ, getDocumentWidth(), getDocumentHeight());
                drawLineSegment(pointAScreen, pointBScreen);
            }
        }
    }
};

The “secret sauce” is the whichSideOfLine() method:

/**
 * Determine which side of a line a given point is on
 */
var whichSideOfLine = function(lineEndptA, lineEndptB, ptSubject) {
    return (ptSubject.x - lineEndptA.x) * (lineEndptB.y - lineEndptA.y) - (ptSubject.y - lineEndptA.y) * (lineEndptB.x - lineEndptA.x);
};

This is a bit of linear algebra derived from the general equation for a line.

The sign of the result tells you which side of the line the point is on. Whether we check for “left” or “right” doesn’t matter, as long as the same check is applied consistently to all points.
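As a quick sanity check (a toy example of my own, not part of the demo code), you can see the sign change as a point moves from one side of a line to the other:

// Line from (0,0) to (10,0), i.e. along the x-axis
var a = { x: 0, y: 0 };
var b = { x: 10, y: 0 };

console.log(whichSideOfLine(a, b, { x: 5, y: 3 }));  // negative: point is on one side
console.log(whichSideOfLine(a, b, { x: 5, y: -3 })); // positive: point is on the other side
console.log(whichSideOfLine(a, b, { x: 5, y: 0 }));  // zero: point is exactly on the line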

How it looks

I made a few diagrams to show the first few steps in the algorithm, as segments constituting the convex hull are found. The shaded area represents our success case, where all other points are to the right of the line formed by the points under consideration. Not shown are the failure cases (i.e. one or more points are on the left of the line formed by the points under consideration).

convex hull construction, brute force, step 1

convex hull construction, brute force, step 2

convex hull construction, brute force, step 3

Code and Demo

You can play around with constructing a hull below by double-clicking to add vertices.

You can find the code on GitHub.

Dynamic typing in the long run

There’s a lot written around static vs dynamic types. More and more, I’ve tended to favor static typing; I like compile-time checks and static analysis, I like the strong contracts established between caller and callee, I like the readability of knowing what a function expects and returns from looking at its signature, and I like the refactoring capabilities that IDEs can bring forward from being able to trace references at compile-time. Even with my bias, I’ve never worked on a codebase that’s lived for 10 years, and I found The Long-Term Problem With Dynamically Typed Languages to be an interesting perspective.

Unit testing to prove correctness:

…relying on a giant test suite and test infrastructure to prove the correctness of renaming a function or adding a parameter, in practice, is a significant coefficient of friction on the software’s ability to evolve over time.

Broken windows:

Not improving core APIs results in a kind of broken windows effect. When APIs are confusing or vague, people tend not to notice other confusing or vague APIs, and it slows everyone down in the long term.

Thus, instead of easily refactoring the legacy APIs, people think “I’ll just make a new one and migrate the code over!” And now you have two hard-to-change APIs. And then three. And the cycle continues. Additionally, this cycle is fed by architect types who know or think they know a better way to do things, but can’t be bothered to update the old systems.

Cost of change:

…a type system flattens the cost of change curve. Small API or performance improvements that otherwise wouldn’t be worth it suddenly are, because the compiler can quickly tell you all the places that need to be updated.

Data flow and mental understanding

…the most important component of understanding software is grasping data flow. Programs exist to transform data, and understanding how that’s done is paramount. Types accelerate the process of building a mental understanding of the program, especially when lightweight types such as CustomerId (vs. Int) are used.

The point about cost of change is interesting, as the flexibility of dynamically-typed languages is usually viewed as beneficial, yielding shorter development times.

Reel

I wrote a little desktop application to capture short videos and turn them into GIFs. I call it Reel. It’s still rough around the edges but you can grab an early version of it below.

Reel 0.1 (Windows Install)

I’ll have a Linux/Ubuntu version soon. Maybe an OS X version… I have to jump through a few extra hoops here as Apple still refuses to allow OS X to be virtualized.

Reel - Drinking Bird

Aside from its utility, this was also an experiment piecing together some technologies I’ve written about here before: XUL + XPCOM + SocketBridge, video capture using web tech and, in general, using web technologies for desktop applications.

Relative paths in Desktop Entry files

I was playing around a bit with Desktop Entry files, which provide a nice facade for hiding the execution details of a desktop application. However, a somewhat odd limitation is that relative paths are not supported. At least for the Exec key, I found a nice solution which makes use of bash and the %k field code allowed in the Exec value.

[Desktop Entry]
Version=1.0
Name=Run
Comment=Runner Test Application
Exec=bash -c 'cd $(dirname %k) && ./runner-linux-x86_64/dist/bin/run --app application.ini'
Path=
Icon=/usr/share/icons/hicolor/scalable/status/application-running.svg
Terminal=false
Type=Application
Categories=Utility;Application;Development;

The code above presumes that the application to run is at runner-linux-x86_64/dist/bin/run, relative to the location of the .desktop file.

Timeout your XHR requests

Client-side timeouts on XHR requests aren’t something I’ve ever thought a whole lot about. The default is no timeout and in most cases, where you’re kicking off an XHR request in response to a user interaction, you probably won’t ever notice an issue. That said, I ran into a case with ScratchGraph on Chrome where not having a timeout specified, along with some client-side network errors, left the application in a state where it was unable to send any more XHR requests.

ScratchGraph continuously polls its server for new data and every so often I would notice that the XHR calls would stop, with the application left in a broken state, unable to make any AJAX calls. This typically (but not always) occurred when the machine woke up from being put to sleep, and in the console there would be a few error messages, typically a number of ERR_NETWORK_IO_SUSPENDED and ERR_INTERNET_DISCONNECTED errors. Testing within my development environment, I was unable to reproduce the issue. Finally, I came across this StackOverflow post, which pointed out that not having a timeout specified on the XHR calls would result in these errors.

I’m still not exactly sure of the interplay between Chrome, the XHR requests, and the network state that results in this situation, but since adding a timeout, I’ve yet to notice this behavior again. It’s also worth noting that it’s very simple to add a timeout on an XHR request:

var xhr = new XMLHttpRequest();
xhr.open('GET', '/hello', true);
xhr.timeout = 500; // time in milliseconds
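If you want to react when the timeout actually fires, rather than just having the request silently aborted, you can also attach an ontimeout handler. A minimal sketch:

xhr.ontimeout = function () {
    // The request was aborted after 500ms with no response; retry, surface an error, etc.
    console.log("Request to /hello timed out");
};
xhr.send();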

Logging to stdout and a file

A simple way to log to both stdout and a file (using a pipe and tee):

./myapp 2>&1 | tee -a myapp.log

PostgreSQL database import with Ansible

I had a hard time pulling together all the steps needed to import a PostgreSQL database using Ansible. Here are the Ansible YAML blocks used to import the seed database for Lexiio.

1. Install PostgreSQL

- name: Install Postgres
  apt: name={{ item }} update_cache=yes cache_valid_time=3600 state=present
  sudo: yes
  with_items:
    - postgresql
    - postgresql-contrib
    - libpq-dev
    - python-psycopg2
  tags: packages

2. Create the database (lexiiodb), UTF-8 for encoding and collation

- name: Create lexiiodb database
  sudo_user: postgres
  postgresql_db: name=lexiiodb encoding='UTF-8' lc_collate='en_US.UTF-8' lc_ctype='en_US.UTF-8' state=present

3. Create a role that will be granted access to the database (password is a variable read from some secret source)

- name: Create lexiio role for database
  sudo_user: postgres
  postgresql_user: db=lexiiodb user=lexiio password="{{ password }}" priv=ALL state=present

4. Start the PostgreSQL service

- name: Start the Postgresql service
  sudo: yes
  service:
    name: postgresql
    state: started
    enabled: true

5. Import data into the database (using psql to pull in data from /home/lexiiodb.dump.sql)

- name: Importing lexiiodb data
  sudo_user: postgres
  shell: psql lexiiodb < /home/lexiiodb.dump.sql

6. For the role created, grant permissions on all schemas in the DB

- name: Grant usage of schema to lexiio role
  sudo_user: postgres
  postgresql_privs: database=lexiiodb state=present privs=USAGE type=schema roles=lexiio objs=dictionary

7. For the role created, grant permissions on all tables in the DB

- name: Grant table permissions for lexiio role
  sudo_user: postgres
  postgresql_privs: database=lexiiodb schema=dictionary state=present privs=SELECT,INSERT,UPDATE type=table roles=lexiio grant_option=no objs=ALL_IN_SCHEMA

8. For the role created, grant permissions on all sequences in the DB

- name: Grant sequence permissions for lexiio role
  sudo_user: postgres
  postgresql_privs: database=lexiiodb schema=dictionary state=present privs=USAGE type=sequence roles=lexiio grant_option=no objs=ALL_IN_SCHEMA

A more relational dictionary

As I started looking to add more functionality to Lexiio, I realized the Wiktionary definitions database dump I was using wasn’t going to cut it; specifically, I needed a normalized schema, or I’d have data duplication all over the place. I started normalizing in MySQL, but whether it was MySQL or MySQL Workbench, I kept running into character encoding issues. Using a simple INSERT-SELECT, in MySQL 5.7, to transfer words from the existing table to a new table resulted in lost characters:

MySQL losing characters

I dumped the data into PostgreSQL, didn’t encounter the issue, and just kept working from there.

The normalized schema can be downloaded here: LexiioDB normalized
(released under the Creative Commons Attribution-ShareAlike License)

LexiioDB schema

The unknown_words and unknown_to_similar_words tables are specific to Lexiio and serve as a place to store unknown words entered by the user and close/similar matches to known words (via the Levenshtein distance).
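For reference, the similarity matching mentioned above boils down to computing the Levenshtein (edit) distance between an unknown word and known words. Here’s a small illustrative implementation in JS (not the code Lexiio actually uses):

// Classic dynamic-programming Levenshtein distance between two strings
function levenshtein(a, b) {
    var dist = [];
    for (var i = 0; i <= a.length; i++) {
        dist[i] = [i];
    }
    for (var j = 0; j <= b.length; j++) {
        dist[0][j] = j;
    }

    for (var i = 1; i <= a.length; i++) {
        for (var j = 1; j <= b.length; j++) {
            var cost = (a.charAt(i - 1) === b.charAt(j - 1)) ? 0 : 1;
            dist[i][j] = Math.min(
                dist[i - 1][j] + 1,       // deletion
                dist[i][j - 1] + 1,       // insertion
                dist[i - 1][j - 1] + cost // substitution
            );
        }
    }

    return dist[a.length][b.length];
}

console.log(levenshtein("kitten", "sitting")); // 3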