semi/signal

Where we live

Avishkar Autar · Oct 13 2014 · Graphics and Rendering

Using a number of technologies I’ve been playing around with recently, I began working on a 3D visualization of the Earth, plotting every city with a population of at least 100,000 to create a pointillism-styled representation of the planet. Below is the result, along with an overview of how I produced the rendering.

Getting the data

I extracted all cities with a population of at least 100,000 people from the MySQL GeoNames database using the following query:

SELECT `id`,`name`,`latitude`,`longitude`,`population`,`timezone`
FROM geonames.cities
WHERE population >= 100000 AND feature_class = 'P';

… and put the results into a JS array.

Creating a 3D model to represent each city

I created a hexagonal model in Blender, exported it to a Wavefront OBJ file, and ran the OBJ file through the Wavefront OBJ to JSON converter I wrote. Note that the model faces along the z-axis to match WebGL’s (and OpenGL’s) default camera orientation, which looks down the negative z-axis.
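The converter itself isn't shown here. As a rough idea of what such a conversion involves, here is a minimal sketch that only reads `v` (vertex position) and `f` (face) lines. The `parseObj` name and the `{ positions, indices }` output layout are hypothetical choices for illustration, not necessarily what the converter produces; normals, texture coordinates, materials, and negative (relative) OBJ indices are ignored.

```javascript
// Minimal Wavefront OBJ parser (sketch): only `v` and `f` lines are handled.
function parseObj(text) {
  const positions = []; // flat [x0, y0, z0, x1, y1, z1, ...]
  const indices = [];   // flat triangle list, 0-based
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue; // skip blanks and comments
    const [tag, ...rest] = line.split(/\s+/);
    if (tag === "v") {
      positions.push(...rest.slice(0, 3).map(Number));
    } else if (tag === "f") {
      // face entries look like "i", "i/t", "i//n" or "i/t/n"; keep the position index
      const face = rest.map(tok => parseInt(tok.split("/")[0], 10) - 1); // OBJ is 1-based
      // triangulate polygons (e.g. a hexagon) as a fan around the first vertex
      for (let k = 1; k + 1 < face.length; k++) {
        indices.push(face[0], face[k], face[k + 1]);
      }
    }
  }
  return { positions, indices };
}

// Usage: a single quad becomes two triangles
const quad = "v 0 0 0\nv 1 0 0\nv 1 1 0\nv 0 1 0\nf 1 2 3 4\n";
console.log(JSON.stringify(parseObj(quad)));
```

A fan triangulation is enough for convex faces like the hexagon here; concave polygons would need a proper triangulation.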

Convert longitude and latitude to a 3D position

Converting a geodetic latitude/longitude pair to a 3D position involves an LLA (latitude, longitude, altitude) to ECEF (Earth-Centered, Earth-Fixed) transformation. The code below implements this transform, converting the latitude and longitude of every city pulled from the GeoNames database into a 3D coordinate at which the hexagonal model for that city is rendered.

function llarToWorld(lat, lon, alt, rad)
{
    // degrees -> radians
    lat = lat * (Math.PI / 180.0);
    lon = lon * (Math.PI / 180.0);

    var f  = 0;                                                   // flattening (0 = perfect sphere)
    var ls = Math.atan( Math.pow((1.0 - f), 2) * Math.tan(lat) ); // geocentric latitude at the surface (lambda_s)

    // ECEF coordinates (z-axis points north)
    var x = rad * Math.cos(ls) * Math.cos(lon) + alt * Math.cos(lat) * Math.cos(lon);
    var y = rad * Math.cos(ls) * Math.sin(lon) + alt * Math.cos(lat) * Math.sin(lon);
    var z = rad * Math.sin(ls) + alt * Math.sin(lat);

    // remap ECEF (z up) to WebGL's y-up convention
    return [x, z, -y];
}

There are 2 items worth noting:

The transformation (and the function above) involves a 4th parameter, rad, the radius of the ellipsoid (or sphere in this case, as flattening = 0) into which the transformation is done. I have it set as a fixed constant, as I’m primarily concerned with an approximate visual representation, but the MathWorks page describes the actual computation.
The ECEF (Earth-Centered, Earth-Fixed) coordinate system has the z-axis pointing north, not the y-axis, so the y and z values need to be swapped to produce a coordinate matching WebGL’s y-up convention. Swapping alone would mirror the globe (it turns a right-handed coordinate system into a left-handed one), so the new z value is also negated: (x, y, z) → (x, z, -y) is a proper rotation that keeps the right-handed convention used with WebGL, where the default camera looks down the negative z-axis.
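For anyone who does want an ellipsoid rather than a sphere, the sketch below follows the MathWorks formulation referenced above: convert the geodetic latitude to the geocentric latitude of the surface point, compute the ellipsoid's radius at that latitude, then add the altitude along the surface normal. The WGS84 constants, the `llaToEcef` name, and the separate `ecefToWorld()` helper (applying the same (x, z, -y) remapping as llarToWorld()) are my additions for illustration; altitude is in the same units as the radius (metres here).

```javascript
// WGS84 ellipsoid parameters (assumed here; any ellipsoid works)
const WGS84_A = 6378137.0;          // equatorial radius, metres
const WGS84_F = 1 / 298.257223563;  // flattening

// Geodetic latitude/longitude (degrees) + altitude -> ECEF [x, y, z], z pointing north
function llaToEcef(latDeg, lonDeg, alt, a = WGS84_A, f = WGS84_F) {
  const lat = latDeg * Math.PI / 180;
  const lon = lonDeg * Math.PI / 180;
  // geocentric latitude of the point on the ellipsoid surface
  const ls = Math.atan((1 - f) ** 2 * Math.tan(lat));
  // distance from the Earth's centre to the surface at that geocentric latitude
  const rs = Math.sqrt((a * a) / (1 + (1 / (1 - f) ** 2 - 1) * Math.sin(ls) ** 2));
  const x = rs * Math.cos(ls) * Math.cos(lon) + alt * Math.cos(lat) * Math.cos(lon);
  const y = rs * Math.cos(ls) * Math.sin(lon) + alt * Math.cos(lat) * Math.sin(lon);
  const z = rs * Math.sin(ls) + alt * Math.sin(lat);
  return [x, y, z];
}

// ECEF (z up) -> WebGL-style world coordinates (y up), same remapping as llarToWorld()
function ecefToWorld([x, y, z]) {
  return [x, z, -y];
}

// On the equator at the prime meridian: lies along the +x axis
console.log(ecefToWorld(llaToEcef(0, 0, 0)));
```

With f = 0 this reduces to the spherical version above, since the geocentric and geodetic latitudes coincide and rs equals the radius.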

Orient all cities to face the origin

Getting each of the hexagonal models to face the origin involved a bit of math:

Calculating the axis about which the rotation should occur: first, compute a vector from the origin to the 3D position of the model (lookAt), then take the cross product of lookAt and the reference axis (the negative z-axis in the code below, the direction the model faces), as we’re rotating relative to that axis.
Calculating the angle of rotation (the angle between the reference axis and lookAt): with both vectors normalized, take their dot product, then take the acos of the dot product.

There’s some additional code to handle cases where points lie on the z-axis (where the cross product is the zero vector) and to return a matrix representation of the rotation.

function lookAtOrigin(v)
{
    // vec3/mat4 calls use the gl-matrix 1.x style API: operations modify their first argument

    // compute (normalized) vector from the origin
    var lookAt = vec3.create([v[0], v[1], -v[2]]);
    vec3.normalize(lookAt);

    // reference axis
    var refAxis = vec3.create([0, 0, -1]);

    // compute axis of rotation (lookAt x refAxis, written into rotAxis)
    var rotAxis = vec3.create(lookAt);
    vec3.cross(rotAxis, refAxis);

    // compute angle of rotation; clamp so rounding error can't make acos() return NaN
    var d = Math.max(-1, Math.min(1, vec3.dot(lookAt, refAxis)));
    var rotAngRad = Math.acos(d);

    // special cases: lookAt lies on the z-axis, so the cross product is the zero vector
    if (rotAxis[0] == 0 && rotAxis[1] == 0 && rotAxis[2] == 0) {
        rotAxis = vec3.create([1, 0, 0]);            // any axis perpendicular to z works
        rotAngRad = (lookAt[2] > 0) ? Math.PI : 0;   // opposite refAxis -> flip; same direction -> no rotation
    }

    // compute and return a matrix with the rotation
    var ret = mat4.identity();
    mat4.rotate(ret, rotAngRad, rotAxis);

    return ret;
}
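If you'd rather not depend on a matrix library for this step, the same axis-angle construction can be written out directly. This is a standalone sketch (all helper names are mine) that uses Rodrigues' rotation formula to build a row-major 3x3 matrix rotating direction `from` onto direction `to`. It clamps the dot product before acos() and, when the vectors are parallel, picks any axis perpendicular to `from`, which generalizes the z-axis special case above. Whether you need this matrix or its transpose (the inverse rotation) depends on how your pipeline applies it.

```javascript
// Small vector helpers (plain arrays [x, y, z])
function normalize3(v) {
  const len = Math.hypot(v[0], v[1], v[2]);
  return [v[0] / len, v[1] / len, v[2] / len];
}
function cross3(a, b) {
  return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]];
}
function dot3(a, b) {
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// Row-major 3x3 matrix rotating direction `from` onto direction `to` (Rodrigues' formula)
function rotationBetween(from, to) {
  const f = normalize3(from);
  const t = normalize3(to);
  // clamp so rounding error can't push acos() outside its domain
  const angle = Math.acos(Math.min(1, Math.max(-1, dot3(f, t))));
  let axis = cross3(f, t);
  if (Math.hypot(axis[0], axis[1], axis[2]) < 1e-9) {
    // parallel or opposite: any axis perpendicular to `from` works
    axis = cross3(f, Math.abs(f[0]) < 0.9 ? [1, 0, 0] : [0, 1, 0]);
  }
  const [x, y, z] = normalize3(axis);
  const c = Math.cos(angle), s = Math.sin(angle), C = 1 - c;
  return [
    [c + x * x * C,     x * y * C - z * s, x * z * C + y * s],
    [y * x * C + z * s, c + y * y * C,     y * z * C - x * s],
    [z * x * C - y * s, z * y * C + x * s, c + z * z * C],
  ];
}

// Multiply a row-major 3x3 matrix by a column vector
function applyMatrix3(m, v) {
  return m.map(row => dot3(row, v));
}

// Example: turn the model's facing direction (-z) toward an arbitrary direction
const facing = applyMatrix3(rotationBetween([0, 0, -1], [1, 2, 3]), [0, 0, -1]);
console.log(facing); // approximately normalize3([1, 2, 3])
```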

Render the scene

Using glfx, I pulled everything together, also adding a bit of code to rotate the camera and do some pseudo-lighting in the pixel shader by alpha blending colors based on depth. All the code can be found in the webgl-globe repository on Bitbucket.


Double-tap interactions

Avishkar Autar · Sep 20 2014 · Web Technologies

While click events are seamlessly supported on touch devices, double-click events are not, and there is no equivalent double-tap event. However, double-tap interactions can be captured by listening for successive touchend events. The code below shows how to do this, with the basis for a double-tap being:

Two touchend events, on a certain element, both occurring within a certain time interval (300ms)
Two touchend events, on a certain element, both occurring within a certain distance from each other (24px)

I’ve come across a lot of code that addresses the first point (see double tap on mobile safari), but the second point is just as important: almost all touchscreens are multitouch, and holding a phone or tablet with two hands is not uncommon. With a two-handed posture and an element large enough to span the screen, it’s easy to hit two different areas of the element (on opposite ends of the screen) in rapid succession. Unless the distance between the points is taken into account, that would be detected as a double-tap, even though two distant points on the element were touched.
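The two criteria (time and distance) can also be pulled out of the event handling into a small pure function, which makes them easy to unit-test independently of jQuery or the DOM. A sketch, where the `isDoubleTap` name and the `{ time, x, y }` tap shape are hypothetical and the defaults mirror the 300ms / 24px values above:

```javascript
// A tap is { time, x, y }: time in milliseconds, x/y in pixels (e.g. a touch's pageX/pageY).
// Returns true if `current` completes a double-tap started by `previous`.
function isDoubleTap(previous, current, maxInterval = 300, maxDistance = 24) {
  if (!previous) return false; // no earlier tap to pair with
  const elapsed = current.time - previous.time;
  const distance = Math.hypot(current.x - previous.x, current.y - previous.y);
  return elapsed >= 0 && elapsed <= maxInterval && distance <= maxDistance;
}

// Two quick taps close together: a double-tap
console.log(isDoubleTap({ time: 0, x: 100, y: 100 }, { time: 200, x: 106, y: 108 })); // true
// Two quick taps on opposite sides of a wide element (two-handed posture): not a double-tap
console.log(isDoubleTap({ time: 0, x: 20, y: 300 }, { time: 200, x: 700, y: 300 }));  // false
```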

Ideally, both the interval and distance would be user-defined settings at the device/operating-system level, similar to the double-click speed setting; lacking such support, hacking it in at the application level is the only viable option.

            
// Handler for when a double-tap (on a touchscreen) or double-click (with a mouse) is detected
var dblTapHandler = function (x, y) {

    // Do stuff...

};

$(document).ready(function() {

    // Listen for double-click events for desktop/mouse-based interactions
    $('.ia-dbltap-area').on('dblclick', function (e) {
        // Call handler
        dblTapHandler(e.pageX, e.pageY);
    });

    // Listen for touchend events for touch-based interactions
    $('.ia-dbltap-area').on('touchend', function(e) {

        var dblTapRadius = 24; // radius (in pixels) of the area in which we expect the 2 taps for a double-tap
        var dblTapSpeed = 300; // interval (in milliseconds) in which we expect the 2 taps for a double-tap

        if(e.originalEvent.changedTouches.length <= 0) {
            return false; // we have nothing to work with
        }

        var dblTapDetected = false;  // flag specifying if we detected a double-tap
        var areaElem = $(this);      // element on which this touchend event occurred

        // Position of the touch
        var x = e.originalEvent.changedTouches[0].pageX;
        var y = e.originalEvent.changedTouches[0].pageY;

        var now = new Date().getTime();

        // Check if we have stored data for a previous touch (indicating we should test for a double-tap)
        if(areaElem.data('last-touch-time')) {

            var lastTouchTime = areaElem.data('last-touch-time');

            // Compute time since the previous touch
            var timeSinceLastTouch = now - lastTouchTime;

            // Get the position of the last touch on the element
            var lastX = areaElem.data('last-touch-x');
            var lastY = areaElem.data('last-touch-y');

            // Compute the distance from the last touch on the element
            var distFromLastTouch = Math.sqrt( Math.pow(x - lastX, 2) + Math.pow(y - lastY, 2) );

            // Check if:
            //      1. The time since the last touch is within the specified double-tap interval (dblTapSpeed)
            //      2. The distance from the last touch is within the specified double-tap radius (dblTapRadius)
            if(timeSinceLastTouch <= dblTapSpeed && distFromLastTouch <= dblTapRadius) {

                // Flag that we detected a double tap
                dblTapDetected = true;

                // Call handler
                dblTapHandler(x, y);

                // Remove last touch info from element
                areaElem.data('last-touch-time', '');
                areaElem.data('last-touch-x', '');
                areaElem.data('last-touch-y', '');
            }
        }

        if(!dblTapDetected) { // A double-tap wasn't detected

            // Store time and position of this touch on the element
            // (the next touch may be a double-tap; we can use this info to determine if it is)
            areaElem.data('last-touch-time', now);
            areaElem.data('last-touch-x', x);
            areaElem.data('last-touch-y', y);
        }

    });

});

The demo below shows the code in action, along with a bit of SVG to render where the user double-clicked or double-tapped div.ia-dbltap-area:


GeoNames geographical database

Avishkar Autar · Aug 30 2014 · Random

I came across the GeoNames database recently and was impressed with the breadth of locations available. I downloaded allCountries.zip from http://download.geonames.org/export/dump/, which contains data (name, location, population, etc.) on places across all countries in a single tab-delimited (TSV) text file. To work with the data more easily, I wrote a PHP script to insert the entries into a MySQL database table (it’s actually just a simple modification of the script I used for the Wiktionary definitions import). The TSV, MySQL database, and PHP script are all presented below.

GeoNames allCountries.zip

GeoNames MySQL database export

<?php 

require "Database.php";

$tsvInputFilePath = "allCountries.txt";

echo "Importing {$tsvInputFilePath} ...\n";

// Open file
$fp = fopen($tsvInputFilePath, "r");
if($fp === FALSE) {
    echo "Could not find file path: " . $tsvInputFilePath;
    exit;
}

// Establish DB connection
$db = new Database();

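while (($ln = fgets($fp)) !== false) {

    // Strip the trailing newline (otherwise it ends up in the last field), then parse tab-delimited fields
    $ln = rtrim($ln, "\r\n");
    $parts = explode("\t", $ln);

    // Skip malformed lines; each GeoNames record has 19 fields
    if(count($parts) < 19) {
        continue;
    }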
       
    // Insert into database
    $db->query("INSERT INTO cities (`id`,
                                    `name`,
                                    `asciiname`,
                                    `alternatenames`,
                                    `latitude`,
                                    `longitude`,
                                    `feature_class`,
                                    `feature_code`,
                                    `country_code`,
                                    `cc2`,
                                    `admin1_code`,
                                    `admin2_code`,
                                    `admin3_code`,
                                    `admin4_code`,
                                    `population`,
                                    `elevation`,
                                    `dem`,
                                    `timezone`,
                                    `last_modified_at`) 
                VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)", 
            
                $parts[0],
                $parts[1],
                $parts[2],
                $parts[3],
                $parts[4],
                $parts[5],
                $parts[6],
                $parts[7],
                $parts[8],
                $parts[9],            
                $parts[10],
                $parts[11],
                $parts[12],
                $parts[13],
                $parts[14],
                $parts[15],
                $parts[16],
                $parts[17],
                $parts[18]
            
                );

       
}

// Close the input file
fclose($fp);

echo "done.\n";
exit;

The Database class is a wrapper for mysqli; you can find it, along with the script above, in the geonames-allcountries-import Bitbucket repo.

Note that this script will take a while to run (likely a few days) as there are 9,195,153 records that need to be inserted and we’re just doing simple INSERTs with no optimizations.
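One common way to speed this up is to batch rows into multi-row INSERT statements, so each round trip to MySQL inserts many records instead of one. Below is a sketch of building such a statement, written in JavaScript; the `buildBatchInsert()` name is hypothetical, and the resulting SQL/parameter pair would be handed to whatever MySQL client you use. Keep each batch small enough to fit within MySQL's max_allowed_packet limit; wrapping the batches in a transaction can also help.

```javascript
// Build one multi-row INSERT plus a flat parameter list for a batch of rows.
// `rows` is an array of arrays (e.g. TSV lines split on "\t"), one value per column.
function buildBatchInsert(table, columns, rows) {
  if (rows.length === 0) throw new Error("no rows to insert");
  const columnList = columns.map(c => "`" + c + "`").join(",");
  const rowPlaceholders = "(" + columns.map(() => "?").join(",") + ")";
  const sql = "INSERT INTO " + table + " (" + columnList + ") VALUES " +
              rows.map(() => rowPlaceholders).join(",");
  // flatten row values in column order to match the placeholders
  const params = rows.flatMap(row => columns.map((_, i) => row[i]));
  return { sql, params };
}

// Example with two made-up rows
const batch = buildBatchInsert("cities", ["id", "name", "population"], [
  ["1", "place-a", "120000"],
  ["2", "place-b", "250000"],
]);
console.log(batch.sql); // INSERT INTO cities (`id`,`name`,`population`) VALUES (?,?,?),(?,?,?)
console.log(batch.params);
```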

An overview of each of the fields in the database can be found in the GeoNames export readme.txt. Particularly important are the feature_class and feature_code fields, whose possible values are listed on the GeoNames Feature Codes page. Also, as indicated in the readme, the data is licensed under the Creative Commons Attribution 3.0 License.


Round to midnight

Avishkar Autar · Jul 26 2014 · Math

A problem I’ve run into a few times is taking the current unix timestamp and rounding it to midnight, so that I can get the unix time for the start of the day. In PHP, I’ve commonly done the following:

$timestamp = strtotime('today midnight');

It’s one of the solutions presented in this StackOverflow post.

The solution above works fine, but I began thinking about how to actually do the computation and bypass the string parsing done by strtotime(). The computation is actually pretty simple, as it’s in the same vein as snapping a point to a grid. The verbose code snippet below shows the step-by-step process in the computation.

// Given the number of seconds in a day
$numSecondsInDay = 86400;

// .. and the current unix time
$currentTime = time();

// We can compute the number of days since the unix epoch (the decimal/fractional part is the portion of the current day that's elapsed)
$daysSinceEpoch = $currentTime / $numSecondsInDay;

// We can throw away the fractional part by rounding down with the floor() function
$wholeDaysSinceEpoch = floor($daysSinceEpoch);

// The number of whole days since the epoch x the number of seconds in a day will give the time for the current day at midnight
$midnightToday = $wholeDaysSinceEpoch * $numSecondsInDay;

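One interesting thing to notice: if you replace the floor() function with the ceil() function, rounding up the number of days since the epoch, you’ll get the start of the next day (midnight tomorrow), except when the current time is exactly midnight, in which case both give the same value. Also note that, because unix time counts seconds from midnight UTC on January 1, 1970, this computation gives midnight UTC; strtotime('today midnight') uses PHP’s default timezone, so the two only agree when that timezone is UTC.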
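The same snapping arithmetic, sketched in JavaScript (the function names are mine). Like the PHP above, it works in UTC, since unix time carries no timezone; the next-midnight helper adds one day to the floored value instead of using ceil().

```javascript
const SECONDS_PER_DAY = 86400;

// Snap a value down (floor) or up (ceil) to a multiple of `step`: the "snap to grid" idea
function snapDown(value, step) {
  return Math.floor(value / step) * step;
}
function snapUp(value, step) {
  return Math.ceil(value / step) * step;
}

// Unix time (seconds) of midnight UTC at the start of the day containing `unixSeconds`
function midnightUtc(unixSeconds) {
  return snapDown(unixSeconds, SECONDS_PER_DAY);
}

// Unix time of the following midnight UTC; still advances when `unixSeconds` is exactly midnight
function nextMidnightUtc(unixSeconds) {
  return midnightUtc(unixSeconds) + SECONDS_PER_DAY;
}

const now = Math.floor(Date.now() / 1000); // Date.now() is in milliseconds
console.log(midnightUtc(now), nextMidnightUtc(now));
```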


Wiktionary definitions database

Avishkar Autar · Jul 14 2014 · Random

Having a dictionary can be incredibly useful in software development, and it forms the basis for a wide range of natural language processing applications. However, finding an open-source dictionary that can be easily parsed and used within applications is incredibly difficult, as there simply aren’t many options available.

WordNet is one option I came across, but it requires significant work to parse the WordNet ASCII database files or Prolog database files.

Wiktionary was the other viable option, and the one I went with. Wiktionary XML dumps are available, but because Wiktionary is a wiki, these files are likely even more difficult to parse than the WordNet database files, as you’d have to deal with wiki markup. However, a while ago I was able to get a TSV file with words, parts of speech, and definitions from the Wikimedia Toolserver at http://toolserver.org/~enwikt/definitions. The Toolserver has since been discontinued and I haven’t found updated TSVs hosted anywhere else, but the file I downloaded, dated November 27, 2012, is still fairly up-to-date for a dictionary and useful in many applications.

I wrote a PHP script to parse the TSV and make INSERTs into a MySQL database. The TSV file, MySQL database, and PHP script are presented below.

Wiktionary TSV file

Wiktionary MySQL database export

PHP Script:

<?php 

require "Database.php";

$tsvInputFilePath = "TEMP-E20121127.tsv";

echo "Importing {$tsvInputFilePath} ...\n";

// Open file
$fp = fopen($tsvInputFilePath, "r");
if($fp === FALSE) {
    echo "Could not find file path: " . $tsvInputFilePath;
    exit;
}

// Establish DB connection
$db = new Database();

while (($ln = fgets($fp)) !== false) {

    // Strip the trailing newline (otherwise it ends up in definition_raw), then parse tab-delimited fields
    $ln = rtrim($ln, "\r\n");
    $parts = explode("\t", $ln);
    if(count($parts) < 4) {
        continue;
    }
    
    $lang = $parts[0];
    $word = $parts[1];
    $partOfSpeech = $parts[2];    
    $definitionRaw = $parts[3];
    
    // Insert into database
    $db->query("INSERT INTO words (language, word, part_of_speech, definition_raw) 
                VALUES (?, ?, ?, ?)", 
                $lang, $word, $partOfSpeech, $definitionRaw);
       
}

// Close the input file
fclose($fp);

echo "done.\n";
exit;

The Database class is a wrapper for mysqli; you can find it, along with the script above, in the wiktionary-tsv-import Bitbucket repo.

Note that definitions need to be parsed further, as they contain wiki markup. The parsing doesn’t seem difficult and is something I hope to get done in the near future.
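As a possible starting point for that parsing, here is a rough sketch (the function name and the set of rules are my own; it is not a full wikitext parser) that strips a few common constructs: simple non-nested {{templates}}, [[links]] (keeping the label of [[target|label]]), and '' / ''' emphasis quotes. Nested templates, HTML tags, and other markup are left alone, and dropping templates also drops any information they carry (such as usage labels), so you may want to extract those first. The sample string is illustrative, not taken from the TSV.

```javascript
// Strip a few common wiki-markup constructs from a definition string.
// Deliberately simple: nested templates, HTML tags, and tables are not handled.
function stripWikiMarkup(text) {
  return text
    .replace(/\{\{[^{}]*\}\}/g, "")                    // {{template|...}} (non-nested) -> removed
    .replace(/\[\[(?:[^\]|]*\|)?([^\]]+)\]\]/g, "$1") // [[target|label]] -> label, [[word]] -> word
    .replace(/'{2,}/g, "")                             // ''italic'' / '''bold''' quote runs -> removed
    .replace(/\s{2,}/g, " ")                           // collapse leftover runs of whitespace
    .trim();
}

console.log(stripWikiMarkup("{{context|zoology}} A small [[domesticated]] [[feline|cat]]; '''pet'''."));
// A small domesticated cat; pet.
```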

Related resources:

Wikokit – parser to produce a machine-readable Wiktionary
DBpedia Wiktionary RDF extraction – RDF database and SPARQL querying interface of Wiktionary
perl-wiktionary-parser – Perl Wiktionary parser

Each of the projects above has valuable material, but, like WordNet, each requires significantly more time to evaluate and integrate into an application than the simple TSV -> MySQL translation.

EDIT (12/13/2015): I’ve updated the MySQL database export. There were some holes in the data because I was using utf8 column encoding for definitions; however, MySQL’s “utf8” is a nonstandard UTF-8 implementation that only handles code points up to 3 bytes in size. utf8mb4 encoding needs to be used for proper UTF-8 support (up to 4 bytes per code point).
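To see which text MySQL's 3-byte utf8 can't hold, it's enough to count UTF-8 bytes per code point: anything above U+FFFF (emoji, for example) needs 4 bytes. A small JavaScript sketch using the standard TextEncoder (the helper names are mine):

```javascript
// Number of bytes a single code point occupies in UTF-8
function utf8ByteLength(ch) {
  return new TextEncoder().encode(ch).length;
}

// True if any code point in `text` needs 4 bytes, i.e. won't fit MySQL's 3-byte "utf8"
function needsUtf8mb4(text) {
  for (const ch of text) { // for...of iterates by code point, not by UTF-16 code unit
    if (utf8ByteLength(ch) > 3) return true;
  }
  return false;
}

console.log(["e", "\u00e9", "\u20ac", "\u{1F600}"].map(utf8ByteLength)); // [ 1, 2, 3, 4 ]
console.log(needsUtf8mb4("caf\u00e9"), needsUtf8mb4("smile \u{1F600}")); // false true
```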


Moving the caret to the end of text in an <input> element

Avishkar Autar · May 18 2014 · Web Technologies

Very simple, and the following will work in all modern browsers.

HTML

<input name="url" type="text" value="http://" />

JavaScript

var inputElem = document.getElementsByName("url")[0];
                
var valLen = inputElem.value.length;

inputElem.selectionStart = valLen;
inputElem.selectionEnd = valLen;   

inputElem.focus();

The same technique will work for <textarea> elements as well.
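The same steps can be wrapped in a reusable helper. This sketch (the `moveCaretToEnd` name is mine) uses setSelectionRange(), which sets both ends of the selection in one call; passing the same value for both collapses the selection to a plain caret. It works for <textarea> and text-like <input> types, but setSelectionRange() throws for input types that don't support selection, such as type="email" or type="number".

```javascript
// Move the caret to the end of an <input> or <textarea>
function moveCaretToEnd(el) {
  const len = el.value.length;
  el.focus();
  // same start and end -> a collapsed selection, i.e. just the caret
  el.setSelectionRange(len, len);
}

// Usage in a browser:
// moveCaretToEnd(document.getElementsByName("url")[0]);
```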


Identifying the operating system with XPCOM

Avishkar Autar · May 8 2014 · Random

The following shows how to get a string identifying the current operating system from an instance of nsIXULRuntime:

var getOS = function() {
    var env = Components.classes["@mozilla.org/xre/app-info;1"].getService(Components.interfaces.nsIXULRuntime);
    return env.OS; // e.g. "WINNT", "Linux", "Darwin"
};

The nsIXULRuntime.OS string is one of the OS_TARGET values.

Ideally, I’d prefer XUL and XPCOM code to remain platform-agnostic, but I’ve used OS detection as a cheap way (versus jumping through 3 objects) to determine which path separator to use when referencing files and directories (backslash for “WINNT”, forward slash for everything else). XPCOM is sensitive to the path separator: on Windows, it will not reference a file or directory if you use a forward slash. This is bizarre, because Win32 API functions accept paths with the forward slash as a separator. Even more bizarre, we have a layer of abstraction that actually makes it harder to write platform-independent code.
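The separator rule described above fits in a couple of lines. This is a sketch with hypothetical helper names; it takes the OS string as a parameter so the logic can be exercised outside XULRunner. (If you'd rather avoid hand-built separators entirely, XPCOM's nsIFile also has an append() method that adds one path component at a time.)

```javascript
// Pick the path separator from an nsIXULRuntime.OS string ("WINNT" on Windows)
function pathSeparatorFor(os) {
  return os === "WINNT" ? "\\" : "/";
}

// Join path segments using the separator appropriate for `os`
function joinPath(os, ...segments) {
  return segments.join(pathSeparatorFor(os));
}

// In XULRunner code this would be combined with getOS() from above, e.g.
// (baseDir is a hypothetical base directory path):
// var path = joinPath(getOS(), baseDir, "data", "settings.json");
```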


Data driven

Avishkar Autar · May 1 2014 · Random

The Economist recently wrote a bit about how speech recognition got so good:

… words do not appear in random order, so the computer does not have to guess from (say) a vocabulary of 20,000 words for each word you speak. Instead, the software assesses how likely you are to have said a given word based on the surrounding words, drawing on statistical models derived from vast repositories of digitised documents and the previous utterances of other users.

This reminded me of a talk by Peter Norvig: The Unreasonable Effectiveness of Data, where he discusses utilizing such large repositories of data in order to develop effective algorithms for a number of problems; there is a heavy focus on natural language processing problems but the concept can, of course, be applied in other areas.

(If the name Peter Norvig sounds familiar, he’s the co-author of Artificial Intelligence: A Modern Approach which you might have used if you ever took an AI class.)

As a programmer, I find this exciting stuff, and it certainly changed my thinking with regard to how I would approach similar problems in the future. Whereas before I would look at sample data sets and try to derive an algorithm, now I’d attempt to mine as much data as I could, build a statistical model, and use that as the basis of the algorithm. Of course, mining a massive data set is sometimes easier said than done; when it comes to data, much of the web is still a walled garden.


Manipulating text relative to the caret in a contenteditable div

Avishkar Autar · Apr 20 2014 · Web Technologies

I wanted to play around a bit with dynamically modifying text as you type. The following is a simple auto-correct demo that makes use of the Selection and Range interfaces to replace text (read: text preceding the caret) within a contenteditable div.

$(document).on('keydown', '.ia-txt', function (e) {

    // check if the space bar was hit
    if(e.keyCode === 32) {

        // we'll check for the string "hwat"; incorrect form of "what"
        var incorrectTxt = "hwat";

        // Get selection and range based on position of caret
        // (we assume nothing is selected, and range points to the position of the caret)
        var sel = window.getSelection();
        if(sel.rangeCount === 0) {
            return; // no caret to work with
        }
        var range = sel.getRangeAt(0);

        // offsets count characters only inside a text node (inside an element they count child nodes),
        // so check for a text node with at least incorrectTxt.length characters before the caret
        if(range.startContainer.nodeType === Node.TEXT_NODE &&
           range.startOffset - incorrectTxt.length >= 0) {

            // clone the range, so we can alter the start and end
            var clone = range.cloneRange();

            // alter start and end of the cloned range, so it selects the incorrectTxt.length characters before the caret
            clone.setStart(range.startContainer, range.startOffset - incorrectTxt.length);
            clone.setEnd(range.startContainer, range.startOffset);

            // get contents of cloned range
            var contents = clone.toString();

            // check if the contents of the cloned range equal our incorrectTxt string
            if(contents === incorrectTxt) {

                // delete the contents of the range ("hwat")
                clone.deleteContents();

                // create a text node with the corrected text ("what") and insert it where we deleted the incorrect text
                var txtNode = document.createTextNode("what");
                range.insertNode(txtNode);

                // set the start of the range after the inserted node, so we have the caret after the inserted text
                range.setStartAfter(txtNode);

                // Chrome fix: re-apply the altered range to the selection
                sel.removeAllRanges();
                sel.addRange(range);
            }
        }
    }

});

You can see the code in action in the frame below. Every time you press the space bar and the string “hwat” immediately precedes the caret, it is removed and replaced with the string “what”:

This is an incredibly trivial example (note that it doesn’t even check that the string “hwat” is surrounded by whitespace on both sides), but it does serve as a template for more advanced functionality. That said, be very aware of minor differences in the behavior of Range methods across browsers; I’ve stumbled across a few:

The code above breaks under certain conditions in Internet Explorer. If you move the caret to a position between 2 words, type “hwat” + space (the string is auto-corrected to “what”), then type “hwat” + space again, the auto-correct doesn’t work. The range.startOffset value seems incorrect (too small), and subtracting incorrectTxt.length (4) yields a negative start offset.
Using a keyup event instead of a keydown event, and checking for the string “hwat ” (with a trailing space) instead, yields different behavior in Firefox and Chrome. Firefox preserves the space after the corrected string, and the caret is positioned after the space. Chrome, however, strips the space, and the caret is after the corrected string.
When the selection’s range is altered after auto-correcting, Chrome requires the removeAllRanges() and addRange() calls to replace the selection’s range, but Firefox does not.
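The whole-word check the demo skips can be kept separate from the Range handling. In this sketch, `findCorrection` and the misspelling-to-replacement `corrections` map are hypothetical; it looks at the text preceding the caret and only reports a match that starts the text or follows whitespace. The trailing boundary is the space being typed, since the handler runs on keydown. JavaScript's \s also matches the non-breaking space (U+00A0) that contenteditable elements often insert.

```javascript
// Given the text before the caret, return { start, end, replacement } for a
// whole-word misspelling that ends at the caret, or null if there is none.
function findCorrection(textBeforeCaret, corrections) {
  for (const [wrong, right] of Object.entries(corrections)) {
    if (!textBeforeCaret.endsWith(wrong)) continue;
    const start = textBeforeCaret.length - wrong.length;
    // require a word boundary before the match: start of text or whitespace
    if (start === 0 || /\s/.test(textBeforeCaret[start - 1])) {
      return { start, end: textBeforeCaret.length, replacement: right };
    }
  }
  return null;
}

console.log(findCorrection("so hwat", { hwat: "what" })); // { start: 3, end: 7, replacement: 'what' }
console.log(findCorrection("shwat", { hwat: "what" }));   // null
```

In the keydown handler, when the caret is inside a text node, the text before the caret could come from range.startContainer.textContent.slice(0, range.startOffset), and start/end would then feed clone.setStart()/setEnd() as in the code above.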


Launching an application with XPCOM

Avishkar Autar · Mar 30 2014 · Random

Continuing to document my work with XULRunner, XUL, and XPCOM, here I’m presenting code that launches an executable, using XPCOM’s nsILocalFile interface to get a handle to the executable file and the nsIProcess interface to run it.

// target = path to executable
// args = arguments for executable
function exec(target, args) {

    try {

        var file = Components.classes["@mozilla.org/file/local;1"].createInstance(Components.interfaces.nsILocalFile);
        file.initWithPath(target);

        var process = Components.classes["@mozilla.org/process/util;1"].createInstance(Components.interfaces.nsIProcess);
        process.init(file);

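        // use the caller-supplied arguments, or none if omitted
        args = args || [];
        process.run(false, args, args.length); // false = don't block waiting for the process to exit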
        
        return process;
    }
    catch (err) {
        alert(err);
        return null;
    }

}