Posts Tagged ‘PHP’

Pushing computation to the front: client-side compression

Client → Server Compression

Content from a web server being automatically gzipped (via apache, nginx, etc.) and transferred to the browser isn’t anything new, but there’s really nothing in the way of compression when going in the other direction (i.e. transferring content from the client to the server). This is not too surprising, as most client payloads are small bits of textual content and/or binary content that is already well compressed (e.g. JPEG images), where there’s little gain from compression and you’re likely to just waste CPU cycles doing it. That said, when your frontend client is a space for content creation, you’re potentially going to run into cases where you’re sending a lot of uncompressed data to the server.

Use-case: ScratchGraph Export

ScratchGraph has an export feature that essentially renders the page (minus UI components) as a string of HTML. This string is packaged along with some metadata and sent to the server, which forwards it to a service running Puppeteer that renders the HTML string to either an image or a PDF. The overall process looks something like this:

ScratchGraph Export Flow

The HTML string being sent to the server is relatively large, a couple of MBs, due to:

  • The CSS styles (particularly due to external resources being pulled in and inlined as base64 URLs)
  • The user simply having lots of content

To be fair, it’s usually the former rather than the latter, and optimizing to avoid the inlining of resources (the intent of which was to try and do exports entirely in the browser) would have a greater impact in reducing the amount of data being transferred to the server. However, for the purposes of this blog post (and also because it leads to a more complex discussion on how the application architecture can/should evolve and what this feature looks like in the future), we’re going to sidestep that discussion and focus on what benefits data compression may offer.

Compression with pako

I was more than ready to implement a compression algorithm, but was happy to discover pako, which does zlib compression. Compressing (i.e. deflating) with pako is very simple: below, I encode the HTML string to UTF-8 via TextEncoder.encode() (because I want UTF-8; this isn’t a requirement of pako), which returns a Uint8Array, then use that as the input for pako.deflate(), which also returns a Uint8Array.

const staticHtmlUtf8Arr = (new TextEncoder()).encode(html);
const compressedStaticHtmlUtf8Arr = pako.deflate(staticHtmlUtf8Arr);

Here’s what that looks like in practice, exporting the diagram shown above:

ScratchGraph Export, with pako compression, results

That’s fairly significant, as the data size has been reduced by 1,237,266 bytes (42.77%)!

The final bit for the frontend is sending this to the server. I use a FormData object for the XHR call and, for the compressed data, I append it as a Blob:

formData.append(
    "compressedStaticHtml",
    new Blob([compressedStaticHtmlUtf8Arr], {type: 'application/zlib'}),
    "compressedStaticHtml"
);

Handling the compressed data server-side with PHP

PHP supports zlib compression/decompression via the zlib module. The only additional logic needed server-side is calling gzuncompress() to decompress the compressed data.

$staticHtml = gzuncompress(file_get_contents($compressedStaticHtmlFile->getFilePath()));

Note that $compressedStaticHtmlFile is an object representing a file pulled from the request (FormData appends a Blob in the same manner as a file, so server-side you’re dealing with the data as a file). The File.getFilePath() method here simply returns the path of the uploaded file.
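
For reference, outside of a framework’s file abstraction, the equivalent would look something like this (a minimal sketch, assuming the field name from the formData.append() call above; $_FILES is where plain PHP surfaces uploads):

// A minimal sketch without a file-object wrapper: the Blob arrives in
// $_FILES like any other upload, under the field name given to
// formData.append() on the frontend
$tmpPath = $_FILES['compressedStaticHtml']['tmp_name'];
$staticHtml = gzuncompress(file_get_contents($tmpPath));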

Limitations

Compressing and decompressing data will cost CPU cycles and, for zlib and most other algorithms, that cost scales with the size of the data, so both the capabilities of the client-side system and the size of the data need to be taken into account. In addition, compression within a browser’s main thread can lead to UI events, reflow, and repaint being blocked (i.e. the page becomes unresponsive). If the compression time is significant, performing it within a web worker would be a better path, something like the sketch below.
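
A rough sketch of that approach (assuming pako.min.js is available to the worker; compress-worker.js is a hypothetical filename):

// compress-worker.js: do the deflate off the main thread
importScripts('pako.min.js');

self.onmessage = (e) => {
    // e.data is the HTML string to compress
    const utf8Arr = (new TextEncoder()).encode(e.data);
    const compressed = pako.deflate(utf8Arr);
    // Transfer the underlying buffer back instead of copying it
    self.postMessage(compressed, [compressed.buffer]);
};

// Main thread: hand off the string, upload when the worker replies
const worker = new Worker('compress-worker.js');
worker.onmessage = (e) => { /* append e.data to FormData and send, as above */ };
worker.postMessage(html);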

Performance visibility with HTTP Server-Timing

Visibility into the performance of backend components can be invaluable when it comes to spotting and understanding service degradation, debugging failures, and knowing if and where optimization is needed. There’s a host of collection agents, aggregators, and visualization tools to handle metrics, but just breaking down and looking at what happens during an HTTP request can offer a lot of insight into how components are performing. This is why I’m pretty excited about the HTTP Server-Timing header; it works well as a lightweight mechanism to surface performance metrics, especially now that it’s read and graphed by Chrome DevTools (and, perhaps sometime soon, by Firefox DevTools as well).

An HTTP response with the Server-Timing header

The following code snippet shows an Illuminate/Http/Response from a controller that PUTs an image into an Amazon S3 bucket.

return response()
    ->json(
        [],
        StatusCode::STATUS_OK,
        [
            'Server-Timing' => 's3-io;desc="Image upload to S3";dur=' . calculateTimeToPut(),
        ]
    );

Let’s assume the calculateTimeToPut() function returns 5500 (i.e. 5500 milliseconds to PUT the image onto S3), in which case the response header looks something like this:

HTTP Server-Timing header parts
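
That is, as plain text (with the assumed 5500 ms duration):

Server-Timing: s3-io;desc="Image upload to S3";dur=5500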

Each metric is a group composed of 3 pieces, with each piece delimited by a semicolon:

  • Metric Name (required)
  • Metric Description
  • Metric Value

Multiple metrics can be surfaced by separating each group with a comma.

return response()
    ->json(
        [],
        StatusCode::STATUS_OK,
        [
            'Server-Timing' => 
                's3-io;desc="Image upload to S3";dur=' . calculateTimeToPut() . 
                ',' . 
                'db-io;desc="DB update of entity";dur=' . calculateTimeToUpdate()
        ]
    );

(The above code is a bit simplistic; you’d likely want a better way to store and group metrics, then do a final transformation to construct the Server-Timing string when it’s time to send the HTTP response.)
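
Something along those lines, as a minimal sketch (the class and method names here are hypothetical):

// Accumulate metrics during the request, render the header value once
class ServerTimingCollector
{
    protected $metrics = [];

    public function add($name, $description, $durationMs)
    {
        $this->metrics[] = sprintf('%s;desc="%s";dur=%d', $name, $description, $durationMs);
    }

    public function toHeaderValue()
    {
        return implode(',', $this->metrics);
    }
}

$timing = new ServerTimingCollector();
$timing->add('s3-io', 'Image upload to S3', calculateTimeToPut());
$timing->add('db-io', 'DB update of entity', calculateTimeToUpdate());

// ... then pass ['Server-Timing' => $timing->toHeaderValue()] in the response headers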

Surfacing in DevTools

Surfacing metrics in an HTTP response is not terribly complex, and I’m sure most could devise other ways to do it, but one reason Server-Timing is more attractive than a custom solution is the out-of-the-box support within Chrome DevTools.

HTTP Server-Timing in Chrome DevTools

Firefox Devtools will likely follow suit (hopefully?) in the near future.

The PerformanceServerTiming interface

Server-Timing metrics can also be surfaced via the PerformanceServerTiming interface, from MDN:

In addition to having Server-Timing header metrics appear in the developer tools of the browser, the PerformanceServerTiming interface enables tools to automatically collect and process metrics from JavaScript.

This opens up some interesting possibilities as it enables collecting metrics via a frontend script (as is already done for a lot of product metrics via services like Google Analytics), rather than a backend collector mechanism. While not ground-breaking, the standardization around PerformanceServerTiming may allow for greater adoption and acceptance of this collection pattern.
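
As a small sketch of what that collection could look like (the /metrics endpoint is hypothetical; note that cross-origin resources only expose Server-Timing entries if the response also carries a Timing-Allow-Origin header):

// Read Server-Timing metrics off resource entries and forward them
for (const entry of performance.getEntriesByType("resource")) {
    for (const metric of (entry.serverTiming || [])) {
        navigator.sendBeacon("/metrics", JSON.stringify({
            url: entry.name,
            metric: metric.name,
            description: metric.description,
            durationMs: metric.duration
        }));
    }
}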

GeoNames geographical database

I came across the GeoNames database recently and was impressed with the breadth of locations available. I downloaded allCountries.zip from http://download.geonames.org/export/dump/, which gives data (name, location, population, etc.) on places across all countries in one tab-delimited (TSV) text file. To work with the data more easily, I wrote a PHP script to put the entries into a MySQL database table (it’s actually just a simple modification of the script I used for the Wiktionary definitions import). The TSV file, MySQL database, and PHP script are all presented below.

GeoNames allCountries.zip

GeoNames MySQL database export

<?php

require "Database.php";

$tsvInputFilePath = "allCountries.txt";

echo "Importing {$tsvInputFilePath} ...\n";

// Open file
$fp = fopen($tsvInputFilePath, "r");
if($fp === FALSE) {
    echo "Could not find file path: " . $tsvInputFilePath;
    exit;
}

// Establish DB connection
$db = new Database();

while (!feof($fp)) {

    // Get line and parse tab-delimited fields
    $ln = fgets($fp);
    $parts = explode("\t", $ln);

    // Skip lines that don't have the full set of 19 fields
    if(count($parts) < 19) {
        continue;
    }

    // Insert into database
    $db->query(
        "INSERT INTO cities (`id`,
                             `name`,
                             `asciiname`,
                             `alternatenames`,
                             `latitude`,
                             `longitude`,
                             `feature_class`,
                             `feature_code`,
                             `country_code`,
                             `cc2`,
                             `admin1_code`,
                             `admin2_code`,
                             `admin3_code`,
                             `admin4_code`,
                             `population`,
                             `elevation`,
                             `dem`,
                             `timezone`,
                             `last_modified_at`)
         VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)",
        $parts[0],
        $parts[1],
        $parts[2],
        $parts[3],
        $parts[4],
        $parts[5],
        $parts[6],
        $parts[7],
        $parts[8],
        $parts[9],
        $parts[10],
        $parts[11],
        $parts[12],
        $parts[13],
        $parts[14],
        $parts[15],
        $parts[16],
        $parts[17],
        $parts[18]
    );

}

echo "done.\n";
exit;

The Database class is a wrapper for mysqli; you can find it, along with the script above, in the geonames-allcountries-import bitbucket repo.

Note that this script will take a while to run (likely a few days) as there are 9,195,153 records that need to be inserted and we’re just doing simple INSERTs with no optimizations.
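
If import time is a concern, one alternative (a sketch, assuming LOCAL INFILE is permitted by both the MySQL server and client, and that the Database wrapper passes the statement through unchanged) is to have MySQL bulk-load the TSV directly:

// Hypothetical bulk-load path; the column order in cities matches the TSV fields
$db->query("LOAD DATA LOCAL INFILE 'allCountries.txt'
            INTO TABLE cities
            FIELDS TERMINATED BY '\\t'
            LINES TERMINATED BY '\\n'");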

An overview of each of the fields in the database can be found in the GeoNames export readme.txt. Particularly important are the feature_class and feature_code fields, the range of values for which can be found on the GeoNames Feature Codes page. Also, as indicated in the readme, the data is licensed under the Creative Commons Attribution 3.0 License.

Round to midnight

A problem I’ve run into a few times is taking the current unix timestamp and rounding it to midnight, so that I can get the unix time for the start of the day. In PHP, I’ve commonly done the following:

$timestamp = strtotime('today midnight');

It’s one of the solutions presented in this StackOverflow post.

The solution above works fine, but I began thinking about how to actually do the computation and bypass the string parsing done by strtotime(). The computation is actually pretty simple, as it’s in the same vein as snapping a point to a grid. The verbose code snippet below shows the step-by-step process in the computation.

// Given the number of seconds in a day
$numSecondsInDay = 86400;

// .. and the current unix time
$currentTime = time();

// We can compute the number of days since the unix epoch (the decimal/fractional part is the portion of the current day that's elapsed)
$daysSinceEpoch = $currentTime / $numSecondsInDay;

// We can throw away the fractional part by rounding down with the floor() function
$wholeDaysSinceEpoch = floor($daysSinceEpoch);

// The number of whole days since the epoch x the number of seconds in a day will give the time for the current day at midnight
$midnightToday = $wholeDaysSinceEpoch * $numSecondsInDay;

One interesting thing to notice: if you replace the floor() function with the ceil() function, rounding up the number of days since the epoch, you’ll get the start of the next day – midnight tomorrow.
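
In compact form, both computations are one-liners:

// Start of the current day and start of the next day, in unix time
$midnightToday = floor(time() / 86400) * 86400;
$midnightTomorrow = ceil(time() / 86400) * 86400;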

Wiktionary definitions database

Having a dictionary can be incredibly useful in software development, and one forms the basis for a wide range of natural language processing applications. However, finding an open-source dictionary that can be easily parsed and used within applications is incredibly difficult, as there simply aren’t many options available.

WordNet is one option I came across, but it requires significant work parsing either the WordNet ASCII database files or the Prolog database files.

Wiktionary was the other viable option, and the one I went with. The Wiktionary XML dumps are available, but being a wiki, these files are likely even more difficult to parse than the WordNet database files as you’d have to deal with wiki markup. However, a while ago I was able to get a TSV file with words, parts of speech, and definitions from the Wikimedia Toolserver at http://toolserver.org/~enwikt/definitions. The Toolserver has since been discontinued and I haven’t found updated TSVs hosted anywhere else, but the file I downloaded, dated November 27, 2012, is still fairly up-to-date for a dictionary and useful in many applications.

I wrote a PHP script to parse the TSV and make INSERTs into a MySQL database. The TSV file, MySQL database, and PHP script are presented below.

Wiktionary TSV file

Wiktionary MySQL database export

PHP Script:

<?php

require "Database.php";

$tsvInputFilePath = "TEMP-E20121127.tsv";

echo "Importing {$tsvInputFilePath} ...\n";

// Open file
$fp = fopen($tsvInputFilePath, "r");
if($fp === FALSE) {
    echo "Could not find file path: " . $tsvInputFilePath;
    exit;
}

// Establish DB connection
$db = new Database();

while (!feof($fp)) {

    // Get line and parse tab-delimited fields
    $ln = fgets($fp);
    $parts = explode("\t", $ln);
    if(count($parts) < 4) {
        continue;
    }

    $lang = $parts[0];
    $word = $parts[1];
    $partOfSpeech = $parts[2];
    $definitionRaw = $parts[3];

    // Insert into database
    $db->query("INSERT INTO words (language, word, part_of_speech, definition_raw)
                VALUES (?, ?, ?, ?)",
        $lang, $word, $partOfSpeech, $definitionRaw);

}

echo "done.\n";
exit;

The Database class is a wrapper for mysqli; you can find it, along with the script above, in the wiktionary-tsv-import bitbucket repo.

Note that definitions need to be parsed further, as they contain wiki markup. The parsing doesn’t seem difficult and is something I hope to get done in the near future.

Related resources:

There’s valuable stuff in each of the projects above but, like WordNet, they require significantly more time to evaluate and integrate into an application, compared to the simple TSV → MySQL translation.

EDIT (12/13/2015): I’ve updated the MySQL database export. There were some holes in the data because I was using the utf8 column encoding for definitions; however, MySQL has a weird “UTF-8” implementation that only handles codepoints up to 3 bytes in size. The utf8mb4 encoding needs to be used for a proper UTF-8 encoding supporting codepoints up to 4 bytes.
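
For anyone hitting the same issue with an existing table, the conversion is a single statement (the collation here is just one sensible choice):

// Convert the table, including existing text columns, to utf8mb4
$db->query("ALTER TABLE words CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci");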

PHP count() is O(1)

I was curious about the performance of PHP’s count() function a while back and whether it was worth it to store the result in a variable for repeated use. I discovered the following from this answer by FractalizeR on Stack Overflow:

PHP_FUNCTION(count) calls php_count_recursive(), which in turn calls zend_hash_num_elements() for non-recursive array, which is implemented this way:

ZEND_API int zend_hash_num_elements(const HashTable *ht)
{
    IS_CONSISTENT(ht);

    return ht->nNumOfElements;
}

So you can see, it’s O(1) for $mode = COUNT_NORMAL.
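
A quick way to convince yourself is to time count() against arrays of very different sizes; a rough sketch:

// If count() is O(1), both loops should take roughly the same time
$small = range(1, 10);
$large = range(1, 1000000);

$start = microtime(true);
for ($i = 0; $i < 1000000; $i++) { count($small); }
printf("small array: %.4fs\n", microtime(true) - $start);

$start = microtime(true);
for ($i = 0; $i < 1000000; $i++) { count($large); }
printf("large array: %.4fs\n", microtime(true) - $start);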

IMAP Pickup

An interesting little project I worked on: pulling attachments from emails in an IMAP mailbox and downloading them. I wanted an IMAP solution instead of writing a script for the MTA, as such a script would be specific to the MTA software and not transferable to another server. In addition, there’s also the common case where you simply may not have access to the MTA.

The biggest help in putting this together and dealing with attachments was this blog post and this comment on the PHP docs. Information on doing this is a bit scattered and incomplete in many cases, likely because extracting attachments is somewhat difficult as email is a notoriously bad way to transfer files; the file data is base64 encoded and dumped in as part of the message body.

ImapPickup is the class that encapsulates all the necessary functionality:

class ImapPickup
{
    protected $imapStream = null;

    // Walk a message's structure and return an array of attachment entries,
    // each of the form array(filename, major part number, minor part number)
    protected function findAttachments($part)
    {
        $partNum = -1;
        $attachments = array();

        $this->findAttachmentsRec($part, $attachments, $partNum, -1);

        return $attachments;
    }

    protected function findAttachmentsRec($part, &$attachments, &$partNum, $partNumSub)
    {
        if (isset($part->parts))
        {
            foreach ($part->parts as $partOfPart)
            {
                $this->findAttachmentsRec($partOfPart, $attachments, $partNum, $partNumSub + 1);
            }
        }
        else
        {
            if (isset($part->disposition))
            {
                if ($part->disposition == 'attachment')
                {
                    $attachments[] = array($part->dparameters[0]->value, $partNum, $partNumSub);
                }
            }
        }

        $partNum++;
    }

    // Fetch and decode the base64-encoded body of the given message part
    public function getAttachmentContent($msgNum, $partNum)
    {
        $contents = imap_fetchbody($this->imapStream, $msgNum, $partNum, FT_UID);
        return imap_base64($contents);
    }

    public function getAttachments($msgNum)
    {
        $struct = imap_fetchstructure($this->imapStream, $msgNum, FT_UID);
        $attachments = $this->findAttachments($struct);

        return $attachments;
    }

    // Map each message ID to its array of attachment entries, skipping
    // messages without attachments
    public function getAttachmentsFromMessages($msgArray)
    {
        $msgIdToAttachmentsMap = array();

        if ($msgArray)
        {
            foreach ($msgArray as $msgId)
            {
                $attachments = $this->getAttachments($msgId);
                if (!empty($attachments))
                {
                    $msgIdToAttachmentsMap[$msgId] = $attachments;
                }
            }
        }

        return $msgIdToAttachmentsMap;
    }

    public function getMessages($searchQuery)
    {
        return imap_search($this->imapStream, $searchQuery, SE_UID);
    }

    public function connect($mailbox, $user, $password)
    {
        $this->imapStream = imap_open($mailbox, $user, $password);
    }

    public function disconnect()
    {
        imap_close($this->imapStream);
    }
}

Here’s a little example of how it can be used. This will query all messages with “pickup::” in the subject line and print out the messageID of all messages with attachments, followed by the filenames of all attachments for that message.

$imapPickup = new ImapPickup();
$imapPickup->connect("{mail.hotspotdot.net:143}INBOX", "test@test.net", "pass123");

$messages = $imapPickup->getMessages("SUBJECT pickup::");
$attachments = $imapPickup->getAttachmentsFromMessages($messages);

foreach($attachments as $msgId => $attArr)
{
    echo "<p>{$msgId} => ";

    foreach($attArr as $attachment)
    {
        echo $attachment[0];
        echo ",";
    }

    echo "</p>";
}

$imapPickup->disconnect();

The array for a single file attachment contains 3 entries:

  • [0] => filename
  • [1] => major part number
  • [2] => minor part number

getAttachments(), findAttachments(), and findAttachmentsRec() will return an array of such entries (or an empty array if there are no attachments). getAttachmentsFromMessages() will return a map from messageID => array of single attachments.

The part number (both major and minor) is needed to retrieve the contents of an attachment. For getAttachmentContent(), simply use the major number if the minor number is <= 0, or concatenate them with a period separating them (e.g. "2.3").
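
A small helper (hypothetical, not part of the class above) makes that explicit:

// Build the part specifier expected by getAttachmentContent()
function buildPartSpec($majorPartNum, $minorPartNum)
{
    if ($minorPartNum <= 0) {
        return (string)$majorPartNum;
    }
    return $majorPartNum . "." . $minorPartNum;  // e.g. "2.3"
}

// $content = $imapPickup->getAttachmentContent($msgId, buildPartSpec($attachment[1], $attachment[2]));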

PostgreSQL + PHP installation on Windows 2003 x64

Well, the PostgreSQL installation itself is easy enough; getting it to work with PHP is the challenging part. Here’s what I did:

Reflex Feedback widget

I worked on a small AJAX widget for user feedback, built atop jQuery UI: Reflex Feedback. It’s inspired by the widgets you see from services like Get Satisfaction and UserVoice, but much simpler, and it’s a frontend-only widget; how you handle the feedback info on the backend is up to you.

Here’s what it looks like.

reflex feedback widget dialog

And here’s what the tag that opens the dialog looks like:

reflex feedback widget tag

To use it, download or clone the ReflexFeedback repo from bitbucket.

Place the .js file wherever you’d like, but the /reflex.content folder should be a subdirectory in the same folder as the page loading the .js file. Load reflex.js as you would any other javascript file:

<script type="text/javascript" src="js/reflex.js"></script>

Call Reflex.init() to add the widget to the page. The first argument is the DOM element to attach the additional HTML/CSS code to. The second argument is the server-side script to call when the user clicks Send Feedback.

Reflex.init($('body'), 'controller/post_feedback.php');

That’s it for the frontend. You should see the tag show up in the right-hand corner and, when clicked, the dialog should open.

For the backend, the AJAX call to send the feedback info will send a POST request with 2 fields: feedback_type, feedback_txt.

Reflex expects an XML reply from the server:

<reflex>
    <result>ok</result>
</reflex>

ok indicates a successful result; any other reply is considered an error.

A successful result will close the dialog and show another with a thank you message.

reflex feedback thank you dialog

For an error, a message is shown below the Send Feedback button, informing the user that an error has occurred and to try again.

reflex feedback send fail

As for what to actually do with the feedback, that’s up to you, but what I’m doing is sending myself an email with the feedback info. I’ve posted my PHP script below; feel free to use it, modify it, etc. If you do use this code, be sure to fill in your mail server credentials and a from address; you’ll also need PEAR’s Mail package installed.

<?php

require_once "Mail.php";
require_once "Mail/mime.php";

header('Content-type: application/xml; charset=utf-8');
echo "<?xml version=\"1.0\" encoding=\"utf-8\"?>\r\n";

if(!isset($_POST['feedback_type']) || !isset($_POST['feedback_txt']))
{
    echo "<reflex><result>error:missing-arguments</result></reflex>";
}
else
{
    $from = "...";
    $to = "...";
    $subject = "Feedback from user...";

    $feedback_type = $_POST['feedback_type'];
    $feedback_txt = $_POST['feedback_txt'];

    // Build the HTML body from the two POSTed fields
    $bodyHtml = "<html><body>";
    $bodyHtml .= "<p>Type: {$feedback_type}</p>";
    $bodyHtml .= "<p>Feedback: {$feedback_txt}</p>";
    $bodyHtml .= "</body></html>";
    $body = $bodyHtml;

    $host = "...";
    $port = "...";
    $username = "...";
    $password = "...";

    $headers = array('MIME-Version' => '1.0',
                     'Content-type' => 'text/html; charset=utf-8',
                     'From' => $from, 'To' => $to, 'Subject' => $subject);

    $smtp = Mail::factory('smtp',
                          array('host' => $host,
                                'port' => $port,
                                'auth' => true,
                                'username' => $username,
                                'password' => $password));

    $mail = $smtp->send($to, $headers, $body);

    if (PEAR::isError($mail))
    {
        $err_details = $mail->getMessage();
        echo "<reflex><result>error:send-failure</result><details>{$err_details}</details></reflex>";
    }
    else
    {
        echo "<reflex><result>ok</result></reflex>";
    }
}

?>

That’s all for now. I’ll work on more features and options for customization in the future. You can see the widget in action over at dotspott.com

hotspotdot

hotspotdot is my entry into Microsoft’s My App is Better Challenge. In a nutshell, hotspotdot allows you to locate and tag “hot spots” that are important to you, letting you build a personal database of hot spots.

It’s a very simple app, utilizing PHP, SQL Server 2008, jQuery, jQuery UI, and the Google Maps API. I wrapped things up in under 2 weeks, but in order to make the August 25th deadline there were a few features I didn’t get a chance to put in, such as preloaders, icons on the map, and searching by tag. I’ll probably devote some time to these in the coming weeks.

hotspotdot login

The goal of the contest was to create something showing off the power of SQL Server and Microsoft’s new PHP for SQL Server 2.0 drivers. I used the new PDO driver, which was pretty sweet – a very simple, elegant API. Two things in particular that impressed me were transactions (very nice for doing multiple inserts or deletions) and a consistent, well designed exception model (which makes error handling much easier, and is especially powerful when combined with transactions, as failed queries don’t affect the database [code jumps to the exception handler before commit() is called], so there’s no chance of junk being inserted). Error messages, in general, were also much more descriptive than those I’ve encountered with MySQL.

Finding a server proved difficult. I got a shared hosting solution, but the host was unable (though I suspect unwilling) to install the PHP for SQL drivers. So I ended up getting a virtual private server, but this only came with SQL Server Express, and I used some features (see below) that prevented a migration. In order to avoid purchasing an SQL Server 2008 license (way out of my budget), my final solution was to use the shared hosting server for the DB and the VPS for everything else. Hence the site being located at the my. subdomain, which maps to the IP of the VPS.

SQL Server 2008 is a fine system (despite some annoyances with the management studio)… though it’s really just a solid database system, which isn’t bad, but there’s nothing really impressive or creative about that. It’s not really leaps and bounds above a cheap solution like MySQL. The contest was about SQL Server, so I tried to do something that utilized a fairly unique aspect of the system: the geography data type. I wasn’t too impressed. It simply holds a (longitude, latitude) pair in a certain format, nothing more. The one big advantage of having a vector type like this would be doing comparisons based on distance but, as far as I could tell, this isn’t supported (my queries failed). Worse yet, in SQL Server Management Studio the geography type is displayed in hexadecimal, making things very cryptic… I couldn’t help but wonder why this is any better than using 2 columns and storing the longitude and latitude values independently. That said, the idea of richer/more-complex data types within a relational database is a pretty cool idea, and it would be interesting to see it taken further, beyond its current, primitive state.

hotspotdot map


Leave a comment if you find a bug or have any questions, comments, etc.