Posts Tagged ‘PHP’

GeoNames geographical database

I came across the GeoNames database recently and was impressed with the breadth of locations available. I downloaded the allCountries.zip from http://download.geonames.org/export/dump/ which gives data (name, location, population, etc.) on places across all countries in one, TSV delimited, text file. To work with the data more easily, I wrote a PHP script to put the entries into a MySQL database table (it’s actually just a simple modification to the script I used for the Wiktionary definitions import). The TSV, MySQL database, and PHP script are all presented below.

GeoNames allCountries.zip

GeoNames MySQL database export

<?php

require "Database.php";

$tsvInputFilePath = "allCountries.txt";

echo "Importing {$tsvInputFilePath} ...\n";

// Open file
$fp = fopen($tsvInputFilePath, "r");
if($fp === FALSE) {
echo "Could not find file path: " . $tsvInputFilePath;
exit;
}

// Establish DB connection
$db = new Database();

while (!feof($fp)) {

// Get line and parse tab-delimited fields
$ln = fgets($fp);
$parts = explode("\t", $ln);

if(count($parts) < 19) {
continue;
}

// Insert into database
$db->query("INSERT INTO cities (`id`,
`name`,
`asciiname`,
`alternatenames`,
`latitude`,
`longitude`,
`feature_class`,
`feature_code`,
`country_code`,
`cc2`,
`admin1_code`,
`admin2_code`,
`admin3_code`,
`admin4_code`,
`population`,
`elevation`,
`dem`,
`timezone`,
`last_modified_at`)
VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)"
,

$parts[0],
$parts[1],
$parts[2],
$parts[3],
$parts[4],
$parts[5],
$parts[6],
$parts[7],
$parts[8],
$parts[9],
$parts[10],
$parts[11],
$parts[12],
$parts[13],
$parts[14],
$parts[15],
$parts[16],
$parts[17],
$parts[18]

);


}

echo "done.\n";
exit;

The Database class is wrapper for mysqli, you can find it, along with the script above, in the geonames-allcountries-import bitbucket repo.

Note that this script will take a while to run (likely a few days) as there are 9,195,153 records that need to be inserted and we’re just doing simple INSERTs with no optimizations.

An overview of each of the fields in the database can be found in the GeoNames export readme.txt. Particularly important is the feature_class and feature_code fields, the range of values for which can be found on the GeoNames Feature Codes page. Also, as indicated in the readme, the data is licensed under the Creative Commons Attribution 3.0 License.

Round to midnight

A problem I’ve run into a few times is taking the current unix timestamp and rounding it to midnight, so that I can get the unix time for the start of the day. In PHP, I’ve commonly done the following:

$timestamp = strtotime('today midnight');

It’s one of the solutions presented in this StackOverflow post.

The solution above works fine, but I began thinking about how to actually do the computation and bypass the string parsing done by strtotime(). The computation is actually pretty simple, as it’s in the same vein as snapping a point to a grid. The verbose code snippet below shows the step-by-step process in the computation.

// Given the number of seconds in a day
$numSecondsInDay = 86400;

// .. and the current unix time
$currentTime = time();

// We can compute the number of days since the unix epoch (the decimal/fractional part is the portion of the current day that's elapsed)
$daysSinceEpoch = $currentTime / $numSecondsInDay;

// We can throw away the fractional part by rounding down with the floor() function
$wholeDaysSinceEpoch = floor($daysSinceEpoch);

// The number of whole days since the epoch x the number of seconds in a day will give the time for the current day at midnight
$midnightToday = $wholeDaysSinceEpoch * $numSecondsInDay;

One interesting thing to notice: if you replace the floor() function with the ceil() function, rounding up the number of days since the epoch, you’ll get the start of the next day – midnight tomorrow.

Wiktionary definitions database

Having a dictionary can be incredibly useful in software development, and forms the basis for a wide range of natural language processing applications. However, finding an open-source dictionary, one that can be easily parsed and used within applications, is incredibly difficult as there simply isn’t a lot of options available.

WordNet is one option I came across, but requires significant work parsing the WordNet ASCII database files or Prolog database files.

Wiktionary was the other viable option, and the one I went with. The Wiktionary XML dumps are available, but being a wiki, these files are likely even more difficult to parse than the WordNet database files as you’d have to deal with wiki markup. However, a while ago I was able to get a TSV file with words, parts of speech, and definitions from the Wikimedia Toolserver at http://toolserver.org/~enwikt/definitions. The Toolserver has since been discontinued and I haven’t found updated TSVs hosted anywhere else, but the file I downloaded, dated November 27, 2012, is still fairly up-to-date for a dictionary and useful in many applications.

I wrote a PHP script to parse the TSV and make INSERTs into a MySQL database. The TSV file, MySQL database, and PHP script are presented below.

Wiktionary TSV file

Wiktionary MySQL database export

PHP Script:

<?php

require "Database.php";

$tsvInputFilePath = "TEMP-E20121127.tsv";

echo "Importing {$tsvInputFilePath} ...\n";

// Open file
$fp = fopen($tsvInputFilePath, "r");
if($fp === FALSE) {
echo "Could not find file path: " . $tsvInputFilePath;
exit;
}

// Establish DB connection
$db = new Database();

while (!feof($fp)) {

// Get line and parse tab-delimited fields
$ln = fgets($fp);
$parts = explode("\t", $ln);
if(count($parts) < 4) {
continue;
}

$lang = $parts[0];
$word = $parts[1];
$partOfSpeech = $parts[2];
$definitionRaw = $parts[3];

// Insert into database
$db->query("INSERT INTO words (language, word, part_of_speech, definition_raw)
VALUES (?, ?, ?, ?)"
,
$lang, $word, $partOfSpeech, $definitionRaw);

}

echo "done.\n";
exit;

The Database class is wrapper for mysqli, you can find it, along with the script above, in the wiktionary-tsv-import bitbucket repo.

Note that definitions need to be parsed further, as they contain wiki markup. The parsing doesn’t seem difficult and is something I hope to get done in the near future.

Related resources:

There’s valuable stuff from each of the projects above, but like WordNet, requires significantly more time to evaluate and implement in an application, compared to the simple TSV -> MySQL translation.

EDIT (12/13/2015): I’ve updated the MySQL database export. There was some holes in the data because I was using utf8 column encoding for definitions, however, MySQL’s has a weird “UTF-8” implementation that only handles codepoint that up to 3 bytes in size. utf8mb4 encoding needs to be used for a proper UTF-8 encoding supporting up to 4 bytes.

PHP count() is O(1)

I was curious about the performance of PHP’s count() function a while back and whether it was worth it to store the result in a variable for repeated use. I discovered the following from this answer by FractalizeR on Stack Overflow:

PHP_FUNCTION(count) calls php_count_recursive(), which in turn calls zend_hash_num_elements() for non-recursive array, which is implemented this way:

ZEND_API int zend_hash_num_elements(const HashTable *ht)
{
IS_CONSISTENT(ht);

return ht->nNumOfElements;
}

So you can see, it’s O(1) for $mode = COUNT_NORMAL.

IMAP Pickup

An interesting little project I wanted to work on; I wanted to be able to pull attachments from emails in an IMAP mailbox and then download them. I wanted an IMAP solution instead of writing a script for the MTA as a script would be specific to the MTA software and not transferable to another server. In addition, there’s also the common case where you may simply not have access to the MTA.

The biggest help in putting this together and dealing with attachments was this blog post and this comment on the PHP docs. Information on doing this is a bit scattered and incomplete in many cases, likely because extracting attachments is somewhat difficult as email is a notoriously bad way to transfer files; the file data is base64 encoded and dumped in as part of the message body.

ImapPickup is the class that encapsulates all the necessary functionality,

class ImapPickup
{
protected $imapStream = null;

protected function findAttachments($part)
{
$partNum = -1;
$attachments = array();

$this->findAttachmentsRec($part, &$attachments, &$partNum, -1);

return $attachments;
}

protected function findAttachmentsRec($part, &$attachments, &$partNum, $partNumSub)
{
if (isset($part->parts))
{
foreach ($part->parts as $partOfPart)
{
$this->findAttachmentsRec($partOfPart, &$attachments, &$partNum, $partNumSub+1);
}
}
else
{

if (isset($part->disposition)){
if ($part->disposition == 'attachment') {
$attachments[] = array($part->dparameters[0]->value, $partNum, $partNumSub);
}
}
}

$partNum++;
}

public function getAttachmentContent($msgNum, $partNum)
{
$contents = imap_fetchbody($this->imapStream, $msgNum, $partNum, FT_UID);
return imap_base64($contents);
}

public function getAttachments($msgNum)
{
$struct = imap_fetchstructure($this->imapStream,$msgNum,FT_UID);
$attachments = $this->findAttachments($struct);

return $attachments;
}

public function getAttachmentsFromMessages($msgArray)
{
$msgIdToAttachmentsMap = array();

if ($msgArray)
{
foreach($msgArray as $msgId)
{
$attachments = $this->getAttachments($msgId);
if(!empty($attachments))
{
$msgIdToAttachmentsMap[$msgId] = $attachments;
}
}
}

return $msgIdToAttachmentsMap;
}

public function getMessages($searchQuery)
{
return imap_search($this->imapStream, $searchQuery, SE_UID);
}

public function connect($mailbox, $user, $password)
{
$this->imapStream = imap_open($mailbox, $user, $password);
}

public function disconnect()
{
imap_close(
$this->imapStream);
}

}

Here’s a little example of how it can be used. This will query all messages with “pickup::” in the subject line and print out the messageID of all messages with attachments, followed by the filenames of all attachments for that message.

$imapPickup = new ImapPickup();
$imapPickup->connect("{mail.hotspotdot.net:143}INBOX", "test@test.net", "pass123");

$messages = $imapPickup->getMessages("SUBJECT pickup::");
$attachments = $imapPickup->getAttachmentsFromMessages($messages);

foreach($attachments as $msgId => $attArr)
{
echo "<p>{$msgId} => ";

foreach($attArr as $attachment)
{
echo $attachment[0];
echo ",";
}

echo "</p>";
}

$imapPickup->disconnect();

The array for a single file attachment contains 3 entries:

  • [0] => filename
  • [1] => major part number
  • [2] => minor part number

getAttachments(), findAttachments(), and findAttachmentsRec() will return an array of such entries (or an empty array is there are no attachments). getAttachmentsFromMessages() will return a map from messageID => array of single attachments.

The part number (both major and minor) is needed to retrieve the contents of an attachment. For getAttachmentContent(), simply use the major number if the minor number is <= 0, or concatenate them with a period separating them (e.g. "2.3").

PostgreSQL + PHP installation on Windows 2003 x64

Well the PostgreSQL installation itself is easy enough, getting it to work with PHP is the challenging part. Here’s what I did:

Reflex Feedback widget

I worked on a small AJAX widget for user feedback built atop jQuery UI: Reflex Feedback. It’s inspired by the widgets you see from services like Get Satisfaction and UserVoice, but much simpler and it’s a frontend-only widget, how you handle the feedback info on the backend is up to you.

Here’s what it looks like.

reflex feedback widget dialog

And here’s what the tag that opens the dialog looks like:

reflex feedback widget tag

To use it, download or clone the ReflexFeedback repo from bitbucket

Place the .js file wherever you’d like but the /reflex.content folder should a subdirectory in the same folder as the page loading the .js file. Load reflex.js as you would any other javascript file:

<script type="text/javascript" src="js/reflex.js"></script>

Call Reflex.init() to add the widget to the page. The first argument is the DOM element to attach the additional HTML/CSS code to. The seconds argument is the server-side script to call when the user clicks Send Feedback.

Reflex.init($('body'), 'controller/post_feedback.php');

That’s it for the frontend. You should see the tag show up in the right-hand corner and when clicked the dialog open.

For the backend, the AJAX call to send the feedback info will send a POST request with 2 fields: feedback_type, feedback_txt.

Reflex expects an XML reply from the server:

<reflex>
<result>ok</result>
</reflex>

ok indicates a successful result, any other reply is considered an error.

A successful result will close the dialog and show another with a thank you message.

reflex feedback thank you dialog

For an error, a message is shown below the Send Feedback button, informing the user that an error has occurred and to try again.

reflex feedback send fail

As for what to actually do with the feedback, that’s up to you, but what I’m doing is sending myself an email with the feedback info. I’ve posted my PHP script below; feel free to use it, modify it, etc. If you do use this code, be sure to fill in your mail server credentials and a from address; you’ll also need PEAR’s Mail package installed.

<?php

require_once "Mail.php";
require_once "Mail/mime.php";

header('Content-type: application/xml; charset=utf-8');
echo "<?xml version=\"1.0\" encoding=\"utf-8\"?>\r\n";

if(!isset($_POST['feedback_type']) || !isset($_POST['feedback_txt']))
{
echo "<reflex><result>error:missing-arguments</result></reflex>";
}
else
{
$from = "...";
$to = "...";
$subject = "Feedback from user...";

$feedback_type = $_POST['feedback_type'];
$feedback_txt = $_POST['feedback_txt'];

$bodyHtml = "<html><body>";
$bodyHtml .= "<p>Type: {$feedback_type}</p>";
$bodyHtml .= "<p>Feedback: {$feedback_txt}</p>";
$bodyHtml .= "</body></html>";
$body = $bodyHtml;

$host = "...";
$port = "...";
$username = "...";
$password = "...";

$headers = array('MIME-Version' => '1.0rn',
'Content-type' => 'text/html; charset=utf-8',
'From' => $from, 'To' => $to, 'Subject' => $subject);


$smtp = Mail::factory('smtp',
array ('host' => $host,
'port' => $port,
'auth' => true,
'username' => $username,
'password' => $password));

$mail = $smtp->send($to, $headers, $body);

if (PEAR::isError($mail))
{
$err_details = $mail->getMessage();
echo "<reflex><result>error:send-failure</result><details>{$err_details}</details></reflex>";
}
else
{
echo "<reflex><result>ok</result></reflex>";
}
}

?>

That’s all for now. I’ll work on more features and options for customization in the future. You can see the widget in action over at dotspott.com

hotspotdot

hotspotdot is my entry into Microsoft’s My App is Better Challenge. In a nutshell, hotspotdot allows you to locate and tag “hot spots” that are important to you, allowing you to create a personal database of hot spots.

It’s a very simple app, utilizing PHP, SQL Server 2008, jQuery, jQuery UI, and the Google Maps API. I wrapped thing up in under 2 weeks, but in order to make the August 25th deadline there were a few features I didn’t get a chance to put in, such as preloaders, icons on the map, and searching by tag. I’ll probably devote some time to these in the coming weeks.

hotspotdot login

The goal of the contest was to create something showing off the power of SQL Server and Microsoft’s new PHP for SQL Server 2.0 drivers. I used the new PDO driver which was pretty sweet – very simple, elegant API. Two things in particular that impressed me were transactions (very nice for doing multiple inserts or deletions) and a consistent and well designed exception model (makes error handling much easier and especially powerful when combined with transactions as failed queries don’t effect the database [code jumps to exception handler before commit() is called], so no chance of junk being inserted). Error messages, in general, were also much more descriptive than those I’ve encountered with MySQL.

Finding a server proved difficult. I got a shared hosting solution, but the host was unable (though I suspect unwilling) to install the PHP for SQL drivers. So I ended up getting a virtual private server, but this only came with SQL Server Express and I used some features (see below) that prevented a migration. In order to avoid purchasing an SQL Server 2008 license (way out of my budget), my final solution was use the shared hosting server for the DB and the VPS for everything else. Hence the reason for the site being located at the my. subdomain, which maps to the IP of the VPS.

SQL Server 2008 is a fine system (despite some annoyances with the management studio)… though it’s really just a solid database system, which isn’t bad, but there’s nothing really impressive or creative about that. It’s not really leaps and bounds above a cheap solution like MySQL. The contest was about SQL Server, so I tried to do something that utilized a fairly unique aspect of the system: the geography data type. I wasn’t too impressed. It simply holds a (longitude, latitude) pair in a certain format, nothing more. The one big advantage of having a vector type like this would be doing comparisons based on distance but, as far as I could tell, this isn’t supported (my queries failed). Worse yet, in SQL Server Management Studio the display of the geography type is in hexadecimal, making things very cryptic… I couldn’t help but wonder why this is any better than using 2 columns and storing the longitude and latitude values independently. That said, the idea of richer/more-complex data types within a relational database is a pretty cool idea and it would interesting to see it taken further and beyond its current, primitive state.

hotspotdot map

hotspotdot map

Leave a comment if you find a bug or have any questions, comments, etc.

PHP session_start() “Node no longer exists”

I stumbled upon this error earlier today as I attempted to store the value of a SimpleXMLElement as a session variable. I was able to narrow down the issue thanks to this post on bytes.com.

According to a user post on the PHP site, this occurs because SimpleXML returns a reference to an object containing the node value, and you can’t store a reference as a session variable. The value must be dereferenced and copied which can be done by casting.

// Bad! $storageBoxSize = $xml->data->storage_box_size;

// Good! $storageBoxSize = (int)$xml->data->storage_box_size;

I find myself hating loosely-typed languages more and more.

PHP Array of MIME Types

Useful for when you need to set HTTP headers when serving file downloads. The mime_content_type function is deprecated and in my case was just returning an empty string. The recommended alternative, using PECL finfo_file, should work fine, but adding and compiling in a PECL extension just for this seems like overkill, especially as you have more control using an array. I’m also never crazy about adding dependencies unless they’re absolutely necessary.

This is from snipplr, but includes the image/png type which was, curiously, missing.

$mime_types = array(
"323" => "text/h323",
"acx" => "application/internet-property-stream",
"ai" => "application/postscript",
"aif" => "audio/x-aiff",
"aifc" => "audio/x-aiff",
"aiff" => "audio/x-aiff",
"asf" => "video/x-ms-asf",
"asr" => "video/x-ms-asf",
"asx" => "video/x-ms-asf",
"au" => "audio/basic",
"avi" => "video/x-msvideo",
"axs" => "application/olescript",
"bas" => "text/plain",
"bcpio" => "application/x-bcpio",
"bin" => "application/octet-stream",
"bmp" => "image/bmp",
"c" => "text/plain",
"cat" => "application/vnd.ms-pkiseccat",
"cdf" => "application/x-cdf",
"cer" => "application/x-x509-ca-cert",
"class" => "application/octet-stream",
"clp" => "application/x-msclip",
"cmx" => "image/x-cmx",
"cod" => "image/cis-cod",
"cpio" => "application/x-cpio",
"crd" => "application/x-mscardfile",
"crl" => "application/pkix-crl",
"crt" => "application/x-x509-ca-cert",
"csh" => "application/x-csh",
"css" => "text/css",
"dcr" => "application/x-director",
"der" => "application/x-x509-ca-cert",
"dir" => "application/x-director",
"dll" => "application/x-msdownload",
"dms" => "application/octet-stream",
"doc" => "application/msword",
"dot" => "application/msword",
"dvi" => "application/x-dvi",
"dxr" => "application/x-director",
"eps" => "application/postscript",
"etx" => "text/x-setext",
"evy" => "application/envoy",
"exe" => "application/octet-stream",
"fif" => "application/fractals",
"flr" => "x-world/x-vrml",
"gif" => "image/gif",
"gtar" => "application/x-gtar",
"gz" => "application/x-gzip",
"h" => "text/plain",
"hdf" => "application/x-hdf",
"hlp" => "application/winhlp",
"hqx" => "application/mac-binhex40",
"hta" => "application/hta",
"htc" => "text/x-component",
"htm" => "text/html",
"html" => "text/html",
"htt" => "text/webviewhtml",
"ico" => "image/x-icon",
"ief" => "image/ief",
"iii" => "application/x-iphone",
"ins" => "application/x-internet-signup",
"isp" => "application/x-internet-signup",
"jfif" => "image/pipeg",
"jpe" => "image/jpeg",
"jpeg" => "image/jpeg",
"jpg" => "image/jpeg",
"js" => "application/x-javascript",
"latex" => "application/x-latex",
"lha" => "application/octet-stream",
"lsf" => "video/x-la-asf",
"lsx" => "video/x-la-asf",
"lzh" => "application/octet-stream",
"m13" => "application/x-msmediaview",
"m14" => "application/x-msmediaview",
"m3u" => "audio/x-mpegurl",
"man" => "application/x-troff-man",
"mdb" => "application/x-msaccess",
"me" => "application/x-troff-me",
"mht" => "message/rfc822",
"mhtml" => "message/rfc822",
"mid" => "audio/mid",
"mny" => "application/x-msmoney",
"mov" => "video/quicktime",
"movie" => "video/x-sgi-movie",
"mp2" => "video/mpeg",
"mp3" => "audio/mpeg",
"mpa" => "video/mpeg",
"mpe" => "video/mpeg",
"mpeg" => "video/mpeg",
"mpg" => "video/mpeg",
"mpp" => "application/vnd.ms-project",
"mpv2" => "video/mpeg",
"ms" => "application/x-troff-ms",
"mvb" => "application/x-msmediaview",
"nws" => "message/rfc822",
"oda" => "application/oda",
"p10" => "application/pkcs10",
"p12" => "application/x-pkcs12",
"p7b" => "application/x-pkcs7-certificates",
"p7c" => "application/x-pkcs7-mime",
"p7m" => "application/x-pkcs7-mime",
"p7r" => "application/x-pkcs7-certreqresp",
"p7s" => "application/x-pkcs7-signature",
"pbm" => "image/x-portable-bitmap",
"pdf" => "application/pdf",
"pfx" => "application/x-pkcs12",
"pgm" => "image/x-portable-graymap",
"pko" => "application/ynd.ms-pkipko",
"pma" => "application/x-perfmon",
"pmc" => "application/x-perfmon",
"pml" => "application/x-perfmon",
"pmr" => "application/x-perfmon",
"pmw" => "application/x-perfmon",
"png" => "image/png",
"pnm" => "image/x-portable-anymap",
"pot" => "application/vnd.ms-powerpoint",
"ppm" => "image/x-portable-pixmap",
"pps" => "application/vnd.ms-powerpoint",
"ppt" => "application/vnd.ms-powerpoint",
"prf" => "application/pics-rules",
"ps" => "application/postscript",
"pub" => "application/x-mspublisher",
"qt" => "video/quicktime",
"ra" => "audio/x-pn-realaudio",
"ram" => "audio/x-pn-realaudio",
"ras" => "image/x-cmu-raster",
"rgb" => "image/x-rgb",
"rmi" => "audio/mid",
"roff" => "application/x-troff",
"rtf" => "application/rtf",
"rtx" => "text/richtext",
"scd" => "application/x-msschedule",
"sct" => "text/scriptlet",
"setpay" => "application/set-payment-initiation",
"setreg" => "application/set-registration-initiation",
"sh" => "application/x-sh",
"shar" => "application/x-shar",
"sit" => "application/x-stuffit",
"snd" => "audio/basic",
"spc" => "application/x-pkcs7-certificates",
"spl" => "application/futuresplash",
"src" => "application/x-wais-source",
"sst" => "application/vnd.ms-pkicertstore",
"stl" => "application/vnd.ms-pkistl",
"stm" => "text/html",
"svg" => "image/svg+xml",
"sv4cpio" => "application/x-sv4cpio",
"sv4crc" => "application/x-sv4crc",
"t" => "application/x-troff",
"tar" => "application/x-tar",
"tcl" => "application/x-tcl",
"tex" => "application/x-tex",
"texi" => "application/x-texinfo",
"texinfo" => "application/x-texinfo",
"tgz" => "application/x-compressed",
"tif" => "image/tiff",
"tiff" => "image/tiff",
"tr" => "application/x-troff",
"trm" => "application/x-msterminal",
"tsv" => "text/tab-separated-values",
"txt" => "text/plain",
"uls" => "text/iuls",
"ustar" => "application/x-ustar",
"vcf" => "text/x-vcard",
"vrml" => "x-world/x-vrml",
"wav" => "audio/x-wav",
"wcm" => "application/vnd.ms-works",
"wdb" => "application/vnd.ms-works",
"wks" => "application/vnd.ms-works",
"wmf" => "application/x-msmetafile",
"wps" => "application/vnd.ms-works",
"wri" => "application/x-mswrite",
"wrl" => "x-world/x-vrml",
"wrz" => "x-world/x-vrml",
"xaf" => "x-world/x-vrml",
"xbm" => "image/x-xbitmap",
"xla" => "application/vnd.ms-excel",
"xlc" => "application/vnd.ms-excel",
"xlm" => "application/vnd.ms-excel",
"xls" => "application/vnd.ms-excel",
"xlt" => "application/vnd.ms-excel",
"xlw" => "application/vnd.ms-excel",
"xof" => "x-world/x-vrml",
"xpm" => "image/x-xpixmap",
"xwd" => "image/x-xwindowdump",
"z" => "application/x-compress",
"zip" => "application/zip" );