Posts Tagged ‘HTML’

Improving on strip_tags (part 2)

Feb 26 2023 · PHP

Whitespace and tags

Previously, I looked at improving the functionality of strip_tags such that words across tags are not mashed together. The method I derived works well enough but it’s limited in that all tags are treated the same way and all whitespace separators are the same. I wanted to see if I could improve the method a bit more to address these limitations; that is, introducing whitespace based on the type of tag encountered instead of injecting whitespace after stripping away a tag.

For example, when dealing with inline tags, whitespace should be preserved:

This bit of HTML:
<span>the quick brown fox </span><span>jumped over the moon</span>
… should produce:
the quick brown fox jumped over the moon

Alternatively, when dealing with block-level tags, a newline should be injected:

This bit of HTML:
<div>the quick brown fox</div><div>jumped over the moon</div>
… should produce:
the quick brown fox
jumped over the moon

Note that we’re simply talking about common/expected browser behavior from what’s thought of as inline-level or block-level tags. In reality, this categorization isn’t really part of the HTML standard anymore and layout behavior is determined by CSS. From MDN:

That said, when looking at arbitrary HTML content, I still think “block” vs. “inline” is a useful distinction, at least insofar as inferring default or common behavior.

The special case

The <br> tag presents a special case. While it’s classified as an inline element, <br> represents whitespace that is generally similar to that of a block-level element (e.g. a newline). In implementation this is simple to handle but does introduce a tiny bit of additional complexity.

Looking at the high-level transformations needed, we get the following:

Inline-level tags → strip away (no action needed, don’t alter any existing whitespace within tag contents)
Block-level tags → strip away, replace with newline
<br> tags → strip away, replace with newline

Code

Reworking the convert() method from the previous post, we get the following:

class HTMLToPlainText
{
    const BLOCK_LEVEL_ELEMENTS = [
        "address",
        "article",
        "aside",
        "blockquote",
        "details",
        "dialog",
        "dd",
        "div",
        "dl",
        "dt",
        "fieldset",
        "figcaption",
        "figure",
        "footer",
        "form",
        "h1",
        "h2",
        "h3",
        "h4",
        "h5",
        "h6",
        "header",
        "hgroup",
        "hr",
        "li",
        "main",
        "nav",
        "ol",
        "p",
        "pre",
        "section",
        "table",
        "ul"
    ];

    const INLINE_LEVEL_ELEMENTS_THAT_PRODUCE_NEWLINE = [
        "br",
    ];

    const STATE_READING_CONTENT = 1;
    const STATE_READING_TAG_NAME = 2;

    static public function convert(string $input, string $blockContentSeparator = "\n"): string
    {
        // the input string as UTF-32
        $fixedWidthString = iconv('UTF-8', 'UTF-32', $input);

        // string within tags that we've found
        $output = "";

        // buffer for current/last tag name read
        $currentTagName = "";
        $currentTagIsClosing = null;

        // buffer content in the current tag being read
        $contentInCurrentTag = "";

        // flag to indicate how we should interpret what we're reading from $fixedWidthString
        // .. this is initially set to STATE_READING_CONTENT, as we assume we're reading content from the start, even
        // if we haven't encountered a tag (e.g. string that doesn't contain tags)
        $parserState = self::STATE_READING_CONTENT;

        $flushCurrentToOutput = function() use (&$output, &$contentInCurrentTag, &$currentTagName, &$currentTagIsClosing, &$blockContentSeparator) {
            // handle inline tags which produce a newline (e.g. <br>)
            // .. note that these can be empty (<br>) or self-closing (<br/>)
            if(in_array(strtolower($currentTagName), self::INLINE_LEVEL_ELEMENTS_THAT_PRODUCE_NEWLINE)) {
                $output .= $contentInCurrentTag . $blockContentSeparator;
            } else {
                // append $blockContentSeparator if we're at the *opening or closing* of a block-level element
                // (for inline element, leave content as-is)
                if (in_array(strtolower($currentTagName), self::BLOCK_LEVEL_ELEMENTS)) {
                    $output .= $contentInCurrentTag . $blockContentSeparator;
                } else {
                    $output .= $contentInCurrentTag;
                }
            }

            // reset
            $contentInCurrentTag = "";
            $currentTagIsClosing = null;
            $currentTagName = "";
        };

        // iterate through characters in $fixedWidthString
        // checking for tokens indicating if we're within a tag or within content
        for($i=0; $i<strlen($fixedWidthString); $i+=4) {
            // convert back to UTF-8 to simplify character/token checking
            $ch = iconv('UTF-32', 'UTF-8', substr($fixedWidthString, $i, 4));

            if($ch === '<') {
                $flushCurrentToOutput();
                $parserState = self::STATE_READING_TAG_NAME;
                continue;
            }

            if($ch === '>') {
                $flushCurrentToOutput();
                $parserState = self::STATE_READING_CONTENT;
                continue;
            }

            if($parserState == self::STATE_READING_TAG_NAME && $ch == '/') {
                $currentTagIsClosing = true;
                continue;
            }

            if($parserState == self::STATE_READING_TAG_NAME) {
                $currentTagName .= $ch;
                continue;
            }

            if($parserState === self::STATE_READING_CONTENT) {
                $contentInCurrentTag .= $ch;
                continue;
            }
        }

        $flushCurrentToOutput();

        return trim($output, $blockContentSeparator);
    }
}

Testing

Throwing some arbitrary bits of HTML at this function seems to indicate that the method works correctly, but a method like this really calls for some form of automated testing. I could derive test cases from the function logic, as is typically done when testing an arbitrary method, but that approach is both biased and limited here. Biased in that I’d be looking at the function and coming up with test cases based on my own experience (what I’ve encountered and where I think there may be potential issues). Limited in that I’d likely only come up with a handful of test cases unless I invested a significant chunk of time into compiling a comprehensive set; HTML has relatively few building blocks but, given the number of different ways those blocks can be combined and arranged, we end up with a fairly large number of permutations. What would really be effective here is testing with a large and varied corpus of test cases, i.e. mappings of HTML snippets to plain-text representations (data-driven testing). It’s usually hard to generate or find data for such testing, but the PHP repository has a number of test cases for strip_tags() that can be leveraged:

strip_tags_basic1.phpt has some good baseline tests (HTML tags, PHP tags, tags with attributes, HTML comments, etc.)
strip_tags_basic2.phpt has a good test case (different tags + mix of block and inline elements + PHP tags) but is really testing the allowed_tags_array argument to strip_tags(), which I forgot was a thing and didn’t consider in my method

Beyond the test cases in these 2 files, there are other good cases scattered in the repo, seemingly tied to specific bugs encountered (e.g. bug #53319, which involves handling of “<br />” tags), but they can be hard to locate given the organization, or lack thereof, of the test files. In any case, it’s great having this data to work with, and some issues surfaced when I began subjecting my code to some of these tests (e.g. the content separator for block-level elements needing to be appended at both the opening and closing tags, not just the closing tag).

Implementation-wise, testing is mainly a matter of encoding the test cases in a map and asserting that the actual result matches expectations:

$testCases = [
    "<html>hello</html>" => "hello",
    "<?php echo hello ?>" => "",
    "<? echo hello ?>" => "",
    "<% echo hello %>" => "",
    "<script language=\"PHP\"> echo hello </script>" => " echo hello ",
    "<html><b>hello</b><p>world</p></html>" => "hello\nworld",
    "<html><!-- COMMENT --></html>" => "",
    "<html><p>hello</p><b>world</b><a href=\"#fragment\">Other text</a></html><?php echo hello ?>" => "hello\nworldOther text",
    "<p>hello</p><p>world</p>" => "hello\n\nworld",
    '<br /><br />USD<input type="text"/><br/>CDN<br><input type="text" />' => "USD\nCDN",
];

foreach ($testCases as $html => $expectedPlainText) {
    $actualPlainText = HTMLToPlainText::convert($html);

    echo "TEST: " . $html . "\n";
    echo "EXPECTED: " . $expectedPlainText . "\n";
    echo "ACTUAL: " . $actualPlainText . "\n";
    echo "----\n";

    assert($actualPlainText === $expectedPlainText);
}

Testing is still limited here. I’d love to simply have a large batch of test cases to throw at the function, but something like that is not readily available.

Limitations / future work

The new convert() method is more robust, but there are still some key limitations when compared to the strip_tags() function:

PHP’s strip_tags() is actually a lot more robust when it comes to invalid/malformed HTML content, as the tests in strip_tags.phpt demonstrate
Preserving certain tags (as with the allowed_tags_array argument) wasn’t considered

Also, whitespace/separators produced from <br> elements at the beginning or end of the input HTML are stripped away. I don’t think this is correct, as browsers preserve whitespace from <br> elements and don’t collapse it as they do with empty block-level elements.

block-level, data-driven testing, HTML, inline-level, PHP, software testing, strip_tags, unit testing

Improving on strip_tags

Aug 13 2022 · PHP

The Problem

PHP’s strip_tags() method will strip away tags but makes no attempt to introduce whitespace to separate content in adjacent tags. This is an issue with arbitrary HTML as adjacent block-level elements may not have any intermediate whitespace and simply stripping away the tags will incorrectly concatenate the textual content in the 2 elements.

For example, running strip_tags() on the following:

<div>the quick brown fox</div><div>jumped over the moon</div>

… will return:

the quick brown foxjumped over the moon

This is technically correct (we’ve stripped away the <div> tags) but having no whitespace between “fox” and “jumped” means we’ve transformed the content such that we’ve lost semantic and presentational details.

The Solution

There are 2 ways I can see to fix this behavior:

Pre-process the HTML content to ensure or introduce whitespace between block-level elements
Don’t use strip_tags() and utilize a method that better understands the need for spacing between elements

I’ll focus on the latter because that’s the avenue I went down and I didn’t consider pre-processing at the time.

Pulling together a quick-and-dirty parser, I wrote the following. It’s worth noting that this still doesn’t really consider what the tags are (e.g. whether they’re inline or block) but allows the caller to specify a string ($tagContentSeparator), typically some whitespace, that is inserted between the pieces of content left after the tags are stripped away:

<?php

class HTMLToPlainText
{
    const STATE_READING_CONTENT = 1;
    const STATE_READING_TAG_NAME = 2;

    static public function convert(string $input, string $tagContentSeparator = " "): string
    {
        // the input string as UTF-32
        $fixedWidthString = iconv('UTF-8', 'UTF-32', $input);

        // string within tags that we've found
        $foundContentStrings = [];

        // buffer for current content being read
        $currentContentString = "";

        // flag to indicate how we should interpret what we're reading from $fixedWidthString
        // .. this is initially set to STATE_READING_CONTENT, as we assume we're reading content from the start, even
        // if we haven't encountered a tag (e.g. string that doesn't contain tags)
        $parserState = self::STATE_READING_CONTENT;

        // method to add a non-empty string to $foundContentStrings and reset $currentContentString
        $commitCurrentContentString = function() use (&$currentContentString, &$foundContentStrings) {
            if(strlen($currentContentString) > 0) {
                $foundContentStrings[] = trim($currentContentString);
                $currentContentString = "";
            }
        };

        // iterate through characters in $fixedWidthString
        // checking for tokens indicating if we're within a tag or within content
        for($i=0; $i<strlen($fixedWidthString); $i+=4) {
            // convert back to UTF-8 to simplify character/token checking
            $ch = iconv('UTF-32', 'UTF-8', substr($fixedWidthString, $i, 4));

            if($ch === '<') {
                $parserState = self::STATE_READING_TAG_NAME;
                $commitCurrentContentString();
                continue;
            }

            if($ch === '>') {
                $parserState = self::STATE_READING_CONTENT;
                continue;
            }

            if($parserState === self::STATE_READING_CONTENT) {
                $currentContentString .= $ch;
                continue;
            }
        }

        $commitCurrentContentString();

        return implode($tagContentSeparator, $foundContentStrings);
    }
}

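Note that the UTF-8 ↔ UTF-32 round trip isn’t really necessary. I initially did the conversion because I was worried about splitting a multibyte character, but that can’t happen given how the function reads the input string: it only tests for the ASCII characters “<” and “>”, and in UTF-8 those byte values never occur inside a multibyte sequence.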

Now if we take the following HTML snippet:

<div>the quick brown fox</div><div>jumped over the moon</div>

… rendered in a browser, we get:

… with strip_tags() we get:

the quick brown foxjumped over the moon

… and with HTMLToPlainText::convert() (passing in “\n” for $tagContentSeparator), we get:

the quick brown fox
jumped over the moon

The latter results in text that is semantically correct, as words in different blocks aren’t incorrectly joined. Presentationally we also get a more correct conversion, but the method isn’t really doing anything fancy here; this is due to the caller knowing a bit about the HTML snippet and how a browser would render it, and passing in “\n” for $tagContentSeparator.

Limitations / future work

The improvement here is that textual content is pretty well preserved when doing a conversion, i.e. we don’t have to worry about textual elements being incorrectly concatenated. However, what I wrote is still lacking in 2 key areas:

Generally, in terms of presentation, an arbitrary bit of HTML won’t map to what a user sees in a browser. To a certain degree this is an intractable problem, as presentation is based on browser defaults, CSS styles, etc. Also, there are things that simply don’t have a standard representation in plain text (e.g. bold text, list items, etc.). However, there are cases where sensible defaults might make sense, e.g. stripping away <span> tags but putting a newline between <p> tags.
Whitespace is trimmed from content within tags. This may or may not matter depending on application. In my case, I cared about the words and additional whitespace just added bloat even if it was more accurate to what was in the HTML.

EDIT: See part 2 on addressing these limitations and making the code more robust.

HTML, PHP, strip_tags

Rendering HTML to images with SVG foreignObject

Sep 4 2019 · Web Technologies

Motivation

For applications that allow users to create visual content, being able to generate images of their work can be important in a number of scenarios: preview/opengraph images, allowing users to display content elsewhere, etc. This popped up as a need for ScratchGraph and led me to research a few possible solutions. Using the SVG <foreignObject> element was one of the more interesting solutions I came across, as all rendering and image creation is done client-side.

<foreignObject> to Image

<foreignObject> is a somewhat strange element. Essentially, it allows you to load and render arbitrary HTML content within SVG. This in and of itself isn’t helpful for generating an image, but we can take advantage of two other aspects of modern browsers to make this a reality:

SVG markup can be dynamically loaded into an Image by transforming the markup into a data URL
Data URL length limits are no longer a concern. We no longer have the kilobyte-scale limits we were dealing with a few years ago

Sketching it out, the process looks something like this (contentHtml is a string with the HTML content we want to render):

The code for this is pretty straightforward:

// build SVG string
const svg = `
    <svg xmlns='http://www.w3.org/2000/svg' width='${width}' height='${height}'>
        <foreignObject x='0' y='0' width='${width}' height='${height}'>
            ${contentHtml}
        </foreignObject>
    </svg>`;

// convert SVG to data-uri
const dataUri = `data:image/svg+xml;base64,${window.btoa(svg)}`;

Here I’m assuming contentHtml is valid and can be trusted. If that’s not the case, you’ll likely need some pre-processing steps before sticking it into a string like this.
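One thing to watch with the snippet above: window.btoa() throws when the string contains characters outside Latin-1, so HTML containing, say, CJK text or emoji won't encode. Below is a minimal sketch of one alternative, assuming that percent-encoding the SVG instead of Base64-encoding it is acceptable. It also sketches the "load into an Image" step that is described above but not shown. Both buildSvgDataUri() and loadImageFromDataUri() are hypothetical helper names, not part of the original code.

```javascript
// Build the SVG wrapper and percent-encode it (instead of Base64) so that
// non-Latin-1 characters in contentHtml don't trip up btoa().
// NOTE: buildSvgDataUri is a hypothetical helper, not part of the original code.
const buildSvgDataUri = function(contentHtml, width, height) {
    const svg = `
    <svg xmlns='http://www.w3.org/2000/svg' width='${width}' height='${height}'>
        <foreignObject x='0' y='0' width='${width}' height='${height}'>
            ${contentHtml}
        </foreignObject>
    </svg>`;

    return `data:image/svg+xml;charset=utf-8,${encodeURIComponent(svg)}`;
};

// Browser-only: load the data URI into an Image and resolve once it has loaded.
// NOTE: loadImageFromDataUri is also a hypothetical helper.
const loadImageFromDataUri = function(dataUri) {
    return new Promise(function(resolve, reject) {
        const img = new Image();
        img.onload = function() { resolve(img); };
        img.onerror = function() { reject(new Error("SVG image failed to load")); };
        img.src = dataUri;
    });
};
```

In a browser, the resolved Image can then be appended to the page or drawn onto a canvas.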

The code above works, to a degree; there are a few key limitations to be aware of:

Cross-origin images served without CORS headers won’t load within <foreignObject>
Styles declared via stylesheets do not pass through to the contents of <foreignObject>
External resources (images, fonts, etc.) won’t be in the generated Image, as the browser doesn’t wait for these resources to be loaded before rendering out the image

The cross-origin issue may be annoying and unexpected (as the browser does load these images), but it’s a valid security measure and CORS provides the mechanism around it.

Handling stylesheets and external resources are more important concerns, and addressing them allows for a much more robust process.

Handling stylesheets

This isn’t anything too fancy; here are the steps involved:

Copy all the style rules, from all the stylesheets, in the parent document
Wrap all those rules in a <style> tag
Prepend that string to the contentHtml string

The code for this precursor step looks something like this:

const styleSheets = document.styleSheets;
let cssStyles = "";
let urlsFoundInCss = [];

for (let i=0; i<styleSheets.length; i++) {
    for(let j=0; j<styleSheets[i].cssRules.length; j++) {
        const cssRuleStr = styleSheets[i].cssRules[j].cssText;
        cssStyles += cssRuleStr;
    }
}

const styleElem = document.createElement("style");
styleElem.innerHTML = cssStyles;
const styleElemString = new XMLSerializer().serializeToString(styleElem);

...

contentHtml = styleElemString + contentHtml;

...
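Reading cssRules on a stylesheet loaded from another origin throws a SecurityError in browsers, which would abort the loop above partway through. Here is a hedged sketch of a more defensive version. collectCssText() is a hypothetical name, and silently skipping unreadable sheets is an assumption about what's acceptable for the rendered output.

```javascript
// Concatenate the cssText of every readable rule in a StyleSheetList-like object.
// Stylesheets whose rules can't be read (e.g. cross-origin sheets) are skipped.
// NOTE: collectCssText is a hypothetical helper, not part of the original code.
const collectCssText = function(styleSheets) {
    let cssStyles = "";

    for (let i = 0; i < styleSheets.length; i++) {
        let rules;
        try {
            rules = styleSheets[i].cssRules;
        } catch (err) {
            // accessing cssRules threw; skip this sheet
            continue;
        }

        if (!rules) {
            continue;
        }

        for (let j = 0; j < rules.length; j++) {
            cssStyles += rules[j].cssText;
        }
    }

    return cssStyles;
};
```

In the browser this would be called as collectCssText(document.styleSheets), with the result assigned to cssStyles as before.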

Handling external resources

My solution here is somewhat crude, but it’s functional.

Find url values in the CSS code or src attribute values in the HTML code
Make XHR requests to get these resources
Encode the resources as Base64 and construct data URLs
Replace the original URLs (in the CSS url or HTML src) with the new base64 data URLs

The following shows how this is done for the HTML markup (the process is only slightly different for CSS).

const escapeRegExp = function(string) {
    return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'); // $& means the whole matched string
};

let urlsFoundInHtml = getImageUrlsFromFromHtml(contentHtml);
const fetchedResources = await getMultipleResourcesAsBase64(urlsFoundInHtml);
for(let i=0; i<fetchedResources.length; i++) {
    const r = fetchedResources[i];
    contentHtml = contentHtml.replace(new RegExp(escapeRegExp(r.resourceUrl),"g"), r.resourceBase64);
}
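For the CSS side, which is mentioned above but not shown, the main difference is how the URLs are located. This is a rough sketch, assuming a regular expression is good enough for pulling out url(...) values. getUrlsFromCss() is a hypothetical name, and the pattern doesn't try to handle escaped quotes or parentheses inside a URL.

```javascript
// Extract the values of url(...) references from a string of CSS.
// NOTE: getUrlsFromCss is a hypothetical helper, not part of the original code.
const getUrlsFromCss = function(cssText) {
    const urlsFound = [];
    // url( + optional quote + value + the same quote (if any) + )
    const urlPattern = /url\(\s*(["']?)(.*?)\1\s*\)/g;

    let match;
    while ((match = urlPattern.exec(cssText)) !== null) {
        const url = match[2];
        // data URLs are already inlined, so there is nothing to fetch
        if (url.length > 0 && !url.startsWith("data:")) {
            urlsFound.push(url);
        }
    }

    return urlsFound;
};
```

The returned URLs can then go through the same fetch-and-replace loop used for the HTML markup.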

The getImageUrlsFromFromHtml() and parseValue() methods extract the values of src attributes from elements:

/**
 * 
 * @param {String} str 
 * @param {Number} startIndex 
 * @param {String} prefixToken 
 * @param {String[]} suffixTokens
 * 
 * @returns {{foundAtIndex: Number, value: String}|null}
 */
const parseValue = function(str, startIndex, prefixToken, suffixTokens) {
    const idx = str.indexOf(prefixToken, startIndex);
    if(idx === -1) {
        return null;
    }

    let val = '';
    for(let i=idx+prefixToken.length; i<str.length; i++) {
        if(suffixTokens.indexOf(str[i]) !== -1) {
            break;
        }

        val += str[i];
    }

    return {
        "foundAtIndex": idx,
        "value": val
    }
};

/**
 * 
 * @param {String} str
 * @returns {String}
 */
const removeQuotes = function(str) {
    return str.replace(/["']/g, "");
};

/**
 * 
 * @param {String} html 
 * @returns {String[]}
 */
const getImageUrlsFromFromHtml = function(html) {
    const urlsFound = [];
    let searchStartIndex = 0;

    while(true) {
        const url = parseValue(html, searchStartIndex, 'src=', [' ', '>', '\t']);
        if(url === null) {
            break;
        }

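        // advance past the "src=" token and the value, so the same match isn't found again
        // (without the token length, an empty value would loop forever)
        searchStartIndex = url.foundAtIndex + 'src='.length + url.value.length;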
        urlsFound.push(removeQuotes(url.value));
    }

    return urlsFound;
};

The getMultipleResourcesAsBase64() and getResourceAsBase64() methods are responsible for fetching resources:

/**
 * 
 * @param {String} url 
 * @returns {Promise}
 */
const getResourceAsBase64 = function(url) {
    return new Promise(function(resolve, reject) {
        const xhr = new XMLHttpRequest();
        xhr.open("GET", url);
        xhr.responseType = 'blob';

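        xhr.onreadystatechange = async function() {
            // wait until the request has completed
            if(xhr.readyState !== 4) {
                return;
            }

            // reject on failure, so Promise.all() in the caller doesn't wait forever
            if(xhr.status !== 200) {
                reject(new Error("Failed to load " + url + " (status " + xhr.status + ")"));
                return;
            }

            const resBase64 = await binaryStringToBase64(xhr.response);
            resolve(
                {
                    "resourceUrl": url,
                    "resourceBase64": resBase64
                }
            );
        };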

        xhr.send(null);
    });
};

/**
 * 
 * @param {String[]} urls 
 * @returns {Promise}
 */
const getMultipleResourcesAsBase64 = function(urls) {
    const promises = [];
    for(let i=0; i<urls.length; i++) {
        promises.push( getResourceAsBase64(urls[i]) );
    }
    return Promise.all(promises);
};

More code

The code for this experiment is up on Github. Most functionality is encapsulated with the ForeignHtmlRenderer method, which contains the code shown in this post.

Other Approaches

Similar (same?) approach with dom-to-image
This library also uses the <foreignObject> element and an approach similar to what I described in this post. I played around with it briefly and remember running into a few issues, but I didn’t keep the test code around and don’t remember what the errors were.
Server-side/headless rendering with puppeteer
This seems to be the de facto solution and, honestly, it’s a pretty good solution. It’s not too difficult to get it up and running as a service, though there will be an infrastructure cost. Also, I’d be willing to bet this is what services like URL2PNG use on their backend.
Client-side rendering with html2canvas
This is a really cool project that will actually parse the DOM tree + CSS and render the page (it’s a rendering engine done in client-side javascript). Unfortunately, only a subset of CSS is supported and SVG is not supported.

ajax, CSS, data uri, DOM-to-image, foreignObject, HTML, html2canvas, HTMLImageElement, javascript, puppeteer, SVG, url2png

Moving the caret to the end of text in an <input> element

May 18 2014 · Web Technologies

Very simple, and the following will work in all modern browsers.

<input name="url" type="text" value="http://" />

Javascript

var inputElem = document.getElementsByName("url")[0];
                
var valLen = inputElem.value.length;

inputElem.selectionStart = valLen;
inputElem.selectionEnd = valLen;   

inputElem.focus();

The same technique will work for <textarea> elements as well.
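The two assignments can also be collapsed into a single setSelectionRange() call, which is part of the same selection API on <input> and <textarea> elements. A small sketch follows. moveCaretToEnd() is a hypothetical helper name, and focusing before setting the selection is a choice made here rather than something the snippet above requires.

```javascript
// Move the caret to the end of the text in an <input> or <textarea>.
// NOTE: moveCaretToEnd is a hypothetical helper name.
const moveCaretToEnd = function(inputElem) {
    const valLen = inputElem.value.length;

    inputElem.focus();
    // start and end at the same offset means a collapsed selection, i.e. just a caret
    inputElem.setSelectionRange(valLen, valLen);
};
```

Called as moveCaretToEnd(document.getElementsByName("url")[0]), it should behave like the snippet above. Keep in mind that the selection API only applies to text-like input types (text, search, url, tel, password), not to types such as number or email.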

HTML, javascript, javascript selection, moving caret

Manipulating text relative to the caret in a contenteditable div

Apr 20 2014 · Web Technologies

I wanted to play around a bit with dynamically modifying text as you type. The following is a simple auto-correct demo that makes use of the Selection and Range interfaces to replace text (read: text preceding the caret) within a contenteditable div.

$(document).on('keydown', '.ia-txt', function (e) {
            
    // check if space bar was hit
    if(e.keyCode == 32) {
                    
        // we'll check for the string "hwat"; incorrect form of "what"
        var incorrectTxt = "hwat";
    
        // Get selection and range based on position of caret
        // (we assume nothing is selected, and range points to the position of the caret)
        var sel = window.getSelection();  
        var range = sel.getRangeAt(0);   
                            
        // check that we have at least incorrectTxt.length characters in our container
        if(range.startOffset - incorrectTxt.length >= 0) {
        
            // clone the range, so we can alter the start and end
            var clone = range.cloneRange();                
            
            // alter start and end of cloned range, so it selects incorrectTxt.length characters
            clone.setStart(range.startContainer, range.startOffset - incorrectTxt.length); 
            clone.setEnd(range.startContainer, range.startOffset);        

            // get contents of cloned range
            var contents = clone.toString();                    
                                    
            // check if the contents of the cloned range is equal to our incorrectTxt string
            if(contents == incorrectTxt) {
                                        
                // delete the contents of the range ("hwat")
                clone.deleteContents();    
                
                // create a text node with the corrected text ("what") and insert it where we deleted the incorrect text
                var txtNode = document.createTextNode("what");
                range.insertNode(txtNode);
                
                // set the start of the range after the inserted node, so we have the caret after the inserted text
                range.setStartAfter(txtNode);
                                    
                // Chrome fix
                sel.removeAllRanges();
                sel.addRange(range);               

            }                
        }
    }
    

});

You can see the code in action in the frame below. Every time you press the space-bar and the string “hwat” is detected, preceding the position of the caret, it is removed and replaced with the string “what”:

This is an incredibly trivial example (note that it doesn’t even check that the string “hwat” is surrounded by whitespace on both sides), but it does serve as a template for more advanced functionality. That said, be very aware of minor differences in the behavior of Range methods when working across browsers; I’ve stumbled across a few:

The code above breaks under certain conditions in Internet Explorer. If you move the caret to a position between 2 words, type “hwat” + space (the string is auto-corrected to “what”), then type “hwat” + space again, the auto-correct doesn’t work. The range.startOffset variable seems incorrect (too small) and subtracting incorrectTxt.length (4) yields a negative start offset.
Using a keyup event instead of a keydown event, and checking for the string “hwat ” instead yields different behaviors in Firefox and Chrome. Firefox preserves the space after the corrected string, and the caret is at the position after the space. However, Chrome strips the space and the caret is after the corrected string.
After the selection’s range is altered after auto-correcting, Chrome requires the removeAllRanges(), addRange() calls to replace the selection’s range, but Firefox does not.
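The lookup part of the demo can be separated from the Range manipulation, which makes it easier to support more than one correction and to add the whitespace check the demo skips. This is a sketch under those assumptions. findCorrection() and the corrections map are hypothetical, and only the text before the caret is considered.

```javascript
// Given the text preceding the caret, return the misspelling that ends at the
// caret along with its replacement, or null when nothing should be corrected.
// NOTE: findCorrection and the shape of `corrections` are hypothetical.
const findCorrection = function(textBeforeCaret, corrections) {
    // the last whitespace-delimited word before the caret
    const match = /(^|\s)(\S+)$/.exec(textBeforeCaret);
    if (match === null) {
        return null;
    }

    const word = match[2];
    // own-property check, so inherited keys like "constructor" never match
    if (!Object.prototype.hasOwnProperty.call(corrections, word)) {
        return null;
    }

    return {
        "incorrectTxt": word,
        "correctedTxt": corrections[word]
    };
};
```

In the keydown handler, the text before the caret could be taken from range.startContainer.textContent.slice(0, range.startOffset), assuming startContainer is a text node. The length of the returned incorrectTxt then determines how far back to move the start of the cloned range.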

contenteditable, HTML, javascript, javascript range, javascript selection

Interaction classes – separating CSS styles from Javascript interactions

Nov 19 2013 · Application Design

Something I’ve been doing for a while in my web development work is applying separate classes, interaction classes, to DOM elements that interact with Javascript. Basically, an interaction class is applied to any DOM element touched by Javascript code – an element bound to an event handler, an input element with its value being read or written, an element selected for animation, etc. The goal is to de-couple styling from interaction, allowing style changes to not interfere with JS code, and vice-versa.

Below is a bit of code to demonstrate what I’m talking about. As a convention, I apply a “ia-” prefix to my interaction classes.

<a href="#" class="btn-primary ia-begin-testing">Begin Testing</a>

btn-primary has the CSS rules for styling the anchor as a button
ia-begin-testing is bound to a JS event that triggers some arbitrary “begin testing” action

In the future, if I want to change the button to a link (remove the btn-primary class), change it to a secondary button (btn-primary to btn-secondary), or change styling in any other way, the Javascript code is unaffected and requires no changes.

In addition, the ia-begin-testing class can also be applied to other elements (another button, a link, an anchor wrapping an image, etc.) and is automatically bound to the same interaction functions, without writing additional JS DOM selection code. The ia-begin-testing class can also be removed, or changed to another interaction class, and the styling on the button remains the same.
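One way to get this "automatically bound" behavior is event delegation: attach a single listener to a common ancestor and check whether the event came from an element carrying the interaction class. Here is a minimal sketch using plain DOM APIs. bindInteraction() is a hypothetical helper, and delegating from a root element is an assumption, not necessarily how the handlers in this post's projects are bound.

```javascript
// Bind `handler` to every current and future element with `interactionClass`
// inside `root`, using a single delegated listener.
// NOTE: bindInteraction is a hypothetical helper name.
const bindInteraction = function(root, eventName, interactionClass, handler) {
    root.addEventListener(eventName, function(e) {
        // closest() also matches the target itself, so events on child nodes
        // (e.g. an image inside an anchor) are handled too
        const elem = e.target.closest("." + interactionClass);
        if (elem !== null && root.contains(elem)) {
            handler(e, elem);
        }
    });
};
```

With this in place, something like bindInteraction(document, "click", "ia-begin-testing", beginTesting) would cover every element carrying the class, where beginTesting stands in for whatever function kicks off the action.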

While IDs and data attributes are also good choices for architecting this sort of style/interaction separation, I like classes for 2 reasons:

Selection via class is relatively fast across all browsers compared to selection via data attributes
Compared to IDs, classes can be re-used allowing the same interactions to be shared by multiple elements (e.g. a button and a link can both trigger the same function)

One of the reasons I wrote this post is as an alternative to the “grouping of selectors” approach presented in Chris Coyier’s Can You “Over Organize” JavaScript? article. With interaction classes, there’s little need for grouping selectors. Aside from the benefit of de-coupling styling and interaction, you get the advantage of a single class (ia-whatever), on whatever and however many elements, mapping all said elements to their necessary JS functions. With grouping of selectors, some sanity is brought to the scattered DOM selection code, but you’re left with the burden of maintaining a pool of different element IDs, classes, etc.; a chore that only gets harder as the codebase grows and changes.

coupling, CSS, HTML, interaction classes, javascript

autocomplete=”off”

Apr 23 2011 · Web Design

Something I haven’t thought about much, but very important: for sensitive information, turn off autocomplete on input tags.

<input type="text" name="super-secret-pin-num" autocomplete="off" />

It’s a non-standard attribute, but all the major browsers implement it (including Webkit/Safari).

h/t Pete Freitag

autocomplete, HTML, input, security

progTools 1.7

Mar 18 2011 · Adobe Air

Another update to progTools. Ignore the jump in version number, that’s a result of fighting with the Air ApplicationUpdater.

New features:

Auto-update
Encode/Decode URI
Escape string
Encode/Decode HTML special entities
Text character and word counter

Download: progTools 1.7 (http://aautar.digital-radiation.com/apps/progTools-1.7.air)

progTools 1.7

air.update.ApplicationUpdaterUI, decodeURI, encodeURI, escape string, HTML, html special entities, htmlentities, progtools

Rtf2Html 1.22

Feb 4 2011 · Random

New stuff:

UI tweaks
Removal of block indent (where every line is indented) from pasted RTF text
Tweaks for better HTML output (e.g. no more useless span tags containing only whitespace)
Accurate preview using XULRunner (via GeckoFX); no longer using the stupid .NET WebBrowser control
New logo/icon (I just really hated the old one I made and it was bugging the hell out of me)

Download here
(requires .NET Framework 2.0 or higher)

Rtf2Html screenshot

The app was designed around the goal of being able to quickly copy and paste snippets of code from Visual Studio (or Netbeans) and turn them into HTML that I could embed in these blog posts. This update stays true to that, which is why the app is still so sparse on features, such as conversion of font size or paragraph alignment attributes.

The block indentation removal that now occurs after text is pasted in may be a bit slow. Text in the RichTextBox is selected and altered within the text box itself (it’ll also freeze the UI – if you understand multithreading and WinForms, you know why it’s not simply a matter of spawning off a thread). The alternative is to deal with an RTF parser and edit the RTF input directly, but that’s way more work than I’d care to devote to this app at the moment.
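
The idea behind block indent removal, independent of the RichTextBox, can be sketched on plain text. This is only an illustration in JavaScript, not the app's code (which works on the selection inside the text box): removeBlockIndent is a hypothetical name, and the sketch assumes lines are separated by "\n" and that indentation is consistently spaces or consistently tabs, since it simply counts leading whitespace characters.

```javascript
// Hypothetical helper: strip the indentation shared by every non-blank line.
function removeBlockIndent(text) {
    var lines = text.split('\n');

    // Find the smallest leading-whitespace run among non-blank lines.
    var minIndent = Infinity;
    lines.forEach(function (line) {
        if (line.trim() === '') {
            return; // blank lines say nothing about the block's indent
        }
        var indent = line.match(/^[ \t]*/)[0].length;
        if (indent < minIndent) {
            minIndent = indent;
        }
    });

    // Nothing but blank lines: leave the text alone.
    if (minIndent === Infinity) {
        return text;
    }

    // Drop that many leading characters from every line.
    return lines.map(function (line) {
        return line.slice(minIndent);
    }).join('\n');
}

// Usage: the common 4-space indent is removed, relative indentation is kept.
var dedented = removeBlockIndent('    if (x) {\n        y();\n\n    }');
```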

geckofx, HTML, richtextbox, rtf, rtf2html, winforms, xulrunner

jQuery toggle button

Jan 21 2011 · Web Design

On most of the mobile platforms you’ve probably seen a switch-style toggle button used as a replacement for a checkbox. I took a stab at doing something similar in HTML, CSS and Javascript.

You can see the final result here (it’s a pain in the ass to embed it)

Note that while I used jQuery, this is not a jQuery extension. It doesn’t use that much jQuery and I really don’t get the desire to make everything-and-the-kitchen-sink a jQuery plugin.

The button depends upon 2 images, a base, containing the design and both states of the button:

… and a frame (optional if you can get away with using CSS borders):

(note, the middle is transparent, not white)

The HTML and CSS consist of:

A div, which has its background-image set to the base and is sized to the button’s inner area: roughly half the width of the base (in this case, plus a few pixels, as some pixels are shared by both states of the button) and the same height.
A block-level anchor element within the div, which has its background-image set to the frame and sized to the same area as the frame image. The anchor allows the area to be clickable and we’ll respond to the click event that occurs on this element.
An input checkbox which will store the checked/unchecked state of the button.

<div style="margin:0; padding:0; background:url(base.png) -41px 1px no-repeat transparent; width:46px; height:20px;">
    <a class="toggle-button" href="#" style="margin:0; padding:0; display:block; background:url(frame.png) 0 0 no-repeat transparent; width:48px; height:20px;">
        <input style="display:none;" type="checkbox" />
    </a>
</div>

The Javascript code below handles the click event. When the button’s state is toggled, the background is shifted left or right using jQuery’s animate function:

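// Toggle the button whenever the anchor is clicked
$('.toggle-button').click(function ()
{
    if (!$('input', this).is(':checked')) {
        // Unchecked -> checked: slide the base image to show the checked state
        // (on jQuery 1.6+, .prop('checked', ...) is the preferred way to set this)
        $(this).parent().animate({ "background-position": "0px 1px" }, "slow");
        $('input', this).attr('checked', true);
    }
    else {
        // Checked -> unchecked: slide the base image back to its starting offset
        $(this).parent().animate({ "background-position": "-41px 1px" }, "slow");
        $('input', this).attr('checked', false);
    }

    // Stop the browser from following the href="#" link
    return false;
});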

This all works great, but it’s not-so-great as a reusable component, so I encapsulated the code so that I could easily transform a div, such as the one shown below, into the toggle button.

<div id="my_toggle_button"></div>

Central to this is creating a ToggleButtonFactory, which makes the button by inserting the necessary HTML/CSS code into the DOM and binding a click handler to the anchor. There’s also a ToggleButton object created by the factory, which has methods to toggle the button state (.toggle) and get the state of the button (.val).

function ToggleButton(_element, _funcSelectYes, _funcSelectNo)
{
    this.jqDomElement = _element;        // jQuery object wrapping the button's outer div
    this.funcSelectYes = _funcSelectYes; // optional callback, called when toggled to checked
    this.funcSelectNo = _funcSelectNo;   // optional callback, called when toggled to unchecked

    // Returns true if the hidden checkbox is currently checked
    this.val = function ()
    {
        return this.jqDomElement.find('input').is(':checked');
    };

    // Flips the state, animates the background and calls the matching callback
    this.toggle = function ()
    {
        if (!this.jqDomElement.find('input').is(':checked')) {
            this.jqDomElement.animate({ "background-position": "0px 1px" }, "slow");
            this.jqDomElement.find('input').attr('checked', true);

            if (this.funcSelectYes) {
                this.funcSelectYes();
            }
        }
        else {
            this.jqDomElement.animate({ "background-position": "-41px 1px" }, "slow");
            this.jqDomElement.find('input').attr('checked', false);

            if (this.funcSelectNo) {
                this.funcSelectNo();
            }
        }
    };
}

var ToggleButtonFactory = {};
ToggleButtonFactory.makeButton = function (element, initialState, funcSelectYes, funcSelectNo)
{
    if ($(element).is('div')) {

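        // The placeholder's id is carried over to the hidden checkbox;
        // the new outer div gets a generated id so it can be looked up after insertion
        var elemId = $(element).attr('id');
        var newDivId = '__toggle_button_div_' + Math.ceil((Math.random() * 100000));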
        $(element).replaceWith('<div id="' + newDivId + '" style="margin:0; padding:0; background:url(base.png) -41px 1px no-repeat transparent; width:46px; height:20px;"><a class="toggle-button" href="#" style="margin:0; padding:0; display:block; background:url(frame.png) 0 0 no-repeat transparent; width:48px; height:20px;"><input id="' + elemId + '" name="' + elemId + '" style="display:none;" type="checkbox" /></a></div>');

        var newElem = $('#' + newDivId);
        var tb = new ToggleButton(newElem, funcSelectYes, funcSelectNo);

        newElem.find('a').click(function ()
        {
            tb.toggle();
            return false;
        });

        if (initialState) {
            tb.toggle();
        }

        return tb;
    }
}

Note there’s some additional code here to deal with setting the button to an initial state and callbacks for when the button is set to the “Yes” or “No” state.
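
One consequence of how the initial state is applied is worth spelling out: since makeButton sets a truthy initial state by calling toggle() once, the “Yes” callback fires while the button is being created. The sketch below shows this with a DOM-free stand-in; ToggleModel is a hypothetical name that only mirrors the val()/toggle() contract, keeping the checked state in a plain variable instead of a hidden checkbox.

```javascript
// Hypothetical, DOM-free stand-in for ToggleButton (no jQuery, no animation).
function ToggleModel(initialState, funcSelectYes, funcSelectNo) {
    var checked = false; // plays the role of the hidden checkbox

    // Returns the current state, like ToggleButton's .val()
    this.val = function () {
        return checked;
    };

    // Flips the state and calls the matching callback, if one was given
    this.toggle = function () {
        checked = !checked;
        var callback = checked ? funcSelectYes : funcSelectNo;
        if (callback) {
            callback();
        }
    };

    // Same approach as makeButton: a truthy initial state is applied
    // by toggling once, which also fires the "Yes" callback.
    if (initialState) {
        this.toggle();
    }
}

// Usage: record which callbacks run, and in what order.
var log = [];
var model = new ToggleModel(true,
    function () { log.push('yes'); },
    function () { log.push('no'); });

model.toggle(); // back to the unchecked state
// log is now ['yes', 'no'] and model.val() is false
```

If callbacks should only run on user interaction, the initial state would have to be set without going through toggle().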

Now, to transform the my_toggle_button div shown above into a toggle button, the following is done:

var btn = ToggleButtonFactory.makeButton('#my_toggle_button', false, function () { }, function () { });

(the call can be shorter, this shows calling with all arguments and capturing the return value [the ToggleButton object])

For another take on this, see the jQuery LightSwitch plugin.

background-image, button, checkbox, CSS, HTML, javascript, jQuery, switch, toggle button
