Archive for June, 2012

Ellipsizer

A significant number of projects I’ve worked on have, at some point, involved truncating text and tacking on an ellipsis. The problem is, it’s difficult to do it precisely, as doing so requires exact metrics for fonts used and exact dimensions of whatever area the text is contained within; information usually not readily available. So it’s usually necessary to do some sort of approximation of the maximum number of characters that can fit within the container. However, this method only works well for fixed-width fonts; with variable-width fonts the result is a significant error/over-estimate (so the string is cut off well before it needs to be), as the width of characters can be significantly different (e.g. “i” vs “m”).

One idea I had to reduce this error was to specify a relative width (i.e. weight) for certain characters. In general, most characters would have a width of 1.0, but whitespace, period, comma, lowercase ‘i’, semicolon, etc. would have a width less than 1.0. With these widths, a length function can be defined which is a sum of the character widths, and the result is a value that indicates the length of the string relative not only to the number of characters rendered but also to the width of the characters rendered. A maximum threshold is still needed, but it no longer specified an absolute maximum number of characters, instead it’s the maximum number of full-width (width = 1.0) characters that can be rendered in the given area.

Of course, someone else had the same idea, but I didn’t like the code presented – the String.replaceAll() call every iteration made me cringe. Even with immutable strings, this is something that can be done in O(n) time as long as you can reference individual characters and grab a sub-string. However, one issue presented in the post that I didn’t think about was that strings should only be truncated at certain points (a space, a hyphen) and not simply chop off words in the middle.

My solution is in Javascript, but the operations should carry over easily to any language.

Ellipsizer class

Ellipsizer = function(_maxLength) {

    
this.maxLength = _maxLength;        

    
this.getLetterWidth = function(letter)
    {                
        
if(letter == '.' || letter == ' ' || letter == ',' || letter == '|' || letter == '\'' || letter == ':' || letter == ';' || letter == '!')
            
return 0.25;
    
        
if(letter == 'j' || letter == 'l' || letter == 'i' || letter == '^' || letter == '(' || letter == ')' || letter == '[' || letter == ']' || letter == '"')
            
return 0.5;
    
        
return 1;
    }
    
    
this.isCharCutpoint = function(ch)
    {
        
if(ch == ' ' || ch == '-' || ch == '.' )
            
return true;
            
        
return false;
    }
    
    
this.cut = function(str)
    {
    
        
var strWeightedLength = 0;
        
var lastCutpoint = 0; // point of whitespace, hyphen, etc. (point where we can cut string)
        
var curSubStr = '';
        
for(var i=0; i<str.length; i++)
        {
            
var letter = str.charAt(i);
            strWeightedLength +=
this.getLetterWidth(letter);
            
            
if(this.isCharCutpoint(letter))
            {
                lastCutpoint = i;
            }
            
            
if(strWeightedLength >= this.maxLength)
            {
                curSubStr = str.substr(0, lastCutpoint);
                
break;
            }
        }
    
        
var result = curSubStr + "…";
        
return result;
    }
    
}

Example

$(document).ready(function() {

    $(
'#ellipsize').click(function() {
    
        $(
'#text-to-cut p').each(function() {
            
var str = $(this).text();
            
var ellipsizer = new Ellipsizer(64);                    
            $(
this).text( ellipsizer.cut(str) );                        
        });
        
        
return false;
    });                
});

Maximum length of 64 to provide for the following constraints:

Ellipsizer

Ellipsizer

Live Demo

There’s quite a bit that can be improved here, mainly in recognizing more characters as quarter- or half-width characters, or having more classes of widths (0.3, 0.75, etc.) and assigning characters to them. However, for a first-pass, this seems to work pretty well and it’s important to remember that this is still a pretty rough approximation.