Posts Tagged ‘HTMLImage​Element’

Rendering HTML to images with SVG foreignObject

Motivation

For applications that allow users to create visual content, being able to generate images of their work can be important in a number of scenarios: preview/opengraph images, allowing users to display content elsewhere, etc. This popped up as a need for ScratchGraph and led me to research a few possible solutions. Using the SVG <foreignObject> element was one of the more interesting solutions I came across, as all rendering and image creation is done client-side.

<foreignObject> to Image

<foreignObject> is a somewhat strange element. Essentially, it allows you to load and render arbitrary HTML content within SVG. This in and of itself isn’t helpful for generating an image, but we can take advantage of two other aspects of modern browsers to make this a reality:

  • SVG markup can be dynamically loaded into an Image by transforming the markup into a data URL
  • Data URL length limits are no longer a concern. We no longer have the kilobyte-scale limits we were dealing with a few years ago

Sketching it out, the process looks something like this (contentHtml is a string with the HTML content we want to render):

The code for this is pretty straightforward:

// build SVG string
const svg = `
<svg xmlns='http://www.w3.org/2000/svg' width='
${width}' height='${height}'>
<foreignObject x='0' y='0' width='
${width}' height='${height}'>
${contentHtml}
</foreignObject>
</svg>`
;

// convert SVG to data-uri
const dataUri = `data:image/svg+xml;base64,${window.btoa(svg)}`;

Here I’m assuming contentHtml is valid and can be trusted. If that’s not the case, you’ll likely need some pre-processing steps before sticking it into a string like this.

The code above works, to a degree; there’s a few key limitations to be aware of:

  • Cross-origin images served without CORS headers won’t load within <foreignObject>
  • Styles declared via stylesheets do not pass through to the contents of <foreignObject>
  • External resources (images, fonts, etc.) won’t be in the generated Image, as the browser doesn’t wait for these resources to be loaded before rendering out the image

The cross-origin issue may be annoying and unexpected (as the browser does load these images), but it’s a valid security measure and CORS provides the mechanism around it.

Handling stylesheets and external resources are more important concerns, and addressing them allows for a much more robust process.

Handling stylesheets

This isn’t anything too fancy, here are the steps involved:

  • Copy all the style rules, from all the stylesheets, in the parent document
  • Wrap all those rules in a <style> tag
  • Prepend that string to the contentHtml string

The code for this precursor step looks something like this:

const styleSheets = document.styleSheets;
let cssStyles = "";
let urlsFoundInCss = [];

for (let i=0; i<styleSheets.length; i++) {
for(let j=0; j<styleSheets[i].cssRules.length; j++) {
const cssRuleStr = styleSheets[i].cssRules[j].cssText;
cssStyles += cssRuleStr;
}
}

const styleElem = document.createElement("style");
styleElem.innerHTML = cssStyles;
const styleElemString = new XMLSerializer().serializeToString(styleElem);

...

contentHtml = styleElemString + contentHtml;

...

Handling external resources

My solution here is somewhat curd, but it’s functional.

  • Find url values in the CSS code or src attribute values in the HTML code
  • Make XHR requests to get these resources
  • Encode the resources as Base64 and construct data URLs
  • Replace the original URLs (in the CSS url or HTML src) with the new base64 data URLs

The following shows how this is done for the HTML markup (the process is only slightly different for CSS).

const escapeRegExp = function(string) {
return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'); // $& means the whole matched string
};

let urlsFoundInHtml = getImageUrlsFromFromHtml(contentHtml);
const fetchedResources = await getMultipleResourcesAsBase64(urlsFoundInHtml);
for(let i=0; i<fetchedResources.length; i++) {
const r = fetchedResources[i];
contentHtml = contentHtml.replace(
new RegExp(escapeRegExp(r.resourceUrl),"g"), r.resourceBase64);
}

The getImageUrlsFromFromHtml() and parseValue() methods that extract the value of src attributes from elements:

/**
*
*
@param {String} str
*
@param {Number} startIndex
*
@param {String} prefixToken
*
@param {String[]} suffixTokens
*
*
@returns {String|null}
*/
const parseValue = function(str, startIndex, prefixToken, suffixTokens) {
const idx = str.indexOf(prefixToken, startIndex);
if(idx === -1) {
return null;
}

let val = '';
for(let i=idx+prefixToken.length; i<str.length; i++) {
if(suffixTokens.indexOf(str[i]) !== -1) {
break;
}

val += str[i];
}

return {
"foundAtIndex": idx,
"value": val
}
};

/**
*
*
@param {String} str
*
@returns {String}
*/
const removeQuotes = function(str) {
return str.replace(/["']/g, "");
};

/**
*
*
@param {String} html
*
@returns {String[]}
*/
const getImageUrlsFromFromHtml = function(html) {
const urlsFound = [];
let searchStartIndex = 0;

while(true) {
const url = parseValue(html, searchStartIndex, 'src=', [' ', '>', '\t']);
if(url === null) {
break;
}

searchStartIndex = url.foundAtIndex + url.value.length;
urlsFound.push(removeQuotes(url.value));
}

return urlsFound;
};

The getMultipleResourcesAsBase64() and getResourceAsBase64() methods responsible for fetching resources:

/**
*
*
@param {String} url
*
@returns {Promise}
*/
const getResourceAsBase64 = function(url) {
return new Promise(function(resolve, reject) {
const xhr = new XMLHttpRequest();
xhr.open(
"GET", url);
xhr.responseType =
'blob';

xhr.onreadystatechange =
async function() {
if(xhr.readyState === 4 && xhr.status === 200) {
const resBase64 = await binaryStringToBase64(xhr.response);
resolve(
{
"resourceUrl": url,
"resourceBase64": resBase64
}
);
}
};

xhr.send(
null);
});
};

/**
*
*
@param {String[]} urls
*
@returns {Promise}
*/
const getMultipleResourcesAsBase64 = function(urls) {
const promises = [];
for(let i=0; i<urls.length; i++) {
promises.push( getResourceAsBase64(urls[i]) );
}
return Promise.all(promises);
};

More code

The code for this experiment is up on Github. Most functionality is encapsulated with the ForeignHtmlRenderer method, which contains the code shown in this post.

Other Approaches

  • Similar (same?) approach with dom-to-image
    This library also uses the <foreignObject> element and an approach similar to what I described in this post. I played around with it briefly and remember running to a few issues, but I didn’t keep the test code around and don’t remember what the errors were.
  • Server-side/headless rendering with puppeteer
    This seems to be the defacto solution and, honestly, it’s a pretty good solution. It’s not too difficult to get it up and running as a service, though there will be an infrastructure cost. Also, I’d be willing to bet this is what services like URL2PNG use on their backend.
  • Client-side rendering with html2canvas
    This is a really cool project that will actually parse the DOM tree + CSS and render the page (it’s a rendering engine done in client-side javascript). Unfortunately, only a subset of CSS is supported and SVG is not supported.

Elegant <img> presentation

While it’s easy to decry Flash and espouse the merits of HTML5, it’s worth taking a look at some of the positive elements found on many Flash sites and seeing what can be brought over to the HTML side. One very obvious aspect is the presentation of images. Flash sites almost never render an image in the way that an <img> tag is rendered by the browser. Flash sites will (typically) elegantly animate or fade an image in/out whereas the default browser behavior for HTML is to progressively render the image as it is received (sometimes images are 2D interlaced for a slightly better effect). Even worse, many HTML sites leave width and height as attributes to be determined dynamically for the <img> tag, resulting in page content shifting around as images are loaded.

Better image presentation is certainly possible with HTML + Javascript, and there are tons of beautiful JS image galleries on the web, but outside of such galleries most developers don’t attempt anything beyond the typical <img> presentation.

So, I did a little experiment with 2 goals in mind:

  • Elegant presentation of images with HTML + JS
  • Very simple markup, one iota above what’s necessary for a typical <img> tag

I’m pretty happy with the results.

elegant img loading

View the demo
(disable your browser cache so that loading latency is accurately taken into account)

Here’s what I did:

1. Make the <img> elements.

e.g.

<img
data-src="http://aautar.digital-radiation.com/elegant-img-loading/p1.jpg"
src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAABGdBTUEAAK/INwWK6QAAABl0RVh0U29mdHdhcmUAQWRvYmUgSW1hZ2VSZWFkeXHJZTwAAAAQSURBVHjaYvj//z8DQIABAAj8Av7bok0WAAAAAElFTkSuQmCC"
width="200"
height="150"
alt="photo" />

Things to note:

  • The data-src attribute is an HTML 5 custom data attribute. It specifies the URL of the image to be loaded.
  • The src attribute uses the data URI scheme to load and set the initial image displayed in the img element. Leaving the src attribute blank is not desirable as it will collapse the img element and only show the alt tag (the same behavior occurs initially if a URL to an image is specified). Specifying a data URI allows the image to be loaded as part of the document itself. The data URI shown above is for a 1×1, transparent PNG. A GIF preloader would probably be a good idea here.
  • The width and height attributes are specified. This is not strictly necessary, but in keeping with the theme of elegance, this prevent other content on the page from shifting after images are loaded.

2. Write the Javascript code

<script src="jquery-1.6.2.min.js" type="text/javascript"></script>
<
script type="text/javascript">

window.onload = function ()
{
$(
'img').each(function ()
{
var imgElem = $(this);
var src = imgElem.attr('data-src');

var img = new Image();
img.src = src;
img.onload =
function ()
{

imgElem.fadeTo(
'fast', 0.0001, function ()
{
imgElem.attr(
'src', src);
imgElem.fadeTo(
'slow', 1)
});

}

});
}

</script>

For simplicity, jQuery is used.

After the page is loaded, the data-src attribute of every <img> element is read. The image, specified by the data-src attribute, is loaded via a Javascript Image object and, once loaded (Image.onload event fires):

  • The <img> element is faded out (actually faded to an alpha of 0.0001 because a total fadeOut to 0 will effectively remove the element from the document rendering).
  • The src attribute is changed to the the URL of the loaded image.
  • The <img> element is faded in.

The important bit is changing the src attribute; the effect itself, whether it’s fading, sliding, etc. is not terribly important.