Tom Lauck’s Deseloper.org

RIA Myth Busting: Back Button, History, and SEO

I have touched on the topic of SEO in the past; however, that article focused on a broad range of ideas to improve organic search.  Here I want to focus on the realities surrounding RIAs and commonly requested features, specifically the combination of back button support, bookmarking, and search engine optimization.  All of these features would be wonderful in a single page interface, but with current technology and methodology, is it possible to have our cake and eat it too?

It is important to first understand the difference in objectives between a web application and a web site.  (Technically speaking, most web sites are web applications, but we are using the term in a looser sense.)  Typical examples of an application in this scope would be GMail, Google Reader, and the user section of Mint.com, all sites with almost no need of searchable content.  Examples from the opposite end of the spectrum would be Bloomberg, A List Apart, Wired, and someone’s blog.  Therefore, we have two camps: one where the sole focus is on interaction with data and no search strategy is needed, and another where the content dictates a solid search strategy.

A method that has been gaining steam in the RIA world is using a hash sign (#), or anchor, in the URL.  Many talented people have spent precious time creating solutions to history and back button support for AJAX and Flash applications.  This is fantastic for a web application, because it restores standard browser interactions, such as the back button and bookmarking, that rich internet applications typically do not support.

So now that there is history support, does that mean SEO has been fully considered?  An article I found on w3.org sheds some light on the subject using a CNN video player as a case study:

CNN uses links like the above for all the topical video segments that are published on its site. The URL in this case has the following components:

Component      Value
Protocol       http
Host           www.cnn.com
Path           video
Client Param   #/video/tech/2008/02/19/vo.aus.sea.spider.ap

2.1.1 Things To Note

The browser is expected to do a GET of the URL leading up to the fragment, and the processing application, in this case, the JavaScript embedded in the HTML Response processes the portion of the URL following the #.
The fragment identifier has been intentionally identified as a client parameter.
Treating it as a regular fragment identifier in this usage would result in one incorrectly inferring that the URL for the video resource being addressed is http://www.cnn.com/video.
This would result in all the video links on the CNN site getting the same URL.
Thus, the entire URL in this case is http://www.cnn.com/video/#/video/tech/2008/02/19/vo.aus.sea.spider.ap
A consumer of this URL who goes looking for an id within the Response that matches the #-suffix of this URL will fail.
The reported Content-Type for the resource is text/html. However the behavior of the #-suffix in this case is not defined by the HTML specification.
As used, the #-suffix is a first-class client parameter in that it gets consumed by a script that is served as part of the HTML document returned by the server upon receiving a GET request.
This embedded script examines the URL available to it as script variable content.location, strips off the # and uses the rest of the prefix as an argument to function that generates the actual URL.
Having constructed this content URL, the script then proceeds to instruct the browser to play the media at the newly constructed location.
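
As a rough illustration of the split described in the notes above, the following Python sketch separates the part of the CNN URL that is sent in the GET request from the client parameter that only the embedded script sees.  It relies on the standard library’s urllib.parse.urldefrag; the function name split_for_request is made up for this example and is not part of the w3.org case study.

```python
from urllib.parse import urldefrag


def split_for_request(url):
    """Return (url_sent_to_server, client_side_fragment).

    A user agent performs the GET on everything before the '#';
    the fragment stays on the client for the page's script to read.
    """
    requested, fragment = urldefrag(url)
    return requested, fragment


cnn_url = "http://www.cnn.com/video/#/video/tech/2008/02/19/vo.aus.sea.spider.ap"
requested, client_param = split_for_request(cnn_url)

print(requested)     # http://www.cnn.com/video/
print(client_param)  # /video/tech/2008/02/19/vo.aus.sea.spider.ap
```

Only the first value ever reaches the server.  The second exists solely for the script running in the page, which is why anything that does not run that script never sees it.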

Notice that “the browser is expected to do a GET of the URL leading up to the fragment…JavaScript embedded in the HTML Response processes the portion of the URL following the #.”  To paraphrase, Google does not look at client side interactions, so the fragment is truncated from Google’s index.  From this, several assumptions can be made:

  1. Any back link using http://example.com/#example is actually viewed as http://example.com
  2. Back links pointing to URL fragments will have no individual page rank.
  3. In content rich scenarios, the use of URL fragments in lieu of separate pages effectively dilutes almost all search traction.
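
The assumptions above can be sketched in a few lines of Python.  This is only a model: it assumes an indexer that discards everything after the #, as argued here, and it is not a description of Google’s actual pipeline.  The example.com links and the function name count_indexed_urls are hypothetical.

```python
from collections import Counter
from urllib.parse import urldefrag


def count_indexed_urls(back_links):
    """Tally back links per URL the way a fragment-ignoring indexer would.

    Assumption: the indexer drops the '#...' suffix before recording a link.
    """
    return Counter(urldefrag(link).url for link in back_links)


# Hypothetical back links to a single page interface (state lives in the fragment).
single_page_links = [
    "http://example.com/#home",
    "http://example.com/#resume",
    "http://example.com/#portfolio",
]

# The same content published as separate pages.
separate_page_links = [
    "http://example.com/",
    "http://example.com/resume",
    "http://example.com/portfolio",
]

single_page_index = count_indexed_urls(single_page_links)
separate_page_index = count_indexed_urls(separate_page_links)

print(single_page_index)    # every link collapses onto http://example.com/
print(separate_page_index)  # each page keeps its own entry
```

Under that assumption, the single page interface ends up with one indexed URL that absorbs all three links, while the separate pages are each credited individually.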

Reflecting on how Google treats URL fragments, it is clear that a single page interface is not an effective strategy in scenarios with rich content.  Another big myth around single page interfaces involves the use of Flash with SWFAddress and/or Flex’s history manager.  Google will disregard URL fragments, the very foundation of SWFAddress and Flex’s history manager.  To reiterate, Googlebot just disregarded the URLs you have just crafted with SWFAddress.  It should be stated that some individuals wholeheartedly believe that using URL fragments is a successful SEO strategy.  Yet, when Google is typically the number one returning visitor, do you really want to take a chance by questioning the very foundation Google uses to spider your site?

Take for instance a designer’s personal site with tabs for Home, Resume, Portfolio, and Contact.  Would the designer want to implement a single page interface?  The answer would likely be no.  The content would gain more traction if it is separated properly.  To rehash the example of Google Reader, a single page interface is a good choice, for Google Reader would not benefit from having a separate page for each feed a user is subscribed to.

The advances made in RIA with regard to history and back button support encapsulate the innovative spirit that the web has embraced.  However, web workers tend to jump on bandwagons, and this filters down to individuals with the power to poorly implement a technology.  Remember Flash intro pages?  Much like a seasoned web worker becomes very business and client savvy after years in the field, we need to be Google savvy.  Therefore, the next time a wireframe for a single page interface lands on your desk, ask: does the content dictate a search strategy?  If so, do some research on the reality of the solutions and ultimately be kind to Google, and Google will be kind to you.

Oct 22 2008