
Lately I've been looking at data. Not always so much data in itself but how data is presented. We are presented with far more information on a daily basis than we can possibly store or comfortably process. Much of it gets overlooked or tossed aside. There is no place where this is more prominent than on the Web. I can't think of how many times I've dismissed a site during a search simply because it was too much trouble to find what I was looking for. We don't have time to sift through overbearing gridlines, confusing navigation, lack of whitespace and pages of thousands of links, images and special offers, each of which appears to be the most important thing on the page.

In Envisioning Information, Edward Tufte spends a considerable amount of time discussing ways in which various charts, graphs, maps, timetables and illustrations have made attempts at "escaping flatland": representing the complex data of our four-dimensional world on a two-dimensional surface. The more effective examples tell their story in an intuitive and sometimes profound way. They not only make the mundane beautiful, they make complex data sets easier to understand.

Consider the Tag Cloud

Actually, I'd prefer that people would forget tag clouds. I'm tired of tag clouds. Granted, a tag cloud is intended to digest data into a more immediately understandable form, but in most cases it comes out as the sort of thing that makes typographers cry (fig. 1). They take a potentially useful tool for navigation and searching, and mangle it into a bowl of alphabet soup through which you must sift in order to find what you are really looking for.

fig. 1

Here's an exercise in which alternative ways of looking at tags are considered. Maybe it will inspire something better than tag clouds. The data is a list of tags for an archive of fictional articles.

First I'd like to point out that this is primarily an exercise. I'm going to be examining details about a set of data that visitors to a Web site may or may not have interest in. The point is to illustrate how representation of data can highlight different characteristics of that data, and to strive for immediacy of understanding of the data.

Good Digestion

It does seem to be important to many site designers to convey how many things have been tagged by a given tag, and how each tag's frequency relates to that of others. Let's back up a bit and start with the common default way to do this, which is to attach numbers to the list of tags (fig. 2). In this example, tags are listed in the order they were created.

fig. 2

This is a bit more precise information than a tag cloud presents but it isn't as immediate. There is no apparent order. It is clear that there are more things tagged with Firefox than with Mongrel but you have to work a little (yes, very little) to get that information. It takes a bit longer to figure out which is the greatest and which is the least. Consider how long it would take to figure out which tag is closest to the average.

No, you aren't likely to need to know which tag is closest to the average, but this illustrates the point that the information is not readily consumed. It is like a raw slab of meat. If you ate it now it wouldn't taste very good and it could make you very sick.

Order out of Chaos

The fact that these tags are listed in the order they were created is lost on us. We have no way of knowing that. That may not be the most useful piece of information, so let's set it aside for a while and highlight something else.

fig. 3

Here the tags adhere to a clear hierarchy but it still takes a moment to digest. Consider how long it takes to figure out which pair of adjacent tags has the greatest disparity in frequency between them.
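To make those questions concrete, here is a minimal Ruby sketch. It assumes the hierarchy in fig. 3 is a sort by frequency, and the tag names and counts are invented for illustration; they are not the data behind the figures. It sorts the tags, finds the adjacent pair with the largest drop, and finds the tag closest to the average.

```ruby
# Hypothetical tag counts, in the order the tags were created.
TAG_COUNTS = {
  "Firefox" => 29, "Mongrel" => 7, "Python" => 12,
  "Thunderbird" => 18, "Leopard" => 4, "Duck Typing" => 21
}

# Sort from most used to least used.
def by_frequency(counts)
  counts.sort_by { |_tag, count| -count }
end

# Adjacent pair in the sorted list with the largest drop in count.
def widest_gap(counts)
  by_frequency(counts).each_cons(2).max_by { |(_, a), (_, b)| a - b }
end

# Tag whose count is nearest the mean.
def closest_to_average(counts)
  mean = counts.values.sum.to_f / counts.size
  counts.min_by { |_tag, count| (count - mean).abs }.first
end

by_frequency(TAG_COUNTS).each { |tag, count| puts "#{tag} (#{count})" }
```

A machine answers these questions instantly. The point of the exercise is to get a reader to the same answers nearly as fast.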

Go Long

The problem with numbers is that they are abstract. It is easy to picture 5 apples (I see 5 apples arranged like the dots on a six-sided die), but it is more difficult to picture 29 apples. It is less concrete. This is where what Tufte refers to as "small multiples" comes in handy. (1)

fig. 4

Instead of displaying numbers we're now showing actual taggings. Each dot represents a tag on an article. Small multiples provide a reading of the data on a small scale (one dot to the next), and come together as a whole to tell a story on a larger scale.

The list is beginning to behave like a bar graph, showing relative frequency from one tag to the next, but it is better than that. A bar graph can show relationships in value between multiple things, but without numbers those relationships are vague. Their scale is relative. Here the relationships are more explicit, and you don't have to read anything or do any math. Cognition is immediate. While it would be tedious to count each dot in order to know the exact number of something, you do get a general sense of what's going on.
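The idea is simple enough to sketch in a few lines of Ruby. This is only a text approximation of fig. 4, and the counts are invented for the example: each tagging becomes one dot, so the length of a row is the count itself.

```ruby
# Hypothetical tag counts; not the data used in the figures.
counts = { "Firefox" => 9, "Mongrel" => 3, "Python" => 5 }

# One dot per tagging, so the row length is the count itself.
def dot_rows(counts, dot: "•")
  width = counts.keys.map(&:length).max
  counts.map { |tag, count| "#{tag.ljust(width)} #{dot * count}" }
end

puts dot_rows(counts)
```

Padding the tag names to a common width keeps the first dot of every row in the same column, which is what lets the rows be compared at a glance.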

Keep Going

The dots have freed us to make another improvement. This list isn't very long, but it could be. Searching for a specific topic in that list could become tedious if it is not alphabetized. We will make that change and a few others.

Each article can have multiple tags. So clicking on Firefox will bring up a list of articles, some of which may be tagged with other things. It would be nice to show that relationship. Our next iteration (fig. 5) highlights all relevant taggings. It would also be nice to see which tag is currently selected. This one is interactive.

fig. 5 (interactive)

Clicking on Firefox causes all dots on its bar to turn orange, as well as any dots (taggings) that are associated with articles that have been tagged with Firefox. This tells us (very quickly) that of the articles tagged with Firefox, three are also tagged with Thunderbird, none are tagged with Python or Leopard, etc.
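The highlighting rule can be sketched in Ruby as follows. The article ids and tag assignments are made up for illustration and are not the data behind fig. 5. Given a selected tag, the sketch counts how many taggings under each tag belong to articles that also carry the selected tag.

```ruby
# Hypothetical articles and their tags; not the data behind fig. 5.
ARTICLES = {
  "a1" => ["Firefox", "Thunderbird"],
  "a2" => ["Firefox"],
  "a3" => ["Python", "Leopard"],
  "a4" => ["Firefox", "Thunderbird", "Mongrel"]
}

# For a selected tag, count how many taggings under every tag
# belong to articles that also carry the selected tag.
def highlighted(articles, selected)
  related = articles.values.select { |tags| tags.include?(selected) }
  all_tags = articles.values.flatten.uniq
  all_tags.to_h { |tag| [tag, related.count { |tags| tags.include?(tag) }] }
end

p highlighted(ARTICLES, "Firefox")
```

With this sample data, selecting Firefox lights up three Firefox dots, two Thunderbird dots and one Mongrel dot, and nothing under Python or Leopard.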

The animation is arguably a bit redundant, but I like the way it looks.

Keep Going!

Have we not learned all there is to learn about these humble tags? Never! These tags still have dark little secrets yet to be gleaned. We are working with two (physical) dimensions but we can impose another on flatland: time.

fig. 6 (interactive)

If we stretch data out along a timeline we can learn more about it. I've added some subtle striping and elongated the dots to emphasize concurrent taggings. We can see that early on in the life of this site Duck Typing was a hot topic but its popularity gradually dwindled, Thunderbird had a short but decent run some time ago, and Python has been making itself known of late for some mysterious reason. Our mundane navigation buttons are starting to take on a life of their own.
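A rough Ruby sketch of the same idea: bucket each tagging by month and draw one column per month. The tags are borrowed from the description above, but the dates are invented for illustration, and the fixed column widths are an arbitrary choice for this text rendering.

```ruby
require "date"

# Hypothetical taggings as [tag, date] pairs; not the data behind fig. 6.
TAGGINGS = [
  ["Duck Typing", Date.new(2008, 1, 5)],
  ["Duck Typing", Date.new(2008, 1, 20)],
  ["Thunderbird", Date.new(2008, 2, 11)],
  ["Duck Typing", Date.new(2008, 2, 14)],
  ["Python",      Date.new(2008, 4, 2)],
  ["Python",      Date.new(2008, 4, 27)]
]

# Count taggings per tag per calendar month: { tag => { "YYYY-MM" => n } }.
def monthly_counts(taggings)
  taggings.each_with_object(Hash.new { |h, k| h[k] = Hash.new(0) }) do |(tag, date), out|
    out[tag][date.strftime("%Y-%m")] += 1
  end
end

# Draw one row per tag and one column per month, with a dot per tagging.
def timeline(taggings, months)
  monthly_counts(taggings).map do |tag, per_month|
    cells = months.map { |m| ("•" * per_month[m]).ljust(3) }
    "#{tag.ljust(12)} #{cells.join('|')}"
  end
end

puts timeline(TAGGINGS, %w[2008-01 2008-02 2008-03 2008-04])
```

Empty months still get a column, which is what makes a tag's rise or decline visible.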

All anthropomorphism aside, this arrangement is giving us a lot of information very quickly, and without any numbers. But it comes at a cost. At this point we've nearly tripled the area these tags take up, turning our lowly navigation into an increasingly space-hungry monster. We've also understated the frequency relationship between tags a bit. Perhaps we've gone too far, but the exercise is not in vain.

Do Good to Data. Do it Well.

In clarifying data, the goal is not simply to simplify. There is only so much fat that can be cut before data becomes further obfuscated. Sometimes, to tell the story well, we need to reveal more information, but it should be done in a way that does not complicate the reading. It should make clear what was once vague.

Further Reading

I highly recommend the following resources for a more detailed discussion on information design.

Footnotes

  1. Arguably these are not small multiples as Edward Tufte describes them because these boxes are all the same, but I contend that each box is unique compared to its neighbors because they each represent a unique relationship to an article.




From the Browser to the Desktop

As websites graduate to version 2.0 with their dynamic user-generated data-driven content we are adjusting our vocabulary to talk about them. We are no longer just building sites, we are launching applications. Indeed many applications from the desktop realm are being replicated online. Google Documents, for example, could hypothetically replace the need for something like Microsoft Office, but with the added benefit of everything being a part of the cloud so that a document may be accessed and edited by multiple users from multiple locations, independent of each user's operating system.

This isn't a one-way street; desktop applications do not all live in the 'desktop bubble'. At startup many applications check for updates and external information, as iTunes does when it downloads track information, album artwork and so on.

Lately the line between the internet and the hard drive has become a bit blurred. Several technologies are in development that allow web developers to create what appear to be full-fledged desktop applications using HTML, CSS, JavaScript and ActionScript, the Web-oriented tools we already know. An overly simplified description of this is a browserless Web application that doesn't necessarily use the Web. Google Gears, Mozilla's Prism and Adobe AIR are a few of the efforts put forward recently to move the web onto the desktop to varying degrees.

I recently had the opportunity to try out one of these "horseless carriages". What follows is an overview of my experience with the Adobe Integrated Runtime (AIR), formerly known as Apollo.

The Pieces

There are two pieces to any AIR application. First there's the runtime environment, which handles HTML rendering and all of the difficult behind-the-scenes stuff that we web developers would rather not worry about. The second piece is the actual application, composed of your regular Web technologies (HTML, JavaScript, PDFs, video, Flash, etc.).

When a user installs an AIR application for the first time they must install the runtime environment as well. Later, when the user installs another AIR app, they will not have to download the already installed runtime. The whole download and installation process is designed to be "seamless", appearing as if the user is downloading just one item regardless of whether they need to download AIR or not. As I'll explain later, however, it is not always so smooth.

Benefits

So why build an AIR application?

Why put a Web application on your desktop? One of the advantages that all of the aforementioned tools cite is the ability to work offline. Granted, reliable internet access is becoming less and less of an issue, but what good is any Web application when you don't have a working internet connection? Working offline usually means storing information in a database and synching that information with a master database when a connection becomes available later. AIR comes packaged with SQLite for this purpose. Alternatively data can be stored as XML and can even be encrypted. Furthermore, if it is not necessary to store some information online, storing it locally can save some time because data does not need to be sent back and forth. This all makes for a smoother, persistent user experience.
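The store-locally-then-sync pattern described above can be sketched in a few lines. This is a language-neutral illustration written in Ruby, not AIR's API: the class name `OfflineStore`, its methods, and the plain array standing in for the master database are all hypothetical, and a real app would persist the queue (for example in SQLite) rather than keep it in memory.

```ruby
# In-memory stand-in for a local store that syncs to a master database.
class OfflineStore
  attr_reader :pending

  def initialize
    @pending = []
  end

  # Record a change locally, whether or not we are online.
  def save(record)
    @pending << record
  end

  # Push queued changes to the master store once it is reachable.
  # Returns the number of records sent.
  def sync(master, online:)
    return 0 unless online
    sent = @pending.size
    master.concat(@pending)
    @pending.clear
    sent
  end
end

store = OfflineStore.new
master = [] # hypothetical stand-in for the remote database
store.save({ id: 1, note: "written on the train" })
store.sync(master, online: false) # nothing is sent while offline
store.sync(master, online: true)  # queued record moves to master
```

The user keeps working against the local store the whole time. Only the sync step cares whether a connection exists.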

Consistency

AIR uses the WebKit HTML rendering engine which recently passed the Acid3 test for Web standards, so you can rest assured that your floats will properly float and your padding will properly pad. No need to hack together special code for you-know-who.

OS integration

AIR provides a windowing API that allows you to interact with the operating system's windows. You can set the size and position of windows, restrict them to a range of sizes, dictate whether they can be resized or minimized, display your app fullscreen, or run the app with no window at all. You can even create your own custom window GUI. One of the advantages of having this kind of control is that browser windows often have elements (back/forward and refresh buttons, search bar, etc.) that are not necessarily relevant to every application. Sometimes less is more. AIR also allows for other kinds of integration with the desktop, including access to the file system, clipboard data, drag and drop events and so on.

The Seams

While all of this is good and exciting and lets web developers pretend that we can build desktop applications, AIR's system is not perfect.

AIR detection

As I mentioned before, any AIR app is composed of two pieces: the runtime environment and the implementation code. If the user already has AIR installed it doesn't make much sense to install it again. Adobe provides a Flash movie that they call a "badge" that functions as a button for downloading the application. More importantly, it also performs a bit of magic to determine whether the user already has AIR installed. If they don't, it downloads both AIR and your app and installs them together, all in one smooth and easy action.

This is all fine and well except for the fact that the badge requires the latest version of Flash in order to do this. A bit of plugin version detection can render an alternative page from which the user can download first AIR and then your app. This isn't so bad except for:

  1. If they don't install AIR first followed by your app second, the installation of your app will fail.
  2. Users often don't know what they have installed on their computer and may try to install AIR unnecessarily.
  3. Explaining to your users why they need to download and install item 'A' first in order to install item 'B' may lead to a bit of required reading and confusion. Such messages are in danger of being ignored as a bunch of legal mumbo jumbo or simply as a waste of time.

Granted, the users who will be tripped up by such things are probably a small minority of users lacking in computer savviness and patience, but you should still be aware of who your audience is. If they are likely to have trouble with the two-piece download then chances are good that their version of Flash Player is not even current enough to do the AIR detection, thus all but defeating the purpose of the "seamless" download feature.

Security Certificates

Because AIR apps can be given access to the user's system and transfer data over the internet there is an obvious security concern. Adobe requires that AIR apps be packaged with a security certificate (hopefully) verifying the authenticity of the application. This is designed to prevent someone else from maliciously altering your app and redistributing it, getting you into a heap of trouble.

You certainly don't want to get into trouble because someone posing as you has done something nasty to your users, but the means of preventing this isn't ideal either. The solution is to purchase a certificate from a certificate authority such as Thawte or Verisign for around $300 a year, but some developers or clients may consider this to be an unnecessary expense. A certificate doesn't actually prevent anyone from creating a fraudulent version of your app; it just proves that you are the one who packaged your app.

The alternative is to create a self-signed certificate, and Adobe provides some tools for easily doing this. The only problem with self-signing your certificate is that when the user installs your app they are presented with this window:

The language here is intentionally ominous. You are most likely creating something innocent, but your users may be scared off by the idea of giving someone of unknown identity unrestricted access to their computer. I don't know that there is a better solution to the security issue but it is something to consider.

Perhaps I'm griping about things that are inevitable and/or insignificant, but there's my two cents. I do think AIR is worth a try, at least to impress your friends with your mad desktop app building skills.




So you have this great community-based website. People are making profiles, uploading images and videos, voting on who's got the hottest dress and 'snagging' each other's cakes. What you need now is an interactive map that not only shows you where all of the parties are and how many people are going to be there, but also links you to the host's profile page so you can check out the car daddy is getting them for their birthday. OMG, you need XML!

Okay, so no one is likely to be squealing over the code behind the party map that Killswitch recently built for one of our entertainment clients—this kind of dynamic user data-driven application isn't even all that new. But I have to admit that I got a little excited over this and other similar projects that we've been doing at Killswitch thanks to the combination of ActionScript 3 and Ruby on Rails and how they handle XML.

These projects usually follow the same pattern: users upload some sort of content (party info, images, etc.), data about that stuff gets stored in a database (city, name, longitude, latitude, file paths), then it's packaged up as XML and passed on to the Flash movie (map) where that information is dressed up in its party outfit.

The benefit of using Rails on the back end is that it makes accessing the database and generating XML an easy, elegant and painless process. It can be as simple as adding one line of code:

render :xml => @parties.to_xml

Which parties this refers to is of course determined by the find conditions you set when the @parties object is created. If more control is needed (maybe there are fields in the database we don't want to make public, or there is some sort of reformatting we want to impose on the data), we can render a template instead. This is done similarly to an HTML template.
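To show what "more control" amounts to, here is a plain-Ruby sketch of building XML from only the fields meant to be public. It is not the Rails template mechanism itself, just the underlying idea. The field names, the `owner_email` field and the helper names are all invented for this example.

```ruby
# Hypothetical party records; :owner_email stands in for a private field.
PARTIES = [
  { host: "Seth", city: "Chicago", starts_at: "9pm sharp", owner_email: "seth@example.com" }
]

PUBLIC_FIELDS = %i[host city starts_at]

# Escape the characters that are not allowed in XML text as-is.
def xml_escape(text)
  text.to_s.gsub(/[&<>]/, "&" => "&amp;", "<" => "&lt;", ">" => "&gt;")
end

# Build a <parties> document that exposes only the whitelisted fields.
def parties_xml(parties, fields = PUBLIC_FIELDS)
  items = parties.map do |party|
    inner = fields.map { |f| "    <#{f}>#{xml_escape(party[f])}</#{f}>" }
    "  <party>\n#{inner.join("\n")}\n  </party>"
  end
  "<parties>\n#{items.join("\n")}\n</parties>"
end

puts parties_xml(PARTIES)
```

Whitelisting the public fields, rather than blacklisting the private ones, means a column added to the database later stays private until someone decides otherwise.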

ActionScript 3 is beneficial for the same reason that working with XML in Rails is convenient. It is nearly automatic. In previous versions of ActionScript if I wanted to get the name of the second guest attending a party as represented by this excerpt of XML:

<party>
  <host>Seth</host>
  <location>Chicago</location>
  <guests>
    <guest>Jasper Johns</guest>
    <guest>Julie Mehretu</guest>
    <guest>Robert Rauschenberg</guest>
  </guests>
  <time>9pm sharp</time>
</party>

I would have to write something like this:

//assuming the xml is loaded into partyXML with ignoreWhite set to true
var second_guest = partyXML.firstChild.childNodes[2].childNodes[1].firstChild;

This is not exactly legible and becomes less so as we go deeper down the XML tree. What's worse is what happens when someone decides to change the structure of the tree. If time were listed before guests, we'd be looking for a list of names somewhere inside "9pm" and that wouldn't be conducive to anything good. The assignment to second_guest would have to be rewritten.

One alternative would be to write a function that parses the XML and formats it into an object that can be easily referenced later, but I'd rather spend my time animating something. After all, that is what Flash is good for. This is where ActionScript 3 wins some praise. These days all it takes to find the second guest coming to my party is:

var second_guest = partyXML.guests.children()[1]
// returns "Julie Mehretu"

Finding the time would be:

var start_time = partyXML.time
// returns "9pm sharp"

It is intuitive, legible and clean. I don't need to know the order in which things are listed in the XML. I only need to know how they are nested and what the tag names are. Not only does this save the headaches of maintaining cryptic code when the XML structure is changed, but it also saves a lot of time, time better spent on perfecting shape tweens or tricking out the user interface.
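The same principle holds outside of ActionScript. As a rough Ruby analogue, this sketch uses REXML and assumes the rexml library is available (it ships with standard Ruby installs). It looks elements up by name in a copy of the party excerpt where time has been moved ahead of guests, the very change that would break index-based access. The helper names are made up for the example.

```ruby
require "rexml/document"

# The party excerpt, but with <time> moved ahead of <guests>.
REORDERED = <<~XML
  <party>
    <host>Seth</host>
    <time>9pm sharp</time>
    <location>Chicago</location>
    <guests>
      <guest>Jasper Johns</guest>
      <guest>Julie Mehretu</guest>
      <guest>Robert Rauschenberg</guest>
    </guests>
  </party>
XML

party = REXML::Document.new(REORDERED).root

# Look elements up by name; their position among siblings does not matter.
def second_guest(party)
  party.elements["guests/guest[2]"].text
end

def start_time(party)
  party.elements["time"].text
end

puts second_guest(party)
puts start_time(party)
```

Note that the XPath position in `guest[2]` counts from 1, unlike the zero-based `children()[1]` in the ActionScript above.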

So there it is, your dynamic XML birthday cake fresh from the Rails bakery, ornately decorated with a layer of Flash-icing. Everybody nosh.



