Lately I've been looking at data. Not so much data in itself as how data is presented. We are presented with far more information on a daily basis than we can possibly store or comfortably process. Much of it gets overlooked or tossed aside. Nowhere is this more prominent than on the Web. I can't count how many times I've dismissed a site during a search simply because it was too much trouble to find what I was looking for. We don't have time to sift through overbearing gridlines, confusing navigation, a lack of whitespace and pages of thousands of links, images and special offers, each of which appears to be the most important thing on the page.
In Envisioning Information, Edward Tufte spends a considerable amount of time discussing ways in which various charts, graphs, maps, timetables and illustrations have made attempts at "escaping flatland": representing the complex data of our four-dimensional world on a two-dimensional surface. The more effective examples tell their story in an intuitive and sometimes profound way. They not only make the mundane beautiful, they make complex data sets easier to understand.
Consider the Tag Cloud
Actually, I'd prefer that people forget tag clouds. I'm tired of tag clouds. Granted, a tag cloud is intended to digest data into a more immediately understandable form, but in most cases it comes out as the sort of thing that makes typographers cry (fig. 1). Tag clouds take a potentially useful tool for navigation and searching, and mangle it into a bowl of alphabet soup through which you must sift in order to find what you are really looking for.
Here's an exercise that considers alternative ways of looking at tags. Maybe it will inspire something better than tag clouds. The data is a list of tags for an archive of fictional articles.
First I'd like to point out that this is primarily an exercise. I'm going to be examining details about a set of data that visitors to a Web site may or may not have interest in. The point is to illustrate how representation of data can highlight different characteristics of that data, and to strive for immediacy of understanding of the data.
It does seem to be important to many site designers to convey how many things have been tagged with a given tag, and how each tag's frequency relates to that of the others. Let's back up a bit and start with the common default way to do this, which is to attach numbers to the list of tags (fig. 2). In this example tags are listed in the order they were created.
This is a bit more precise information than a tag cloud presents but it isn't as immediate. There is no apparent order. It is clear that there are more things tagged with Firefox than with Mongrel but you have to work a little (yes, very little) to get that information. It takes a bit longer to figure out which is the greatest and which is the least. Consider how long it would take to figure out which tag is closest to the average.
No, you aren't likely to need to know which tag is closest to the average, but this illustrates the point that the information is not readily consumed. It is like a raw slab of meat. If you ate it now it wouldn't taste very good and it could make you very sick.
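To make that "work" concrete, here is a minimal Python sketch of the sums a reader has to do in their head with the numbered list. The tag names come from the example; the counts are invented for illustration, and `summarize` is a hypothetical helper, not part of any site's code.

```python
# Hypothetical counts, kept in creation order. The numbers are made up.
tag_counts = {
    "Firefox": 12,
    "Mongrel": 4,
    "Duck Typing": 9,
    "Thunderbird": 6,
    "Python": 7,
    "Leopard": 2,
}

def summarize(counts):
    """Return the most used tag, the least used tag, and the tag closest to the mean."""
    mean = sum(counts.values()) / len(counts)
    most = max(counts, key=counts.get)
    least = min(counts, key=counts.get)
    closest = min(counts, key=lambda tag: abs(counts[tag] - mean))
    return most, least, closest

most, least, closest = summarize(tag_counts)
print(most, least, closest)  # Firefox Leopard Python
```

None of those three answers is visible at a glance in the unordered list, which is the point.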
Order out of Chaos
The fact that these tags are listed in the order they were created is lost on us. We have no way of knowing that. That may not be the most useful piece of information, so let's set it aside for a while and highlight something else.
Here the tags adhere to a clear hierarchy but it still takes a moment to digest. Consider how long it takes to figure out which pair of adjacent tags has the greatest disparity in frequency between them.
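As a rough sketch, assuming the hierarchy here is a sort by frequency, the reordering and the "greatest disparity" question look like this in Python. The counts are invented for illustration, and both function names are hypothetical.

```python
# Hypothetical counts. Only the tag names come from the example.
tag_counts = {
    "Firefox": 12,
    "Mongrel": 4,
    "Duck Typing": 9,
    "Thunderbird": 6,
    "Python": 7,
    "Leopard": 2,
}

def sort_by_frequency(counts):
    """Return (tag, count) pairs, most frequent first."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)

def widest_gap(ranked):
    """Return the adjacent pair of (tag, count) entries with the largest difference."""
    pairs = zip(ranked, ranked[1:])
    return max(pairs, key=lambda pair: pair[0][1] - pair[1][1])

ranked = sort_by_frequency(tag_counts)
upper, lower = widest_gap(ranked)
print(upper[0], lower[0])  # Firefox Duck Typing
```

Sorting makes the greatest and the least trivial to read, but finding the widest gap still means subtracting every neighboring pair.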
The problem with numbers is that they are abstract. It is easy to picture 5 apples (I see 5 apples arranged like the dots on a six-sided die), but it is more difficult to picture 29 apples. It is less concrete. This is where what Tufte refers to as "small multiples" comes in handy. (1)
Instead of displaying numbers we're now showing actual taggings. Each dot represents a tag on an article. Small multiples provide a reading of the data on a small scale (one dot to the next), and come together as a whole to tell a story on a larger scale.
The list is beginning to behave like a bar graph, showing relative frequency from one tag to the next, but it is better than that. A bar graph can show relationships in value between multiple things, but without numbers those relationships are vague. Their scale is relative. Here the relationships are more explicit, and you don't have to read anything or do any math. Cognition is immediate. While it would be tedious to count each dot in order to know the exact number of something, you do get a general sense of what's going on.
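The dot list is simple enough to sketch in plain text. This is only an approximation under stated assumptions: the counts are invented, `dot_rows` is a hypothetical helper, and an ASCII "o" stands in for each dot.

```python
# Hypothetical counts. Only the tag names come from the example.
tag_counts = {
    "Firefox": 12,
    "Mongrel": 4,
    "Duck Typing": 9,
    "Thunderbird": 6,
    "Python": 7,
    "Leopard": 2,
}

def dot_rows(counts, dot="o"):
    """Return one text row per tag: the padded tag name, then one dot per tagging."""
    width = max(len(tag) for tag in counts)
    return [f"{tag.ljust(width)}  {dot * n}" for tag, n in counts.items()]

for row in dot_rows(tag_counts):
    print(row)
```

Because every dot is one tagging, the rows can be compared by length alone, with no scale or legend to interpret.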
The dots have freed us to make another improvement. This list isn't very long, but it could be. Searching for a specific topic in that list could become tedious if it is not alphabetized. We will make that change and a few others.
Each article can have multiple tags. So clicking on Firefox will bring up a list of articles, some of which may be tagged with other things. It would be nice to show that relationship. Our next iteration (fig. 5) highlights all relevant taggings. It would also be nice to see which tag is currently selected. This one is interactive.
fig. 5 (interactive)
Clicking on Firefox turns all the dots on its bar orange, along with any dots (taggings) that belong to articles tagged with Firefox. This tells us (very quickly) that of the articles tagged with Firefox, three are also tagged with Thunderbird, none are tagged with Python or Leopard, etc.
The animation is arguably a bit redundant, but I like the way it looks.
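The logic behind the highlighting can be sketched in a few lines of Python. This is not the code behind the interactive figure; the archive below is invented so that it agrees with the description (three Firefox articles also tagged with Thunderbird, none with Python or Leopard), and `highlighted_dots` is a hypothetical name.

```python
from collections import Counter

# Hypothetical archive: each article is represented by its set of tags.
articles = [
    {"Firefox", "Thunderbird"},
    {"Firefox", "Thunderbird", "Mongrel"},
    {"Firefox", "Thunderbird"},
    {"Firefox", "Duck Typing"},
    {"Firefox"},
    {"Python", "Duck Typing"},
    {"Python"},
    {"Leopard"},
    {"Mongrel", "Duck Typing"},
]

def highlighted_dots(articles, selected):
    """Count, per tag, the taggings that sit on articles carrying the selected tag."""
    highlighted = Counter()
    for tags in articles:
        if selected in tags:
            highlighted.update(tags)
    return highlighted

orange = highlighted_dots(articles, "Firefox")
print(orange["Thunderbird"], orange["Python"], orange["Leopard"])  # 3 0 0
```

A `Counter` returns 0 for tags it has never seen, so tags with no highlighted dots need no special handling.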
Have we not learned all there is to learn about these humble tags? Never! These tags still have dark little secrets yet to be gleaned. We are working with two (physical) dimensions, but we can impose another on flatland: time.
fig. 6 (interactive)
If we stretch data out along a timeline we can learn more about it. I've added some subtle striping and elongated the dots to emphasize concurrent taggings. We can see that early on in the life of this site Duck Typing was a hot topic but its popularity gradually dwindled, Thunderbird had a short but decent run some time ago, and Python has been making itself known of late for some mysterious reason. Our mundane navigation buttons are starting to take on a life of their own.
All anthropomorphism aside, this arrangement is giving us a lot of information very quickly, and without any numbers. But it comes at a cost. At this point we've nearly tripled the area these tags take up, turning our lowly navigation into an increasingly space-hungry monster. We've also understated the frequency relationship between tags a bit. Perhaps we've gone too far, but the exercise is not in vain.
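For anyone who wants to try the timeline idea, here is a minimal Python sketch. It assumes each tagging is recorded with a period index (0 being the oldest), which the original data may or may not do; the taggings are invented to echo the story above, and `timeline` and `render` are hypothetical names.

```python
from collections import defaultdict

# Hypothetical taggings as (period, tag) pairs, where period 0 is the oldest.
taggings = [
    (0, "Duck Typing"), (0, "Duck Typing"), (0, "Duck Typing"),
    (1, "Duck Typing"), (1, "Duck Typing"), (1, "Thunderbird"),
    (2, "Duck Typing"), (2, "Thunderbird"), (2, "Thunderbird"),
    (4, "Python"), (5, "Python"), (5, "Python"),
]

def timeline(taggings, periods):
    """Count the taggings of each tag in each period."""
    rows = defaultdict(lambda: [0] * periods)
    for period, tag in taggings:
        rows[tag][period] += 1
    return dict(rows)

def render(row):
    """Draw one mark per period: '#' where the tag was used, '-' where it was not."""
    return "".join("#" if n else "-" for n in row)

for tag, row in timeline(taggings, periods=6).items():
    print(f"{tag:<12}{render(row)}")
```

Note that `render` collapses each period to a single mark, so it shows when a tag was active but hides how often, which is the same trade-off described above.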
Do Good to Data. Do it Well.
In clarifying data, the goal is not simply to simplify. There is only so much fat that can be cut before data becomes further obfuscated. Sometimes, to tell the story well, we need to reveal more information, but it should be done in a way that does not complicate the reading. It should make clear what was once vague.
I highly recommend the following resources for a more detailed discussion of information design.
- Ben Fry
- Edward Tufte
- Accessible Data Visualization with Web Standards, A List Apart
- Bret Victor, Magic Ink: Information Software and the Graphical Interface
(1) Arguably these are not small multiples as Edward Tufte describes them because these boxes are all the same, but I contend that each box is unique compared to its neighbors because they each represent a unique relationship to an article.