Lately I've been looking at data: not so much the data itself as how it is presented. We are presented with far more information on a daily basis than we can possibly store or comfortably process. Much of it gets overlooked or tossed aside. Nowhere is this more prominent than on the Web. I can't count how many times I've dismissed a site during a search simply because it was too much trouble to find what I was looking for. We don't have time to sift through overbearing gridlines, confusing navigation, lack of whitespace and pages of thousands of links, images and special offers, each of which appears to be the most important thing on the page.
In Envisioning Information, Edward Tufte spends a considerable amount of time discussing ways in which various charts, graphs, maps, timetables and illustrations have made attempts at "escaping flatland", that is, representing the complex data of our four-dimensional world on a two-dimensional surface. The more effective examples tell their story in an intuitive and sometimes profound way. They not only make the mundane beautiful, they make complex data sets easier to understand.
Consider the Tag Cloud
Actually, I'd prefer that people would forget tag clouds. I'm tired of tag clouds. Granted, a tag cloud is intended to digest data into a more immediately understandable form, but in most cases it comes out as the sort of thing that makes typographers cry (fig. 1). They take a potentially useful tool for navigation and searching, and mangle it into a bowl of alphabet soup through which you must sift in order to find what you are really looking for.

fig. 1
Here's an exercise in which alternative ways of looking at tags are considered. Maybe it will inspire something better than tag clouds. The data is a list of tags for an archive of fictional articles.
First I'd like to point out that this is primarily an exercise. I'm going to be examining details about a set of data that visitors to a Web site may or may not have interest in. The point is to illustrate how representation of data can highlight different characteristics of that data, and to strive for immediacy of understanding of the data.
Good Digestion
It does seem to be important to many site designers to convey how many things have been tagged with a given tag, and how each tag's frequency relates to that of others. Let's back up a bit and start with the common default way to do this, which is to attach numbers to the list of tags (fig. 2). In this example tags are listed in the order they were created.

fig. 2
This is a bit more precise information than a tag cloud presents but it isn't as immediate. There is no apparent order. It is clear that there are more things tagged with Firefox than with Mongrel but you have to work a little (yes, very little) to get that information. It takes a bit longer to figure out which is the greatest and which is the least. Consider how long it would take to figure out which tag is closest to the average.
No, you aren't likely to need to know which tag is closest to the average, but this illustrates the point that the information is not readily consumed. It is like a raw slab of meat. If you ate it now it wouldn't taste very good and it could make you very sick.
Order out of Chaos
The fact that these tags are listed in the order they were created is lost on us. We have no way of knowing that. That may not be the most useful piece of information, so let's set it aside for a while and highlight something else: here the tags are sorted by frequency.

fig. 3
Here the tags adhere to a clear hierarchy but it still takes a moment to digest. Consider how long it takes to figure out which pair of adjacent tags has the greatest disparity in frequency between them.
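To make the sorting step concrete, here is a minimal Ruby sketch. The tag counts are hypothetical (the numbers behind the figures aren't reproduced here), and the helper names by_frequency and greatest_gap are mine. It orders the tags by frequency and then answers the "greatest disparity" question that is slow to answer by eye.

```ruby
# Hypothetical tag counts, purely for illustration.
TAG_COUNTS = {
  "Firefox" => 12, "Mongrel" => 4, "Thunderbird" => 7,
  "Python" => 9, "Leopard" => 3, "Duck Typing" => 29
}

# Most frequent tag first, as in the sorted list.
def by_frequency(counts)
  counts.sort_by { |_tag, count| -count }
end

# The adjacent pair (in sorted order) with the biggest drop in frequency.
def greatest_gap(sorted)
  sorted.each_cons(2).max_by { |left, right| left[1] - right[1] }
end

sorted = by_frequency(TAG_COUNTS)
gap = greatest_gap(sorted)

sorted.each { |tag, count| puts "#{tag.ljust(12)} #{count}" }
puts "Biggest gap: #{gap[0][0]} -> #{gap[1][0]}"
```

The computer answers instantly, of course; the point of the exercise is to get the reader's eye to do the same.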
Go Long
The problem with numbers is that they are abstract. It is easy to picture 5 apples (I see 5 apples arranged like the dots on a six-sided die), but it is more difficult to picture 29 apples. It is less concrete. This is where what Tufte refers to as "small multiples" comes in handy. (1)

fig. 4
Instead of displaying numbers we're now showing actual taggings. Each dot represents a tag on an article. Small multiples provide a reading of the data on a small scale (one dot to the next), and come together as a whole to tell a story on a larger scale.
The list is beginning to behave like a bar graph, showing relative frequency from one tag to the next, but it is better than that. A bar graph can show relationships in value between multiple things, but without numbers those relationships are vague. Their scale is relative. Here the relationships are more explicit, and you don't have to read anything or do any math. Cognition is immediate. While it would be tedious to count each dot in order to know the exact number of something, you do get a general sense of what's going on.
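As a rough text-only sketch of the same idea, each tagging can be printed as one mark so that the row length carries the quantity. The counts and the dot_row helper are assumptions made for illustration, not the markup used in the figures.

```ruby
# Hypothetical counts, purely for illustration.
counts = { "Firefox" => 12, "Mongrel" => 4, "Python" => 9 }

# One mark per tagging: no number is printed, the row length is the value.
def dot_row(tag, count, label_width = 12)
  "#{tag.ljust(label_width)} #{'*' * count}"
end

rows = counts.map { |tag, count| dot_row(tag, count) }
puts rows
```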
Keep Going
The dots have freed us to make another improvement. This list isn't very long, but it could be. Searching for a specific topic in that list could become tedious if it is not alphabetized. We will make that change and a few others.
Each article can have multiple tags, so clicking on Firefox will bring up a list of articles, some of which may be tagged with other things. It would be nice to show that relationship. Our next iteration (fig. 5) highlights all relevant taggings. It would also be nice to see which tag is currently selected. This one is interactive.
fig. 5 (interactive)
Clicking on Firefox causes all dots on its bar to turn orange, as well as any dots (taggings) that are associated with articles that have been tagged with Firefox. This tells us (very quickly) that of the articles tagged with Firefox, three are also tagged with Thunderbird, none are tagged with Python or Leopard, etc.
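The highlighting boils down to a co-occurrence count. Here is a minimal sketch, assuming the archive can be read as a hash of article title to tag list; the articles and the co_taggings helper are invented for illustration.

```ruby
# Hypothetical article => tags data standing in for the fictional archive.
ARTICLES = {
  "Extending the browser"  => ["Firefox", "Thunderbird"],
  "Mail filters"           => ["Thunderbird"],
  "Deploying with Mongrel" => ["Mongrel", "Firefox"],
  "Quacks like a duck"     => ["Duck Typing", "Python"]
}

# For the selected tag, count how many dots to highlight on every tag's bar:
# one per tagging that belongs to an article carrying the selected tag.
def co_taggings(articles, selected)
  related = articles.values.select { |tags| tags.include?(selected) }
  related.flatten.each_with_object(Hash.new(0)) { |tag, counts| counts[tag] += 1 }
end

highlighted = co_taggings(ARTICLES, "Firefox")
highlighted.each { |tag, count| puts "#{tag}: #{count}" }
```

Tags that never appear alongside the selection simply have no entry, so none of their dots light up.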
The animation is arguably a bit redundant, but I like the way it looks.
Keep Going!
Have we not learned all there is to learn about these humble tags? Never! These tags still have dark little secrets yet to be gleaned. We are working with two (physical) dimensions, but we can impose another on flatland: time.
fig. 6 (interactive)
If we stretch data out along a timeline we can learn more about it. I've added some subtle striping and elongated the dots to emphasize concurrent taggings. We can see that early on in the life of this site Duck Typing was a hot topic but its popularity gradually dwindled, Thunderbird had a short but decent run some time ago, and Python has been making itself known of late for some mysterious reason. Our mundane navigation buttons are starting to take on a life of their own.
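Behind a timeline like this sits a simple grouping step. The sketch below assumes each tagging is stored with the date it was made and buckets the taggings by month; the dates, tags and the timeline helper are all hypothetical.

```ruby
require "date"

# Hypothetical taggings: [tag, date the article was tagged].
TAGGINGS = [
  ["Duck Typing", Date.new(2007, 1, 5)],
  ["Duck Typing", Date.new(2007, 1, 20)],
  ["Thunderbird", Date.new(2007, 3, 2)],
  ["Python",      Date.new(2007, 6, 11)],
  ["Python",      Date.new(2007, 6, 30)]
]

# Group into tag => { "YYYY-MM" => number of taggings that month }.
def timeline(taggings)
  empty = Hash.new { |rows, tag| rows[tag] = Hash.new(0) }
  taggings.each_with_object(empty) do |(tag, date), rows|
    rows[tag][date.strftime("%Y-%m")] += 1
  end
end

timeline(TAGGINGS).each do |tag, months|
  puts "#{tag.ljust(12)} #{months.map { |month, n| "#{month}:#{n}" }.join(' ')}"
end
```

Months with more than one tagging are the "concurrent taggings" that the elongated dots emphasize.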
All anthropomorphism aside, this arrangement is giving us a lot of information very quickly, and without any numbers. But it comes at a cost. At this point we've nearly tripled the area these tags take up, turning our lowly navigation into an increasingly space-hungry monster. We've also understated the frequency relationship between tags a bit. Perhaps we've gone too far, but the exercise is not in vain.
Do Good to Data. Do it Well.
In clarifying data, the goal is not simply to simplify. There is only so much fat that can be cut before data becomes further obfuscated. Sometimes, to tell the story well, we need to reveal more information, but it should be done in a way that does not complicate the reading. It should make clear what was once vague.
Further Reading
I highly recommend the following resources for a more detailed discussion on information design.
- Ben Fry
- Edward Tufte
- Accessible Data Visualization with Web Standards, A List Apart
- Bret Victor, Magic Ink: Information Software and the Graphical Interface
Footnotes
- Arguably these are not small multiples as Edward Tufte describes them because these boxes are all the same, but I contend that each box is unique compared to its neighbors because they each represent a unique relationship to an article.
Testing with Selenium
Anyone who has worked on JavaScript-centric web applications knows how much of a hassle testing them can be. Either you're stuck manually testing endless possible combinations of actions, or you're writing them for your Selenium plugin. Things get even worse when your client wants to support browsers like IE6. RSpec has encapsulated the behavior-driven development of models, controllers and views, and user stories have integrated testing between the layers, but unfortunately neither has done much for the development of interactive web applications.
Working on JavaScript applications after learning RSpec can be a painful experience. Your fingers want to write simple tests, whereas JavaScript wants you to painfully point and click until everything works together. For years I have been nagging myself to find a headless browser that could be integrated into my development environment, but I simply never had the time, energy or justification to take action. Tired of pointing and clicking repeatedly after every change, I recently became filled with angst and decided it was time to do something about it.
I was first exposed to Selenium a few semesters back, and while I was unimpressed with the end product, it was the first and only project of its kind I was aware of. My first impression of Selenium was based on personal preferences and not scientific merits, so I thought a second impression was due. To my dismay, Selenium still has an interface that appears to have been designed only for Windows, and still exists as a glorified macro editor. While Selenium continues to be a bust, I did come across SeleniumRC which is built to test multiple browsers on multiple platforms.
A Better Selenium?
SeleniumRC exists as a server that acts as a proxy between an HTTP client and a browser. Clients send commands to the SeleniumRC server, which passes those actions on to the SeleniumCore inside the browser window with the matching session id. SeleniumRC answers each request with the result of the command, allowing the client to control and test pages just as a user would. Therefore SeleniumRC can be scripted using any language that can send an HTTP request, like Ruby, JRuby, or Intel Assembly.
Immediately seeing the possibilities, I set out to plug the SeleniumRC Ruby client into RSpec, which would allow me to write user interaction specs in Ruby. While plugging the SeleniumRC client into RSpec proved to be almost as easy as drag and drop, the honeymoon quickly faded. Getting the tests to run and pass turned into a seemingly endless adventure, where sometimes XPaths wouldn't work, sometimes browsers wouldn't work... a big mess. A good portion of my time was spent testing whether my behavior tests would work rather than writing the tests and functional code.
I tried two different approaches to overcome the issues I experienced with XPath. The first involved passing Prototype strings to be evaluated, to avoid SeleniumCore altogether. Unfortunately, for some reason I was unable to discover, none of the Prototype strings were returning values. The second approach used Hpricot to assert the presence of elements or values, as well as to generate the XPaths for those elements, which could then be passed to SeleniumCore. Alas, XPath selectors were still not working when they were generated by Hpricot.
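To give a feel for what "any language that can send an HTTP request" means, here is a small sketch that only builds the form-encoded body for one command and never touches the network. The parameter layout (cmd, numbered arguments, sessionId) and the /selenium-server/driver/ path reflect my understanding of the RC wire format and should be treated as assumptions; the selenese_body helper, the page path and the session id are made up.

```ruby
require "uri"

# Build the form-encoded body for a single command.
# Assumption: the server expects cmd=<name>&1=<arg>&2=<arg>&sessionId=<id>,
# posted to /selenium-server/driver/ on the SeleniumRC host.
def selenese_body(command, args = [], session_id = nil)
  pairs = [["cmd", command]]
  args.each_with_index { |arg, index| pairs << [(index + 1).to_s, arg.to_s] }
  pairs << ["sessionId", session_id] if session_id
  URI.encode_www_form(pairs)
end

body = selenese_body("open", ["/articles"], "1234")
puts body
```

Sending that string with Net::HTTP (or from any other language) is all a client has to do, which is why the Ruby client is such a thin layer.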
In addition to having difficulties in getting the tests to work, the syntax provided by the Ruby client is not very pleasing to the Rubyist's eye. I never expect a 'bonus' piece of code, packaged with a free product still in beta, to be the cat's meow. Still, it is always nice when the syntax is clean and all of the pieces work correctly. Does Ruby written to test Javascript have to look like Javascript?
Another issue I've found with SeleniumRC is that it is as slow as my grandma driving a Cadillac down the expressway on Sunday. Even when running the server and client from your local machine, the tests take an extraordinary amount of time to run. I believe this is due to the manner in which SeleniumRC makes it all possible, and while it may be tolerable when doing full scale testing, it simply is not when practicing BDD.
SeleniumRC is still a beta product, listed as version 1.0 beta. While I agree with the term beta, I feel that in this day and age software should be roughly usable at version 0.1. Typically I would have no problem working around these issues for something I really want; however, active development then becomes more of a requirement than a wish. There have only been two releases of SeleniumRC since 2006, with most of the work being done in the first half of 2006.
A New Hope
Although SeleniumRC hacked my enthusiasm into pieces, it did manage to further motivate my quest for a headless browser. Taking the ideas I got from my time spent with SeleniumRC and RSpec, I set out to create a class that would allow me to control a virtual browser instance from a Ruby object. I decided to look back at my Objective-C/Cocoa experience and poke at the WebKit API for a bit to see what sort of trouble I could get myself into.
With a little bit of elbow grease and Google, I have been able to get a working instance of a WebKit browser neatly bundled as a Ruby object. Currently Javascript strings can be passed to the browser to be evaluated, with their string result being returned.
While there are a few technical details to be worked out, with any luck the power of WebKit in Ruby combined with the magic of RSpec should free the masses from the infinite loop of edit-reload-edit. Of course visual aspects will still need manual testing, as well as user interaction on other browsers. With a little finesse, the same tests written to be tested locally with WebKit could also be used to test remote browsers using SeleniumRC.
Next time I hope to have a working demonstration and sample code. In the meantime, here is some eye candy you can feast on!
I think the most significant strength of the Ruby language is its impressive power and flexibility in metaprogramming. Affectionately dubbed "Monkey Patching" by Ruby developers, metaprogramming makes it easy to 'hack' existing code and frameworks like Rails for infinite customization, while making it easy to keep these hacks legible, organized and maintainable.
A powerful programming concept that Ruby makes easy to implement is the use of Proxy Objects. Since Ruby implements duck typing, a proxy object can easily step in to take the place of a regular object so long as the two share a similar object interface.
The following tutorial is the extraction of a "real world" problem that was solved with the simple implementation of a few proxy objects.
The Problem
Let's say that we are displaying information about a collection of people in our system, and we want to display min, max and averages for the attributes of the members of that collection. An immediate solution would be to clutter up my template with code to find the statistic values, but this gets messy and puts too much logic in my view. Using some kind of helper methods would help extract the logic from the view, which would be better.
That's still not quite what I want, though. What I would really like is to have an object that acts just like a Person, but that gives me values based on a collection of Person objects rather than just one. A nice implementation would be:
    @john.age   #=> 22
    @jim.age    #=> 48
    @susan.age  #=> 20

    avg = AveragePerson.new([@john, @jim, @susan])
    avg.age     #=> 30
This approach is very strong if, for example, you're creating a table by iterating through a collection of people. Thanks to duck typing, an AveragePerson could be slipped into a collection of Person objects and the table would be none the wiser -- the average data would be displayed just the same as a single person's data.
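A minimal, self-contained sketch of that table idea follows. It uses a trimmed-down Person (a Struct with only a name and an age) and a hand-written AveragePerson that averages just #age, both stand-ins for the fuller classes built in the rest of this tutorial.

```ruby
# Trimmed-down stand-ins, for illustration only.
Person = Struct.new(:name, :age)

class AveragePerson
  def initialize(people)
    @people = people
  end

  def name
    "Average"
  end

  def age
    @people.sum(&:age) / @people.length
  end
end

people = [Person.new("John", 22), Person.new("Jim", 48), Person.new("Susan", 20)]
rows = people + [AveragePerson.new(people)]

# The loop only sends #name and #age, so it cannot tell the two classes apart.
table = rows.map { |row| format("%-8s %3d", row.name, row.age) }
puts table
```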
Creating Some Models
First, let's whip up some quick model classes. I will simply use new Ruby classes, but the following techniques can be used with ActiveRecord classes in Rails just as easily.
    class Person
      attr_accessor :name, :age, :inches, :weight

      def initialize(n, a, i, w)
        @name = n
        @age = a
        @inches = i
        @weight = w
      end
    end

    class AveragePerson
      def initialize(col)
        @collection = col
      end
    end
Writing Some Tests
Let's write some RSpec tests to set some goals for our AveragePerson class:
    require 'rubygems'
    require 'spec'

    describe "People Statistics" do
      before(:each) do
        @james = Person.new("James", 23, 74, 210)
        @cheryl = Person.new("Cheryl", 47, 63, 115)
        @timmy = Person.new("Timmy", 12, 55, 87)
        @people = [@james, @cheryl, @timmy]
      end

      describe "AveragePerson" do
        before(:each) do
          @average = AveragePerson.new(@people)
        end

        it "should give the average age" do
          @average.age.should eql((@james.age + @timmy.age + @cheryl.age) / 3)
        end

        it "should give the average weight" do
          @average.weight.should eql((@james.weight + @timmy.weight + @cheryl.weight) / 3)
        end

        it "should give the average inches" do
          @average.inches.should eql((@james.inches + @timmy.inches + @cheryl.inches) / 3)
        end

        it "should be named 'Average'" do
          @average.name.should eql("Average")
        end
      end
    end
Now let's get these to pass.
A Little Proxy Magic
Per our specs, we need to add #age, #weight, #inches and #name methods to AveragePerson. This would do the trick, but it's not DRY:
    class AveragePerson
      def name
        "Average"
      end

      def age
        total = @collection.inject(0) { |sum, person| sum + person.age }
        total / @collection.length
      end

      def weight
        total = @collection.inject(0) { |sum, person| sum + person.weight }
        total / @collection.length
      end

      def inches
        total = @collection.inject(0) { |sum, person| sum + person.inches }
        total / @collection.length
      end
    end
The tests should pass now, but we have some serious repetition and need to DRY this up. It's also not extensible -- any time a method is added to Person, another would need to be added to AveragePerson.
All we're really doing here is proxying the method that AveragePerson receives to each member of the collection, so let's use method_missing to do this more concisely and extensibly.
    class AveragePerson
      def name
        "Average"
      end

      # proxy methods to collection, return the average of results
      def method_missing(method_name, *args, &block)
        total = @collection.inject(0) { |sum, person| sum + person.send(method_name, *args, &block) }
        total / @collection.length
      end
    end
Now AveragePerson will proxy any methods it doesn't have to its collection, add up the results and return the average. The tests still pass and we have much cleaner code.
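One refinement worth considering, which is my own addition rather than part of the solution above: a bare method_missing swallows every unknown message and leaves respond_to? answering false for the proxied methods. The sketch below only forwards messages that every member understands and pairs method_missing with respond_to_missing?. Person is reduced to a Struct here, and the proxies? helper is a hypothetical name.

```ruby
# Reduced stand-in for the Person model, for illustration only.
Person = Struct.new(:name, :age, :inches, :weight)

class AveragePerson
  def initialize(collection)
    @collection = collection
  end

  def name
    "Average"
  end

  # Forward only what every member understands; anything else stays a NoMethodError.
  def method_missing(method_name, *args, &block)
    return super unless proxies?(method_name)
    total = @collection.inject(0) { |sum, person| sum + person.send(method_name, *args, &block) }
    total / @collection.length
  end

  # Keep respond_to? honest about the proxied methods.
  def respond_to_missing?(method_name, include_private = false)
    proxies?(method_name) || super
  end

  private

  # Hypothetical helper: true when the whole collection answers the message.
  def proxies?(method_name)
    !@collection.empty? && @collection.all? { |person| person.respond_to?(method_name) }
  end
end

people = [
  Person.new("James", 23, 74, 210),
  Person.new("Cheryl", 47, 63, 115),
  Person.new("Timmy", 12, 55, 87)
]
average = AveragePerson.new(people)
puts average.age
puts average.respond_to?(:weight)
```

The proxy still assumes the forwarded methods return numbers; it just fails more clearly when a message makes no sense at all.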
Adding in Height
It isn't very helpful to just tell a user how many inches tall a person is; it would be much more useful to return a string like "5ft 8in". Let's integrate that into the Person model and write a test for AveragePerson:
    class Person
      def height
        inches_to_height(self.inches)
      end

      private

      def inches_to_height(_inches)
        ft = _inches / 12
        ins = _inches % 12
        "#{ft}ft #{ins}in"
      end
    end

    ###

    describe "AveragePerson" do
      it "should give the average height" do
        @average.height.should eql("5ft 4in")
      end
    end
Though our AveragePerson object is proxying the #height method to the collection, this test crashes and burns because our #method_missing assumes that each member will return a number, not a string like "6ft 0in". To fix this, let's "override" AveragePerson#height.
    class AveragePerson
      def height
        ft = self.inches / 12
        ins = self.inches % 12
        "#{ft}ft #{ins}in"
      end
    end
The tests pass, but we still have some refactoring to do. I've duplicated the height display logic from Person into AveragePerson, so let's extract this out into a module and include it in both classes:
    module PersonHelper
      def height
        inches_to_height(self.inches)
      end

      def inches_to_height(_inches)
        ft = _inches / 12
        ins = _inches % 12
        "#{ft}ft #{ins}in"
      end
    end

    class Person
      include PersonHelper
    end

    class AveragePerson
      include PersonHelper
    end
I can also include PersonHelper into my RSpec tests so that I can make them more legible using the #inches_to_height helper method:
    describe "AveragePerson" do
      include PersonHelper

      it "should give the average height" do
        # @average.height.should eql("5ft 4in")
        @average.height.should eql(inches_to_height((@james.inches + @timmy.inches + @cheryl.inches) / 3))
      end
    end
Creating Max and Min Classes
Using the same proxy techniques, we can easily build out MaxPerson and MinPerson classes. First the tests:
    describe "MaxPerson" do
      before(:each) do
        @max = MaxPerson.new(@people)
      end

      it "should give the max age" do
        @max.age.should eql(@cheryl.age)
      end

      it "should give the max weight" do
        @max.weight.should eql(@james.weight)
      end

      it "should give the max inches" do
        @max.inches.should eql(@james.inches)
      end

      it "should be named 'Max'" do
        @max.name.should eql("Max")
      end

      it "should give the max height" do
        @max.height.should eql(inches_to_height(@james.inches))
      end
    end

    describe "MinPerson" do
      before(:each) do
        @min = MinPerson.new(@people)
      end

      it "should give the min age" do
        @min.age.should eql(@timmy.age)
      end

      it "should give the min weight" do
        @min.weight.should eql(@timmy.weight)
      end

      it "should give the min inches" do
        @min.inches.should eql(@timmy.inches)
      end

      # MinPerson#name is implemented below, so this no longer needs to be pending
      it "should be named 'Min'" do
        @min.name.should eql("Min")
      end

      it "should give the min height" do
        @min.height.should eql(inches_to_height(@timmy.inches))
      end
    end
And here are the classes:
    class MaxPerson
      include PersonHelper

      def initialize(col)
        @collection = col
      end

      def name
        "Max"
      end

      def method_missing(method_name, *args, &block)
        @collection.inject(0) do |highest, person|
          val = person.send(method_name, *args, &block)
          (val > highest) ? val : highest
        end
      end
    end

    class MinPerson
      include PersonHelper

      def initialize(col)
        @collection = col
      end

      def name
        "Min"
      end

      def method_missing(method_name, *args, &block)
        # assumes at least one item in the collection will return a value less than 1000000
        @collection.inject(1000000) do |lowest, person|
          val = person.send(method_name, *args, &block)
          (val < lowest) ? val : lowest
        end
      end
    end
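The starting values 0 and 1000000 bake in assumptions about the data (no negative values, nothing above a million). As an alternative sketch, not the version used above, the proxied values can be collected first and handed to Enumerable#max and #min, which need no starting value. ExtremePerson and pick are hypothetical names, and Person is again reduced to a Struct.

```ruby
# Reduced stand-in for the Person model, for illustration only.
Person = Struct.new(:name, :age, :inches, :weight)

# Hypothetical shared base class: gathers the proxied values, then lets
# the subclass choose one of them.
class ExtremePerson
  def initialize(collection)
    @collection = collection
  end

  def method_missing(method_name, *args, &block)
    values = @collection.map { |person| person.send(method_name, *args, &block) }
    pick(values)
  end
end

class MaxPerson < ExtremePerson
  def name
    "Max"
  end

  private

  def pick(values)
    values.max
  end
end

class MinPerson < ExtremePerson
  def name
    "Min"
  end

  private

  def pick(values)
    values.min
  end
end

people = [
  Person.new("James", 23, 74, 210),
  Person.new("Cheryl", 47, 63, 115),
  Person.new("Timmy", 12, 55, 87)
]
puts MaxPerson.new(people).age
puts MinPerson.new(people).weight
```

This also removes the duplicated initialize from the two classes, at the cost of one more class to keep track of.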
More Options with Extend
Our proxy objects are working nicely, but what if we want to access our objects a different way? Maybe it would be slicker if the collection itself had methods to instantiate these objects rather than simply instantiating them explicitly. Let's do something like this:
    @people = [@john, @jim, @susan]
    @people.min_person  #=> MinPerson object
    @people.max_person  #=> MaxPerson object
    @people.avg_person  #=> AveragePerson object
Here is a new describe block of tests:
    describe "PeopleCollection" do
      it "should return a MinPerson" do
        @people.min_person.class.should eql(MinPerson)
      end

      it "should return a MaxPerson" do
        @people.max_person.class.should eql(MaxPerson)
      end

      it "should return an AveragePerson" do
        @people.avg_person.class.should eql(AveragePerson)
      end
    end
Let's create a module that will define these collection methods:
    module PeopleCollection
      def min_person
        MinPerson.new(self)
      end

      def max_person
        MaxPerson.new(self)
      end

      def avg_person
        AveragePerson.new(self)
      end
    end
One way to apply this module so that our collection will have access to these methods is to open up the Array class and include the module:
    class Array
      include PeopleCollection
    end

    # or you can use the send hack:
    # Array.send :include, PeopleCollection
I'm not sure this is what I want, though. The tests pass, but I don't really need (or want) every single array in my app to have these methods -- at best it's sloppy, at worst I could run into conflicts. I really only want my @people collection to have this functionality, so instead I will extend that instance with my module only when I need it. Here's the updated describe test block:
    describe "PeopleCollection" do
      before(:each) do
        @people.extend PeopleCollection
      end

      it "should return a MinPerson" do
        @people.min_person.class.should eql(MinPerson)
      end

      it "should return a MaxPerson" do
        @people.max_person.class.should eql(MaxPerson)
      end

      it "should return an AveragePerson" do
        @people.avg_person.class.should eql(AveragePerson)
      end
    end
I have found this to be a helpful technique when you just need to add a little extra functionality to a single object and want to keep it lightweight.
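A quick way to convince yourself that extend really is scoped to one object is the sketch below. It swaps the proxy classes for a tiny hypothetical average_age helper and uses plain hashes for people, so that it runs on its own.

```ruby
module PeopleCollection
  # Hypothetical helper standing in for avg_person, to keep the sketch self-contained.
  def average_age
    sum { |person| person[:age] } / length
  end
end

people = [{ name: "John", age: 22 }, { name: "Jim", age: 48 }, { name: "Susan", age: 20 }]
people.extend(PeopleCollection)
others = [1, 2, 3]
adults = people.select { |person| person[:age] > 21 }

puts people.average_age                   # only this instance gained the method
puts people.respond_to?(:average_age)
puts others.respond_to?(:average_age)     # unrelated arrays are untouched
puts adults.respond_to?(:average_age)     # arrays derived from it are plain arrays again
```

The last line is the trade-off to keep in mind: anything that returns a new array (select, map, sort) has to be extended again if you want the helpers on the result.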




