Visible Certainty

thoughts on building thoughtful presentations

Aug 22, 2008 2:59pm

You're supposed to update these blog things "regularly"?

Although our site is now public, there are a number of features we want to add in the coming weeks and months. Our aim is to produce a website that will make it easier for you to analyze, display, present and share your information. Over time, we plan to add a number of tools to guide you in exploring your data. Just to give you a sense of where we are headed, these are our rough development goals.

Near term upcoming features include:

  • speed (we continue to profile and hack the code to improve responsiveness)
  • summarizing and aggregating information (e.g., if you add a time series with daily observations, group and summarize them by month)
  • improved data importing (so we do the heavy lifting of pivoting your tables, concatenating column headers that are spread across rows, etc. into a format we can more easily chart)
  • joining data sets together to produce new charts
  • data units (you upload dollars, we keep track of that and let you specify what kind of dollars, etc.)
  • updatable data sets (you can re-upload a data set or, if you have imported a Google spreadsheet, we will automatically pull in changes, etc.)
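
To make the summarizing bullet above a little more concrete, here is a minimal sketch of grouping daily observations by month. It is written in Python for brevity and is only an illustration, not the code behind the site; the function name `summarize_by_month` and the sample values are made up.

```python
from collections import defaultdict
from datetime import date


def summarize_by_month(observations):
    """Group (date, value) pairs by calendar month; return count, total and mean."""
    buckets = defaultdict(list)
    for day, value in observations:
        buckets[(day.year, day.month)].append(value)
    return {
        month: {"count": len(vals), "total": sum(vals), "mean": sum(vals) / len(vals)}
        for month, vals in sorted(buckets.items())
    }


# Made-up daily observations, for illustration only.
daily = [
    (date(2008, 7, 30), 10.0),
    (date(2008, 7, 31), 14.0),
    (date(2008, 8, 1), 9.0),
    (date(2008, 8, 2), 11.0),
    (date(2008, 8, 3), 13.0),
]

monthly = summarize_by_month(daily)
for (year, month), stats in monthly.items():
    print(f"{year}-{month:02d}: n={stats['count']} total={stats['total']} mean={stats['mean']}")
```

The same grouping key could be swapped for week or quarter; the point is that the daily series goes in once and the summary level becomes a choice you can change.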

Medium term features include:

  • revamped user interface
  • small multiples (many charts, laid out together in space)
  • print-quality table layout (sometimes a table of numbers is the best way to make your point!)
  • more annotation tools (e.g., trend lines on charts)
  • handling geographic data (plot it on maps)
  • box plots

Longer term features include:

  • full presentation creation
  • social networking (more tools to share, discuss, rate and review data, charts and presentations)
  • magical fairy things of which we cannot speak

Obviously, all of this is subject to change. And if you have a suggestion, please let us know.

Jason

Jul 23, 2008 11:59am

Going Live Today

Hi Everyone,

Visible Certainty is going live today. It has been a long trip, and it might be worth revisiting the history of how we got here.

First, Jason and I spent from 1998 to 2006 trying to get to a Tufte conference. We were fascinated by the premise of a better way to communicate data, but we were also pretty busy. The first time we talked about it, we were busy working on the software for a company called Gamesville. Jason’s friend Peter had heard Tufte speak and was bowled over by it. He even produced a “Tufte Presentation” at one of his own conferences and got rave reviews for it. So we were intrigued. But two companies later (GameLogic, QuantumFoam), we were still procrastinating. Finally the planets aligned: Tufte was in Boston, and we were between companies, trying to decide what the new new new thing would be.

It was riveting. It was a moment of clarity. When Tufte told the story of Galileo discovering actual scientific evidence that the earth revolved around the sun and terming it “visible certainty,” we were sold. Even better, the Catholic Church told him never to use those words again. Better yet, Tufte was complaining that no one had ever put together the right tool to produce elegant, simple, data-dense presentations in a single package. A quick Dotster search showed that visiblecertainty.com was available, so for $9.95 we started a new company. Okay, maybe a little more for computers, lawyers, office space, employees, software, and some other stuff, but that initial $9.95 was the important thing.

Ahhh… but how to start.

You really have to do a lot of reading about stuff that is way above my pay grade.

First there were the four Tufte books. Then The Grammar of Graphics by Wilkinson, for ideas on how to boil a software design out of a visualization. Then there are Cleveland’s Visualizing Data and The Elements of Graphing Data, which are really about the guts of the statistical ideas behind the visualizations. And finally Stephen Few’s Show Me the Numbers. You may notice that when you choose chart types in our tool they are broken down into his groupings. Few really seems to have struck a wonderful balance between elegance and daily effectiveness.

Now what technology to use? Jason and I had become huge fans of Ruby, Rails, EC2, and S3 over the last year or so (but don’t get me started on some of the scaling issues we are running into or EC2 downtime). So that was a place to start. We did our time with gruff, ImageMagick, rmagick, rgplot, R, and ChartDirector.

Thank god we hired Kris in October of that year, because in early December, after banging our heads on the code for another month or so, he mentioned that Flex 3 was supposed to have fixed the configuration, scaling, development environment, and language issues that made us all hate/fear it before. So Jason and I discussed it, decided it was best to pretend it was our idea, pushed off our December release and moved the chart and data set generation to Flash. JD came on in January just in time to be tortured by this decision. He managed to solve all the things that overwhelmed the rest of us (including changing an O(n^2) “feature” to O(1) in the off-the-shelf data transport code we were using). Then about two weeks ago, Peter (remember him from the beginning of this story) agreed to join up - after all, he was REALLY to blame for the whole thing.

And sixteen months later, with employees in three time zones, here we are! At least we didn’t have to name it Visible Certainty 2008 to release ;-)

Next stop: data joins and small multiples!

Hope you all enjoy the tool,

Stuart

Jul 22, 2008 12:00am

Space Shuttle Challenger

Is it the data, the tool, or the editorial choices that are made in analyzing the data that cause good people to make bad decisions based on poor data visualizations?

For instance, on the morning of Jan 28, 1986, the Space Shuttle Challenger broke apart 73 seconds into its flight. The disintegration of the shuttle was caused by the failure of an O-ring seal in its right solid rocket booster. The seal failure allowed a “flare” to weaken the structure of the external fuel tank and aerodynamic forces promptly broke apart the vehicle.

The rubber O-ring failed because the launch took place on a particularly cold morning and the cold caused the seal to become brittle and rigid. They needn’t have launched on that day. So why did the team decide to launch that morning knowing full well that it was going to be colder than it had been on any other launch day?

As Tufte notes in “Visual Explanations” (pp. 39-53), each launch generates “incident” data: what part or parts failed on that day and what the launch environment was. The team made a single editorial mistake when deciding to launch: it chose to chart only the launch data where O-ring failures occurred:

Chart provided by Visible Certainty

The fact that the rubber rings were failing primarily in colder temperatures was apparent only when you plotted all the data, especially data from launches where the O-rings didn’t fail:

Chart provided by Visible Certainty

So, one poor editorial decision and seven people died. No one in the process wanted that to happen. There is probably no software-only way to protect against this kind of mistake. But that won’t stop us from looking for one.
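
The trap is easy to reproduce with any data set. The sketch below uses invented numbers (they are not the launch record, and the 60-degree split is an arbitrary assumption) to show how charting only the failures hides a pattern that the full data makes obvious.

```python
# Hypothetical test runs as (temperature, failed). These values are invented
# for illustration and are NOT the actual launch data.
records = [
    (48, True), (52, True), (58, True), (62, False), (64, False), (66, True),
    (68, False), (71, False), (74, False), (77, True), (80, False), (83, False),
]


def failure_rate(rows):
    """Fraction of rows that failed; None when there are no rows."""
    if not rows:
        return None
    return sum(1 for _, failed in rows if failed) / len(rows)


# View 1: only the runs where something went wrong.
failures_only = [temp for temp, failed in records if failed]

# View 2: every run, split at an arbitrary threshold (an assumption).
THRESHOLD = 60
cold = [row for row in records if row[0] < THRESHOLD]
warm = [row for row in records if row[0] >= THRESHOLD]

print("failure temperatures:", failures_only)
print("cold failure rate:", failure_rate(cold))
print("warm failure rate:", failure_rate(warm))
```

In the first view the failures are spread from the coldest run to one of the warmest, so temperature looks almost irrelevant. Only the second view, which keeps the runs where nothing failed, shows that every cold run failed while most warm runs did not.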

Mar 20, 2008 12:00am

Analysis is Iterative

Cleveland makes the point that analysis (problem solving) is iterative. This should include different data input choices. We are toying with the idea of just throwing up a bunch of visualizations to get you started. You point us at the data and we will make some basic assumptions and show you the information several different ways. Maybe one of these visualizations will spark your creative juices. And making and unmaking further editorial choices should be simple, easy and fun.

Feb 27, 2008 12:00am

Dec 18, 2007 12:00am

From the awful to the sublime

The New York Times has been breaking a lot of ground with innovative graphics. One of their graphic designers, Matthew Ericson, recently gave a talk on their approach. This Sunday’s paper had examples of great and awful charts.

First, the awful. In an article on global warming, this appears:

It’s a little hard to see what’s going on there. That’s the problem. In print, this chart takes up almost an entire page. On the web, it has to have a special viewer.

It’s a classic example of a poor data-ink ratio. The data density is low; most of the ink is used for show, not for information. The graphic is showing historical and projected carbon dioxide (CO2) emissions. I’m not against good design – but I think this display fails on its own terms: it does not help the viewer understand the nature of CO2 emissions. Instead of just making an easy-to-read line chart, it turns the one-dimensional value (emissions in a given year) into a square, putting us in the position of comparing all these different-sized squares. We get some sense there’s been an acceleration in emissions, but it’s obscured by the rising smoke.
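
The distortion from drawing a single number as a square is easy to quantify. How this particular graphic scales its squares is not stated here, so the sketch below treats both possibilities as assumptions, and the two values are hypothetical, in arbitrary units.

```python
import math


def side_if_value_is_area(value):
    """Side length of a square whose AREA is proportional to the value."""
    return math.sqrt(value)


def area_if_value_is_side(value):
    """Area of a square whose SIDE is proportional to the value."""
    return value ** 2


# Hypothetical values in arbitrary units: the later one is four times the earlier.
earlier, later = 1.0, 4.0

true_ratio = later / earlier
side_ratio = side_if_value_is_area(later) / side_if_value_is_area(earlier)
area_ratio = area_if_value_is_side(later) / area_if_value_is_side(earlier)

print("ratio of the values:", true_ratio)
print("ratio of side lengths when area encodes the value:", side_ratio)
print("ratio of areas when side encodes the value:", area_ratio)
```

Either way the reader is misled. If area carries the value, a fourfold increase shows up as a square only twice as wide. If side length carries the value, the same increase covers sixteen times the ink. A line chart shows the factor of four as a factor of four.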

Taking the same data (up to 2004) but charting it, we can get more detail (including the major components of the emissions):

Chart provided by Visible Certainty

Even this isn’t ideal. It would be nice to see the data normalized for world economic output, for population, etc., as well as then broken into smaller pieces (geographic region or country), and annotated for major events (e.g., World War II).

Just to be clear, I’m not hating on this chart’s author, Bill Marsh, in general, just this specific chart. I liked his water article from July but this one didn’t work for me.

Now to the sublime. Same newspaper, same day, different story.

Before going further, I should note that this coverage partially suffers from the standard problem with political reporting in the mainstream media. The Presidential election is treated as an exciting horse race, with candidates jockeying for position (e.g., editors rarely assign reporters to write about the issues the candidates are all ignoring). That said, like a good action movie, the following chart can be appreciated on its own terms (click on it to see the full version):

This chart makes wonderful use of the concept of a small multiple. Learn how to read one piece of it, and you can read the whole thing. You can study it for several minutes and notice interesting patterns. There are some great annotations as well to explain outliers. Some observations:

  • Democrats are hammering away on Bush and Iraq while talking hawkishly about Iran.
  • Republicans are trying to wrap themselves in Reagan and tax cuts.
  • The Republicans (mainly) attack abortion; the Democrats ignore it.
  • The Democrats talk about the environment; the Republicans ignore it. But neither group is talking much about climate change specifically.

And so on. More than just a clever idea, this chart is a wonderful display of information. 11 candidates, 21(!) debates, and 16 words or phrases combine to create over 2000 numbers. Because of this chart, they can be easily compared and analyzed.

Our goal at Visible Certainty is to make it easier to create such presentations. I don’t know what software the authors, Jonathan Corum and Farhana Hossain, used, but they had to do a lot of this work by hand. Even after you assemble the data set from word analysis of the debate transcripts, there’s a lot to do to create so many charts and lay them out. We’re going to help with all of that.
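
For a sense of what the word-analysis step might involve, here is a minimal sketch in Python. It is not how the Times built their data set: the transcript format (a list of speaker and text pairs), the function name `count_mentions`, and the sample turns are all invented for illustration.

```python
import re
from collections import Counter


def count_mentions(turns, phrases):
    """Count case-insensitive whole-phrase matches per (speaker, phrase)."""
    patterns = {
        phrase: re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
        for phrase in phrases
    }
    counts = Counter()
    for speaker, text in turns:
        for phrase, pattern in patterns.items():
            counts[(speaker, phrase)] += len(pattern.findall(text))
    return counts


# Invented transcript turns, for illustration only.
turns = [
    ("Candidate A", "Taxes are too high. We must cut taxes now."),
    ("Candidate B", "Health care comes first, and then taxes."),
    ("Candidate A", "Health care matters too."),
]
phrases = ["taxes", "health care"]

counts = count_mentions(turns, phrases)
for (speaker, phrase), n in sorted(counts.items()):
    print(f"{speaker} / {phrase}: {n}")
```

Each (speaker, phrase) count is one number in one small panel. Run the same count once per debate and you have the grid of values, which still leaves all the charting and layout work to do.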

One last note. There was another set of charts accompanying the debate analysis.

In the newspaper, confined to black and white, they are very hard to read. Online, with color and interactivity, they are easier to understand. They try to show how much each candidate is referring to each other candidate (and the corollary, who is getting the most mentions).

Front-runners tend to get mentioned (attacked) more. So it’s no surprise that Clinton and Obama, along with Giuliani and Romney, get the most mentions. Judging by the bipartisan mentions she’s getting, Hillary Clinton is considered the overwhelming favorite for the Democratic nomination. You can also see how the front-runners get more time to talk during the debates. I’ll leave the implications of that for later.

There’s a lot of information being packed in here, but in print, it’s hard to read and online, it’s hard to compare. Perhaps instead of the circles, a series of bar charts would work. One set would show how many times each candidate has been mentioned in each debate, alongside a summary bar chart comparing total mentions of each candidate. The circle charts don’t make it easy to see whom each candidate is talking about, but you could show that with a series of bar charts as well (or one time series line chart showing the relative rise and fall of each candidate).

But I can’t show that to you easily – I don’t, yet, have the data.

Dec 11, 2007 12:00am

First.

This is our inaugural blog post. We want to use this blog to cover three main areas:

  1. Thoughts on creating and giving great presentations
  2. Providing some transparency about our development process and progress
  3. Code recipes that may be of use to others. We’re using Ruby on Rails (http://rubyonrails.com/) for our website, and we’ve relied on many other blog posts to answer various questions we’ve had. We’re hoping to give back a little.

There are an increasing number of blogs on charting and we will try not to be redundant. Our first real post follows…
