Friday, September 17, 2010

Data Visualization, Information Overload, and Compression

In his TED Presentation on data visualization, David McCandless touches on information overload (starting ~16:38), suggesting that data visualization is one tool in our battle with information overload--that good data visualizations enable us to take in data through our eyes and process it in our brains much faster than similar amounts of data communicated through text and numbers.

This reminded me of Mark Hurst's Bit Literacy work:
Bits are heavy. Though they have no physical weight, bits--the electronic data that flows in and out of our e-mail inboxes, cell phones, Web browsers, and so on--place a weight on anyone who uses them. A laptop computer weighs the same few pounds whether it holds one e-mail or a thousand, but to the person who has to deal with all those e-mails, there is a big difference. Appearing in large numbers as they often do, bits weigh people down, mentally and emotionally, with incessant calls for attention and engagement....

The problem can be solved by learning bit literacy, a new set of skills for managing bits. Those who attain these skills will surmount the obstacles of overload and rise to the top of their professions, even as they enjoy a life with less stress, greater health, and more time for family and friends. Bit literacy makes people more effective today, even as it equips them for the future.
Mark points out that you can read about the information overload problem every day, but it's very difficult to find practical help dealing with it. So his book, Bit Literacy, provides elegant, practical techniques for just that, most of which involve filtering, prioritizing, and organizing incoming data.

I see an intriguing connection between data visualization and bit literacy--an underlying suggestion of a powerful technique that I'll call "compression." Think of it this way:
When a program like WinZip or iTunes compresses a file, it creates a new file that contains most or all of the source information, but using fewer bits to represent that information.

And data visualization does the same thing. A good data visualization takes a large amount of data, either qualitative or quantitative, and displays it in a form that conveys most or all of the source information, but using fewer bits to represent that information. This suggests the notion of "compression" as one technique for dealing with information overload.
A few compression examples come to mind.

In the last few years, management "dashboards" have started proliferating. These dashboards essentially take a large amount of information about how a product or company is performing, and compress it into one or two pages of charts, key performance indicators, and short explanatory text. This compressed version of the information enables a manager to take in a tremendous number of bits very rapidly.

Design personas fulfill a similar function. We start with mountains of data from many sources to understand our customers and their needs, and we compress that data into a small number of composite characters called personas. Then we use those personas to communicate with the project team and stakeholders. Essentially, we create compressed versions of the data.

Both of these are examples of "lossy" compression. In the world of compression, "lossless" compression means the compressed file contains all of the information from the original--it's just stored more efficiently. When you download a software application, that software is typically stored in a lossless format, so that when you decompress it, you get all the information of the original. Contrast this with "lossy" compression, in which the compressed file is smaller than the original because some of the original information has been permanently discarded. This is what you get with an MP3 audio file--you can still enjoy the song, but some of the audio fidelity has been removed so you can fit more songs on your iPod. The trick with lossy compression is to systematically determine a) how much fidelity is required, and b) which data can be removed while still retaining the key information.
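To make the lossless/lossy distinction concrete, here is a minimal Python sketch. It assumes a made-up list of readings (not from any real source) and uses the standard library's zlib module for the lossless step. The "lossy" step is simply rounding each reading before compressing, which is a stand-in I chose for illustration, not how MP3 or any real lossy format works.

```python
import zlib

# Hypothetical data: a repetitive series of readings, invented for illustration.
readings = [20.13, 20.14, 20.13, 20.15, 20.14] * 200
original = ",".join(f"{r:.2f}" for r in readings).encode("ascii")

# Lossless: decompressing gives back every byte of the original.
lossless = zlib.compress(original)
restored = zlib.decompress(lossless)

# Lossy (illustrative stand-in): round each reading to one decimal place
# before compressing. The dropped digit cannot be recovered afterwards.
rounded = ",".join(f"{r:.1f}" for r in readings).encode("ascii")
lossy = zlib.compress(rounded)

print("original bytes:", len(original))
print("lossless bytes:", len(lossless))
print("lossy bytes:   ", len(lossy))
print("lossless round trip matches:", restored == original)
print("lossy round trip matches:   ", zlib.decompress(lossy) == original)
```

The lossless version round-trips exactly; the rounded version does not, which is the same trade a dashboard or persona makes when it leaves detail out.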

Back in our information overload space, this becomes the key question: how can we systematically reduce the bits coming at us, so that we can send and receive the essence of a large data set while retaining the key information we need to make informed decisions?

One more example highlights the potential power, and the risk, of using data visualization to combat information overload:

A stock ticker widget essentially compresses all of the data about stock trading into a handful of numbers. After millions of trades today, the Dow Jones was up 1.2%, ending at 10,603.54. This is an attempt to compress not only the stock market, but the economy as a whole. If the Dow is at 10,603.54, the economy is probably better than it was last year, but still struggling.

So the stock ticker saves me the trouble of having to look at all of the data about today's trading. This is good. On the other hand, when there's a TV screen in my elevator barraging me with data about how the Dow, NASDAQ, and S&P 500 are changing from one minute to the next, that's way more information than I need or want. Some further compression would help. As in software compression, it's not just a question of which data to keep and which to remove--it's primarily a question of how small I need the compressed version to be. In the case of a typical consumer, we could add historical context and still compress the result even more by presenting a graph, updated weekly, of performance over the past 10 years.
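As a rough sketch of that kind of further compression, the Python snippet below reduces a series of daily closing values to one value per week. The function name weekly_closes, the five-day trading week, and the index values are all assumptions made up for this example; they are not real market data.

```python
def weekly_closes(daily_closes, days_per_week=5):
    """Keep only the last close of each trading week (a lossy summary)."""
    return [
        daily_closes[start:start + days_per_week][-1]
        for start in range(0, len(daily_closes), days_per_week)
    ]

# Invented index values for ten trading days; not real market data.
daily = [10400.0, 10420.5, 10390.2, 10450.8, 10470.1,
         10460.3, 10510.9, 10495.4, 10580.0, 10603.5]

weekly = weekly_closes(daily)
print(weekly)
```

Ten numbers become two. The day-to-day swings are gone for good, but the overall direction survives, and the days_per_week parameter is the knob that sets how small the compressed version gets.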

So I'm having fun playing around with this metaphor, and I have three main questions:

1) Who else has written about compression and/or data visualization as a means to combat information overload?
2) What are some more examples of compression being used effectively to combat information overload?
3) How might we apply this concept in fresh ways to make ourselves more productive and happier each day?