Jon Udell writes about where the US gets its oil (the majority comes from Canada, not the Middle East). He follows the data with the story of how he found the answer and presented it using DabbleDB.

I agree with Jon when he says “we’re really close to the point where non-specialists will be able to find data online, ask questions of it, produce answers that bear on public policy issues, and share those answers online for review and discussion.” It’ll likely take another generation of tools before we work out the glitches and hiccups in the data flow, and it’ll take the generation that grew up with the web, expecting such tools at their fingertips, to put them to use. I look forward to seeing what comes next.

David Martin, an assistant professor at Boston College, has published a fabulous set of sorting algorithm visualizations (via HMK).

They should be required viewing for every computer science student. He includes some good notes about what to look for and why it matters (excerpted below). People often publish marvelous visualizations, but newcomers to the subject can miss the key lessons hidden in the visual cues; it is great to see the animations and the explanation combined here.

These visualizations are intended to:

* Show how each algorithm operates.
* Show that there is no best sorting algorithm.
* Show the advantages and disadvantages of each algorithm.
* Show that worst-case asymptotic behavior is not the deciding factor in choosing an algorithm.
* Show that the initial condition (input order and key distribution) affects performance as much as the algorithm choice (illustrated in the sketch below).
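
To make that last point concrete, here is a minimal Python sketch (my own illustration, not part of Martin's site) that counts the key comparisons insertion sort performs on sorted, random, and reversed inputs of the same size. The same algorithm does roughly n comparisons in one case and roughly n²/2 in another, depending purely on the initial order.

```python
import random

def insertion_sort(a):
    """Sort the list a in place and return the number of key comparisons made."""
    comparisons = 0
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0:
            comparisons += 1            # one key comparison
            if a[j] > key:
                a[j + 1] = a[j]         # shift the larger element right
                j -= 1
            else:
                break
        a[j + 1] = key
    return comparisons

n = 1000
inputs = {
    "sorted":   list(range(n)),
    "random":   random.sample(range(n), n),
    "reversed": list(range(n, 0, -1)),
}
for name, data in inputs.items():
    print(f"{name:8s} {insertion_sort(list(data)):>7d} comparisons")
```

On a sorted input insertion sort is adaptive and makes about n comparisons; on a reversed input it degrades to about n(n-1)/2.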

The ideal sorting algorithm would have the following properties:

* Stable: Equal keys aren’t reordered.
* Operates in place, requiring O(1) extra space.
* Worst-case O(n·lg(n)) key comparisons.
* Worst-case O(n) swaps.
* Adaptive: Speeds up to O(n) when data is nearly sorted or when there are few unique keys.

There is no algorithm that has all of these properties, and so the choice of sorting algorithm depends on the application.
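
As a concrete example of the trade-off (again a sketch of my own, not from Martin's notes): insertion sort is stable, in-place, and adaptive but suffers O(n²) worst-case comparisons, while selection sort's habit of swapping distant elements makes it unstable. The snippet below shows the stability property directly, comparing Python's built-in sort (which is stable) against a simple selection sort.

```python
def selection_sort(records, keyfunc):
    """Unstable selection sort: long-distance swaps can reorder equal keys."""
    a = list(records)
    for i in range(len(a)):
        m = min(range(i, len(a)), key=lambda j: keyfunc(a[j]))
        a[i], a[m] = a[m], a[i]
    return a

# Records already in name order; now sort them by grade.
students = [("alice", "B"), ("bob", "A"), ("carol", "B"), ("dave", "A")]

print(sorted(students, key=lambda s: s[1]))
# [('bob', 'A'), ('dave', 'A'), ('alice', 'B'), ('carol', 'B')]  -- stable:
# equal grades keep their original (name) order

print(selection_sort(students, lambda s: s[1]))
# [('bob', 'A'), ('dave', 'A'), ('carol', 'B'), ('alice', 'B')]  -- unstable:
# the two 'B' records have swapped relative order
```

Which property you can afford to give up is exactly the application-dependent question the visualizations are meant to sharpen.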

Can search query trends provide an accurate, reliable model of real-world phenomena? Some folks at Google have been tracking how often people search for flu-related terms and how closely those counts follow CDC data on how many people see their doctor with flu-like symptoms.
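
The core idea, lining up a time series of flu-related query volume against the CDC's reported influenza-like-illness (ILI) rates and checking how closely they move together, can be sketched in a few lines. The numbers below are invented for illustration; the real system aggregates millions of queries and fits a proper model rather than a single correlation coefficient.

```python
from math import sqrt

# Hypothetical weekly figures: the share of searches that are flu-related,
# and the CDC's reported percentage of doctor visits for flu-like symptoms.
flu_query_share = [0.8, 0.9, 1.4, 2.1, 2.9, 3.3, 2.6, 1.7]   # made-up values
cdc_ili_percent = [1.1, 1.2, 1.8, 2.6, 3.5, 3.9, 3.1, 2.0]   # made-up values

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

print(f"correlation: {pearson(flu_query_share, cdc_ili_percent):.3f}")
# A value near 1.0 means query volume rises and falls with reported illness,
# which is what makes search data usable as an early indicator.
```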

They have put together a compelling visualization along with a great article about the process. Here’s an excerpt:

“It turns out that traditional flu surveillance systems take 1-2 weeks to collect and release surveillance data, but Google search queries can be automatically counted very quickly. By making our flu estimates available each day, Google Flu Trends may provide an early-warning system for outbreaks of influenza.

“For epidemiologists, this is an exciting development, because early detection of a disease outbreak can reduce the number of people affected. If a new strain of influenza virus emerges under certain conditions, a pandemic could ensue and cause millions of deaths (as happened, for example, in 1918). Our up-to-date influenza estimates may enable public health officials and health professionals to better respond to seasonal epidemics and — though we hope never to find out — pandemics.”

The Google folks, Jeremy Ginsberg, Matthew Mohebbi, Rajan Patel, Mark Smolinski, and Larry Brilliant, have teamed up with Lynnette Brammer from the CDC to write an article that has been accepted by the scientific journal Nature. Fascinating, yet somehow spooky.