Monday, 21 May 2012

Correlation, Causation, and Prediction in a World of Data

As image memes gain popularity on social networks and forums, they are fast securing their place as a defining cultural aspect of the early tweenies (unlike the word "tweenies", thankfully).

Most of these images are humorous, which is great for virality; many are profound, and some are just witty nuggets of wisdom.

And then there is the propaganda. Intended to elicit an emotional response to a political idea, propaganda memes are used to affirm or reaffirm a political bias or dogma. They are often aimed at a very particular niche. If you have any particular political or activist persuasion, you will no doubt have seen endless streams of these one-sided affirmations.

At best, they are intellectual masturbation. At worst, they are pseudo-scientific social engineering.

The worst form of this that I have seen is the inference drawn from data correlations. Just because something happened on a certain date does not mean it caused something else that happened around the same time. It is irrational to infer causation from a correlation alone, and most people are subconsciously aware of this; they simply choose to ignore it when the correlation fits in with their beliefs. So this form of non sequitur is becoming an increasingly common device in these pieces.

It is a shame that this kind of irrationality is being entertained, not least because data correlations can be valuable analysis tools. Correlations can be useful indicators for understanding social dynamics, as long as it is acknowledged that this evidence is purely circumstantial.
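
To see how easily a purely circumstantial correlation can arise, here is a minimal Python sketch. The two series, `meme_shares` and `coffee_price`, are invented for illustration: neither has anything to do with the other, they simply both drift upwards over the same twelve months. The helper names `pearson` and `differences` are my own, not from any library.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def differences(xs):
    """Month-on-month changes, which strip out a shared upward trend."""
    return [later - earlier for earlier, later in zip(xs, xs[1:])]


# Invented data: both series rise by one unit a month, each with its own
# unrelated wobble on top. Nothing links one series to the other.
months = range(12)
wobble_a = [1, -1] * 6
wobble_b = [1, 1, -1, -1] * 3
meme_shares = [t + w for t, w in zip(months, wobble_a)]
coffee_price = [t + w for t, w in zip(months, wobble_b)]

# Correlation of the raw series, dominated by the shared trend.
r_raw = pearson(meme_shares, coffee_price)

# Correlation of the monthly changes, with the trend removed.
r_diff = pearson(differences(meme_shares), differences(coffee_price))

print(f"raw series:      r = {r_raw:.2f}")
print(f"monthly changes: r = {r_diff:.2f}")
```

The raw series correlate strongly only because both happen to grow over time. Compare the month-on-month changes instead and the relationship all but disappears, which is exactly the kind of check a propaganda meme will never show you.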

Alone, correlations are not proof, but they can reveal vital clues about possible causation.

They can also be powerful in assisting predictions. The more we know, the more correlations we will uncover, and the more we can use the circumstantial correlations of the past to make reasonable conjectures about the future.

As Twitter, Google, and other web-enabled data collectors increase both the range and volume of their publicly available data, more correlations become available to anyone with the inclination to look for them. With tools like PowerPivot, our predictive capabilities become ever more reliable, and with them, the possibilities of social engineering reach new and increasingly influential heights.
