The data was unclear until something occurred

Whenever a tragic event occurs, data wonks appear on the news shows to either confirm or second-guess the signs that certain aspects of the tragedies could have been predicted.

In the case of last month’s attack in Barcelona, there was the “lost in translation” challenge when many of the local city authorities were interviewed about whether there were any signs that this terrorist cell might be preparing for an attack.

One of the most interesting quotes I heard about predicting the attack was from a Spanish intelligence official: “The data was unclear until something occurred”.

I had to digest the quote to determine whether the Catalan conjugation translated the comment into something that seemed so totally obvious, or whether there was something more compelling about the phrase.

It made me think of how we increasingly expect data to be predictive even though we’ve not determined a causal effect between data points A and B. It also reminded me of the advice from my data scientist friends about starting with what you’d love the data to tell you rather than being disappointed that it didn’t tell you a story on its own. The Barcelona quote crystallizes the fact that data insight requires patience.

Fortunately, in most of these data analytics exercises the data validation “occurrences” are not as tragic as what happened in Barcelona. But history tells me that there is a natural resistance to letting the data lead you to purposefully negative conclusions simply to find the outer limits. I’ve been surprised to see that this anxiety extends to instances where nothing bad can really happen from the results because of restricted sample sizes and distance from major revenue sources.

For example, in the media data business there is always a need to find the sweet spot on volume mailings that generate sales leads for companies. In the old days, media companies would mail to the entire database (maybe a million names) in order to net a few hundred leads. The inefficiencies were tremendous because there were literally hundreds of thousands of people who received mail to which they had absolutely no desire to respond. It was just difficult to know who those people were. What was clear was that the list fatigue caused by repeatedly mailing a million people would depress the response and thus reduce the value of the database.

For those with a professional comfort level for using a big net to reach small response targets, the downsizing exercise was nerve-racking. So when it was recommended that we mail to 100,000 rather than 1 million, the pushback was very strong. It became clear, however, that we could generate the same response by mailing to one-tenth as many names. Eventually, it was found that a mailing universe of 50,000 was the threshold below which response fell off precipitously. This meant that the remaining 950,000 names in the database could be used for other programs with no fear of fatigue.
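For anyone who wants to check the arithmetic behind that downsizing, here is a minimal sketch in Python. It uses only the round figures quoted above; the variable names are my own, and the numbers are illustrative rather than actual campaign data.

```python
# Round figures from the mailing example above (illustrative, not campaign data).
full_database = 1_000_000          # names in the entire database
first_test_mailing = 100_000       # the first recommended, smaller mailing
minimum_viable_mailing = 50_000    # below this, response fell off sharply

# How much smaller the first test mailing was than the full database.
reduction_factor = full_database // first_test_mailing

# Names left untouched, and so free for other programs, at the minimum size.
unused_names = full_database - minimum_viable_mailing

# Share of the database that actually needs to be mailed.
share_mailed = minimum_viable_mailing / full_database

print(f"First test mailing was {reduction_factor}x smaller than the full list")
print(f"Names free for other programs: {unused_names:,}")
print(f"Share of database mailed: {share_mailed:.0%}")
```

Put another way, at the 50,000 threshold only one name in twenty is being mailed, which is why fatigue stops being a worry for the rest of the list.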

Meanwhile, there are those among us who remain infatuated with their warehouses of random or unstructured data with no clear meaning. They wait, desperately hoping something will occur that validates the value of their warehouses, all the while battling database managers who want to reduce file size and storage costs. But no, they know they will eventually force something to occur that converts a stockpile of useless petabytes into a major corporate insight asset.

