Who can you trust in a world of big data?

By CSC’s Eric Pinkerton

There’s some good news and some bad news — which do you want first?

A Catholic priest cannot breach the confidentiality of confession; it’s non-negotiable even on pain of death. Indeed, the mechanics of this sacrament are utterly dependent upon this simple tenet.

Likewise, though perhaps not to the same extent, similar bonds of trust need to exist between physician and patient, or lawyers and their clientele, and accordingly these relationships are all enshrined in law.

Of course no such laws exist for the smartphone manufacturer who knows when you text behind the wheel, or the social network, which knows that you stalk your ex.

Bad Apple

“Bless us Father for we have sinned; it was the apple that ratted us out, not the serpent.”


You live in a world where your insurer can tell when you fib about parking in a garage at night, and your supermarket knows about your trans-fat addiction, and perhaps that supermarket is also your insurer now and your smartphone maker owns that social network.

Worst of all, that wearable on your wrist is reporting to the cloud in real-time every time you participate in something you’d rather the world not know.

We can scarcely begin to imagine the volume of data that will exist in the dystopian, cloud-powered, hyper-connected, Internet-of-Things future of big data, where the price of storage is in constant freefall and where data is not just sold to the highest bidder, but likely sold to every bidder on an ongoing basis.

In the beginning there were books…

I vividly remember, at the age of 10, stepping through the doors of the reading room of the British Library for the first time. A blue dome straddles 360 degrees of arched windows, illuminating three stories of multi-colored spines that encircle scores of hand-carved mahogany desks.

[Image: the British Library reading room. Photo: Diliff, Wikimedia]

The unmistakable smell of old books adds to the gravitas of a room where history has not only been recorded, but conceived. Conan Doyle, Wilde, Stoker, Kipling, Orwell, Bernard Shaw, Twain and Marx all sat at these desks.

To me it seemed intoxicating that perhaps somewhere buried among the countless musty volumes surrounding me was the answer to every question I could ever conceive, just within reach. Yet even if I were to devote my entire life to reading, I could scarcely scratch the surface.

Remember that this was only one of many rooms belonging to the British Library, and the collection comprises not only books, but newspapers, magazines, manuscripts, musical scores, records, tapes, maps, stamps, lithographs, photographs, films and, well, you name it.

A child’s mind is often abuzz with endless questions, and mine wondered how many other libraries there were around the world, how many items they contained, how many languages those items were written in and at what rate the collections were growing.

And then it was big…

Five years ago, Google’s Eric Schmidt asserted that every two days, the world generates as much information as we did from the dawn of civilization up until the year 2003. Now, I don’t know how he was estimating the bandwidth of cave paintings, or if he was accounting for ancient hieroglyphs on undiscovered temples, but regardless, it remains an impressive statistic, especially when you consider that this was 2010, when the Internet of Things was barely even a thing.

We are now five years into an exponential upturn on the graph of information creation, and to paraphrase Sting, every step you take, every webpage you read, every antenna you pass, icon you hover over and dollar you spend generates a rich tapestry of ones and zeroes that is becoming more cost-effective to store than to delete.

As a child, I could never have conceived that a time might come when I might personally generate in a single day the same amount of information that I was struggling to comprehend the first time I set foot in that library.

Consider that for every minute that passes, 300 hours of video are uploaded to YouTube. Now who am I to compare the complete works of Shakespeare with an infinite number of keyboarding cats, but the point here is that no matter how banal this explosion of modern information may seem to us, it might well contain the answers to every question that someone somewhere out there is dying to ask.

Consider that we are inventing new ways to generate data every day, and that we are routinely, and often unwittingly, generating, connecting and uploading this information to the cloud without ever appreciating the implications, or reading a single word of the endless EULAs and privacy policies we tacitly accept.

So what’s the problem?

Well, thanks for asking. The first problem I see is the opportunity for error in a world where ‘data science’ is such a new and rapidly evolving discipline, with a skills vacuum in which kids with a D in Stats 101 and a Dummies’ Guide to Hadoop can call themselves data scientists.

For instance, it seems logical that the bigger the data set, the more accurate the answers, but it is also true that the larger the data set, the more coincidences it will contain.

If you study lottery tickets, you will find winners, and if you look at lottery winners, you will undoubtedly discover that some of them claim to be clairvoyants or are repeat winners, so how can you resist the temptation to conclude that those people were able to predict the future?
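To put rough numbers on that temptation, here is a minimal sketch of the arithmetic. The win probability, the number of draws and the population sizes are purely hypothetical figures chosen for illustration, and the model assumes every draw is independent, which real lotteries only approximate.

```python
def prob_at_least_two_wins(p: float, draws: int) -> float:
    """Chance that one player wins two or more times in `draws` independent draws."""
    p_zero_wins = (1 - p) ** draws
    p_one_win = draws * p * (1 - p) ** (draws - 1)
    return 1 - p_zero_wins - p_one_win


def expected_repeat_winners(players: int, p: float, draws: int) -> float:
    """Expected number of players who win at least twice, by pure chance."""
    return players * prob_at_least_two_wins(p, draws)


if __name__ == "__main__":
    # Hypothetical figures: a 1-in-10,000 chance per draw, 1,000 draws per player.
    for players in (1_000, 1_000_000, 100_000_000):
        repeats = expected_repeat_winners(players, p=1e-4, draws=1_000)
        print(f"{players:>11,} players -> about {repeats:,.0f} repeat winners")
```

The chance for any one player never changes, but the expected count of "impossible" repeat winners grows in step with the number of players you examine. None of them needs to be clairvoyant.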

That’s all pretty benign, but what about when such coincidences might result in, say, someone being wrongly denied credit, branded a bad citizen or added to a no-fly list?

Now consider that everyone gets the occasional phone call that turns out to be a wrong number. If you get the occasional wrong number, then it’s reasonable to assume that, say, the leaders of ISIS might get, or even make, the occasional misdial. Well, what if a simple fat-fingered blunder resulted in someone innocent being erroneously targeted by a drone strike?


The next problem is the privacy implications. Often big data can be used to infer things about us that could cause us detriment. For instance, how would we feel if our GPS company were to divulge how often we break the speed limit, or if our grocery-store rewards scheme ratted on how much saturated fat and alcohol we consume?


Finally, we have security. Who has access to this information, where is it stored, how is it protected and what are the real consequences if it is exposed or simply sold?

You may remember that in August 2014, a collection of private pictures of various celebrities was posted online and later disseminated across numerous social networks.

Apple did a fantastic job of playing down what was arguably one of the first real big data breaches, publicly stating that this was “a very targeted attack on user names, passwords and security questions, such as phishing and brute-force guessing, rather than any specific vulnerability in the iCloud service itself.”

It’s noteworthy that within a day or so, it appeared that they had patched a potential flaw in the Find My iPhone API that reportedly facilitated brute-force attempts due to the absence of a lockout rule.
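For readers wondering what a lockout rule actually looks like, here is a minimal sketch. It is not Apple’s implementation or any real API; the class name `LoginThrottle`, the threshold of five failures and the 15-minute lockout are all assumptions made for illustration.

```python
import time
from collections import defaultdict


class LoginThrottle:
    """Hypothetical lockout rule: block an account after repeated failed logins."""

    def __init__(self, max_failures=5, lockout_seconds=900.0, clock=time.monotonic):
        self.max_failures = max_failures        # assumed threshold
        self.lockout_seconds = lockout_seconds  # assumed lockout duration
        self._clock = clock
        self._failures = defaultdict(int)
        self._locked_until = {}

    def is_locked(self, account):
        until = self._locked_until.get(account)
        if until is None:
            return False
        if self._clock() >= until:
            # The lockout has expired, so clear it and reset the counter.
            del self._locked_until[account]
            self._failures[account] = 0
            return False
        return True

    def attempt(self, account, password_ok):
        """Return True only if the account is unlocked and the password was right."""
        if self.is_locked(account):
            return False  # even a correct guess is refused while locked
        if password_ok:
            self._failures[account] = 0
            return True
        self._failures[account] += 1
        if self._failures[account] >= self.max_failures:
            self._locked_until[account] = self._clock() + self.lockout_seconds
        return False
```

Without a rule of this kind, an attacker can try passwords as fast as the service will answer. With it, a handful of wrong guesses buys a long wait, which is what makes brute-force guessing impractical.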

It’s hard not to remain open to the possibility that a breach of this nature could conceivably have included a malicious insider, even if only to identify the accounts worth targeting.

So what is the solution?

Well, the bad news is that there is not a great deal that Joe Public can do short of ‘living off the grid,’ which, let’s face it, is not really an option for a normal person in 2015.

It’s important to be mindful of the information you generate, and its value in a big data world. It’s also important that when you have a choice, you make sane choices about that information. For example, if you happen to be famous, then a public cloud with single-factor authentication might not be the best place for your nude selfies.

If you live off unfiltered cigarettes, deep fried doughnuts and store-brand vodka, then a loyalty card scheme might not end up saving you money in the long term; then again, if that’s you, you’re probably not a long-term thinker.

Lastly, when you encounter phrases such as “From time to time we may share your details with carefully selected companies,” you should read, “We will sell all your data to whoever wants it whenever they want it.”

The good news is that if you work for a company that is discovering big data for the first time, there is a lot you can do:

  • You can strive to understand the potential impact of mistakes and errors in your algorithms before they happen, and understand that big data can result in big mistakes.
  • You can appoint someone whose responsibility will be to set clear ethical boundaries from the outset and ensure that they are adhered to: the elusive Chief Privacy Officer.
  • You can create a privacy policy that is legal, honest, concise and easy for people to understand.
  • You can compete on privacy; that is to say you can use the fact that you excel at privacy to your advantage from a marketing perspective.
  • You can make sure that the data you compile and your analytical toolsets are well secured.
  • You can make a conscious decision as an organization to use big data only for good.
  • You can put safeguards in place to deter, delay, deny and detect the misuse of such technologies within your organization.

Feel free to comment if I made you think about this differently, or if you have something to add that I have missed.

Eric Pinkerton, a CSC Cybersecurity principal security consultant, has worked on numerous cloud assurance engagements, including complex control audits, detailed threat risk assessments and technical configuration reviews. Pinkerton is also proud to have contributed to both the NESAF Cloud Security Framework and the CSA Cloud Controls Matrix.


