Don’t be a Data Vandal!

I started using email in 1995 – in the days when each person could have their own dedicated IP address, and email clients were command-line driven and stored their data locally.

I quickly realized that my policy of deleting emails once dealt with was not particularly wise – what if, in 20 years' time, I needed some information from one of them? A little obsessive, maybe, especially back then – but guess what: at the beginning of this year I had to put together a dossier of just about every last thing I'd ever done in IT, with supporting references – and some of that information was only to be found in my antique email archives!

Score one for the email hoarder!

Nowadays the major online email service providers offer an “archive” option in preference to a “delete” option; storage is so much cheaper now, and seemingly limitless – so why would you delete emails that you might need to access some time in the future if it’s not costing you anything to store them?

Privacy and security concerns are, of course, perfectly legitimate responses to that question – but local storage of encrypted data is an equally viable non-destructive alternative.

Why am I banging on about my email habits?

Because the above anecdote is a microcosm of a bigger question:

Why would a corporation delete any data these days (statutory requirements to do so aside)?

Another story – consider a construction company launching an online quotes service. For what appear to be perfectly legitimate operational-capacity reasons, they decide to delete unconverted quotes from their database after 60 days, and log data from the various application tiers after 30 days. A little while later, a data scientist working for that company figures out an algorithm correlating unconverted quotes with web server log data, which might explain why 25-to-30-year-olds in a particular southern Pasadena suburb never hire the company. With such insight the company could correct the features of its offering that this demographic finds so objectionable – or create a new offering targeted at that market sector.

But the data is gone. Opportunity missed!

The key idea in both anecdotes is this:

When the data was first acquired, it was not obvious to what use it could be put after its immediate utility had apparently expired.

In the first case (my emails) the conservative approach paid off 20 years later! In the second, the “destructive” approach deprived the company of a potential business opportunity.

Now, ten years ago, before the advent of inexpensive quasi-inexhaustible archiving services (such as Amazon Glacier), and the emergence of data science as a critical enterprise strategy tool, one could have been forgiven the pragmatic decision to do a regular data purge from production systems, piping to /dev/null.

However, fast-maturing data science practices now promise almost limitless potential for gleaning valuable business insights from such data. There are three important characteristics of data science (at least as far as this argument is concerned), enumerated by my fellow CSC DE, and data science Jedi, Oveje Doblu*:

  1. The very nature of data science is experimental.
  2. The questions you answer with data science are the ones for which you have the data.
  3. You don’t know ahead of time what question any particular body of data can answer.

The resulting best practice, then, is to keep around as much data as you can, on as many different topics as you can, for as long as you can.
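
As a rough illustration of this practice, the quote-purge job from the construction-company story could archive rows before removing them from the production database. The sketch below is a minimal Python example, not a description of any real system: the `quotes` table, its columns (`id`, `created_at`, `converted`, `payload`), the function name `archive_then_purge`, and the gzip JSON-lines archive file are all assumptions made for illustration. Timestamps are assumed to be stored as UTC ISO 8601 strings, so that they compare correctly as text.

```python
import gzip
import json
import sqlite3
from datetime import datetime, timedelta, timezone


def archive_then_purge(conn, archive_path, now, max_age_days=60):
    """Copy expired unconverted quotes to an archive file, then delete them.

    Hypothetical schema: quotes(id, created_at, converted, payload), with
    created_at stored as a UTC ISO 8601 string.
    Returns the number of rows archived.
    """
    cutoff = (now - timedelta(days=max_age_days)).isoformat()
    rows = conn.execute(
        "SELECT id, created_at, payload FROM quotes "
        "WHERE converted = 0 AND created_at < ?",
        (cutoff,),
    ).fetchall()

    # Append mode: each run adds another gzip member to the same file.
    with gzip.open(archive_path, "at", encoding="utf-8") as archive:
        for quote_id, created_at, payload in rows:
            record = {"id": quote_id, "created_at": created_at, "payload": payload}
            archive.write(json.dumps(record) + "\n")

    # Delete from production only after the archive file has been closed.
    with conn:
        conn.executemany(
            "DELETE FROM quotes WHERE id = ?", [(row[0],) for row in rows]
        )
    return len(rows)
```

The production database still shrinks on schedule, which addresses the operational-capacity concern, but the expired rows remain available in the archive file, which could then be shipped to whatever low-cost storage the organization uses.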

Digital data is no longer simply a means to an end – it is an extremely valuable asset in its own right, with potential real business value both now and in the future.

These radical shifts in the role and importance of data are strongly emphasized in all of CSC’s Journey to the Digital Enterprise papers, but particularly in those that focus on the Banking, Healthcare, and Insurance industries.

So don’t be a data vandal! If you are a CIO, don’t destroy your organization’s valuable property! If you are an IT services provider, don’t destroy your customers’ valuable property!

Those southern Pasadena residents are out there, waiting to be won over!

* Courtesy of http://jediname.com/


Martin Bartlett – Distinguished Engineer

Martin Bartlett is a principal in CSC’s insurance practice in the South and Western Europe region. Since joining CSC in 1988, Martin has played a central role in the architecture, design, development, implementation, deployment and support of some of CSC’s most far-reaching strategic projects in the insurance domain. He is considered the chief technical architect of GraphTalk A.I.A. and is a strong advocate for cloud-deployed SaaS solutions and the API economy.

