Data science already plays a significant role in specialized areas. Being able to predict machine failure is a big deal in transportation and manufacturing. Predicting user engagement is huge in advertising. And properly classifying potential voters can mean the difference between winning and losing an election.
But the thing that excites me most is the promise that, in general, data science can give a competitive advantage to almost any business that is able to secure the right data and the right talent. I believe that data science can live up to this promise, but only if we can fix some common misconceptions about its value.
For instance, here’s the standard storyline when it comes to data science: Data-driven companies outperform their peers; just look at Google, Netflix and Amazon. You need high-quality data with the right velocity, variety and volume, the story goes, as well as skilled data scientists who can find hidden patterns and tell compelling stories about what those patterns really mean. The resulting insights will drive businesses to optimal performance and greater competitive advantage. Right?
Well … not quite.
The standard storyline sounds really good. But a few problems occur when you try to put it into practice.
The first problem, I think, is that the story makes the wrong assumption about what to look for in a data scientist. If you do a Web search on the skills required to be a data scientist (seriously, try it), you’ll find a heavy focus on algorithms. It seems that we tend to assume that data science is mostly about creating and running advanced analytics algorithms.
I think the second problem is that the story ignores the subtle, yet very persistent tendency of human beings to reject things we don’t like. Often we assume that getting someone to accept an insight from a pattern found in the data is a matter of telling a good story. It’s the “last mile” assumption. Many times what happens instead is that the requester questions the assumptions, the data, the methods or the interpretation. You end up chasing follow-up research tasks until you either tell your requesters what they already believed or just give up and find a new project.
Figure 1: Historical Web searches of the term “data scientist” as reported by Google Trends
The first step in building a competitive advantage through data science is having a good definition of what a data scientist really is. The popularity of the term “data scientist” is relatively new (see Figure 1) and there is still plenty of debate on what it means.
I believe that data scientists are, foremost, scientists. They use the scientific method. They guess at hypotheses. They gather evidence. They draw conclusions. Like all other scientists, they have the job of creating and testing hypotheses. Instead of specializing in a particular domain of the world, such as living organisms or volcanoes, data scientists specialize in the study of data. This means that, ultimately, data scientists must have a falsifiable hypothesis to do their job, which puts them on a much different trajectory than the one described in the standard storyline.
If you want to build a competitive advantage through data science, you need a falsifiable hypothesis about what will create that advantage. Guess at the hypothesis, then turn the data scientist loose on trying to confirm or refute it. There are countless specific hypotheses you can explore, but they will all have the same general form:
It’s more effective to do X than to do Y.
For example:
- Our company will sell more widgets if we increase delivery capabilities in Asia Pacific.
- The sales force will increase their overall sales if we introduce mandatory training.
- We will increase customer satisfaction if we hire more user-experience designers.
You have to describe what you mean by effective. That is, you need some kind of key performance indicator, like sales or customer satisfaction, that defines your desired outcome. You have to specify some action that you believe connects to the outcome you care about. You need a potential leading indicator that you’ve tracked over time. Assembling this data is a very difficult step, and one of the main reasons you hire a data scientist. The specifics will vary, but the data you need will have the same general form:
Figure 2: The data you need to build a competitive advantage using data science
Let’s take, for example, our hypothesis that hiring more user-experience designers will increase customer satisfaction.
We already control whom we hire. We want greater control over customer satisfaction, the key performance indicator. We assume that the number of user-experience designers is a leading indicator of customer satisfaction. The reasoning runs as a chain: user-experience design is a skill of our employees, employees build products, and products influence customer satisfaction.
Figure 3: An example of the data you need to explore the hypothesis that hiring more user-experience designers will improve customer satisfaction.
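To make the shape of that data concrete, here is a minimal sketch in Python of what a tracked history might look like for the user-experience hypothesis. Everything in it is an assumption made for illustration: the `Observation` type, its field names, the quarterly periods, the 0-100 satisfaction score and every number are invented, not taken from a real company.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One tracked period: the action we control and the outcome we care about."""
    period: str          # hypothetical quarter label
    ux_designers: int    # leading indicator (the action we control)
    satisfaction: float  # key performance indicator (hypothetical 0-100 score)


# Invented numbers, for illustration only.
history = [
    Observation("Q1", 2, 61.0),
    Observation("Q2", 2, 63.0),
    Observation("Q3", 3, 64.0),
    Observation("Q4", 4, 68.0),
    Observation("Q5", 4, 67.0),
    Observation("Q6", 5, 71.0),
    Observation("Q7", 6, 74.0),
    Observation("Q8", 7, 76.0),
]


def as_columns(rows):
    """Split the records into (leading indicator, KPI) columns for analysis."""
    return [r.ux_designers for r in rows], [r.satisfaction for r in rows]


designers, satisfaction = as_columns(history)
```

The point of the layout is that each row pairs the leading indicator with the key performance indicator for the same period, so the two can later be compared over time.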
Once you’ve assembled the data you need, let your data scientists go nuts. Run algorithms, collect evidence and decide on the credibility of the hypothesis. The end result will be something along the lines of “yes, hiring more user-experience designers should increase customer satisfaction by 10% on average” or “the number of user-experience designers has no detectable influence on customer satisfaction.” Notice, now, that we’ve pushed well past the “last mile.” At this point, progress is not a matter of telling a compelling story and convincing someone of a particular worldview. Progress is a matter of deciding whether or not the evidence is strong enough to justify taking action.
Figure 4: The process of accumulating competitive advantages using data science. It’s a simple adaptation of the scientific method.
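As one hedged illustration of the "run algorithms, collect evidence" step, the sketch below fits a least-squares slope to a small dataset and uses a permutation test to ask how often a slope that large would appear if the two columns were unrelated. The numbers are invented, the function names are hypothetical, and the 0.05 threshold is just a common convention; a real analysis would choose its own methods and would also need to consider confounding factors, since an association alone does not show that hiring designers causes the change.

```python
import random


def slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def permutation_p_value(xs, ys, trials=2000, seed=0):
    """Estimate how often shuffled data yields a slope at least as extreme."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(slope(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if abs(slope(xs, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (trials + 1)


# Invented data: designers on staff and a 0-100 satisfaction score per period.
designers = [2, 2, 3, 4, 4, 5, 6, 7]
satisfaction = [61.0, 63.0, 64.0, 68.0, 67.0, 71.0, 74.0, 76.0]

effect = slope(designers, satisfaction)
p_value = permutation_p_value(designers, satisfaction)

# Assumed decision rule: a positive slope that is unlikely under shuffling.
if effect > 0 and p_value < 0.05:
    verdict = "evidence supports the hypothesis"
else:
    verdict = "no detectable influence"
print(f"slope={effect:.2f} points per designer, p={p_value:.4f}: {verdict}")
```

Either verdict is useful: the output is a statement about the strength of the evidence, which is what the decision to act should rest on.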
This brand of data science may not be as exciting as the idea of taking unexplored data and discovering unexpected connections that change everything. But it works. The progress you make is steady and depends entirely on the hypotheses you choose to investigate.
Which brings us to the main point: There are many factors that contribute to the success of a data science team. But achieving a competitive advantage from the work of your data scientists depends on the quality and format of the questions you ask.
Jerry Overton is head of advanced analytics research in CSC’s ResearchNetwork and founder of CSC’s FutureTense competency, which includes the Predictive Modeling Research Group, Advanced Analytics Lab and Predictive Modeling School. Connect with him on Twitter.