The rise of data lakes — huge repositories of different types of data in original form — over the past couple of years is in response to a growing urgency among enterprises to efficiently and cost-effectively store and manage increasingly large and diverse sets of data. The data residing in the lake is unsiloed and can be easily and quickly accessed by employees seeking to extract analytical insights that will improve the business.
The truth is that data lakes contain hidden (if not entirely unexpected) hazards — murky areas, if you will. For starters, since data is simply dumped into a data lake in its raw form, there are no assurances of quality.
Security and control are also issues any time a large number of people are accessing data. Finally, while we may soon see the day when “everybody is a data scientist,” we’re not there yet, so many employees will struggle to make use of the data sets they drag from the lake.
Real concerns all, but the biggest pitfall for enterprises when it comes to data lakes is strategic. Ajay Khanna, vice president at data-driven apps development vendor Reltio, attended the recent Strata & Hadoop World conference. He tells Information Management, “It was surprising to see so many companies investing in, or considering investment in data lakes without clear business objectives in mind. There seemed to be a disconnect between business strategy and big data initiatives.”
Any time there’s a disconnect between a technology initiative and business goals, there’s a problem. A good CIO won’t launch a mobile program without specific business objectives in mind; the same standard should apply to data lakes. It’s basically a two-step process: 1) Make sure your overall big data and analytics initiatives are designed to help your enterprise achieve its business goals, and 2) Make sure your data lake — which is a tool or resource for your data analytics program — better enables your analytics initiative to meet those goals. Will using a data lake allow more employees to generate business insights? Will a data lake enable faster and better decisions?
If the answers to these and other questions make you take the data lake plunge, remember to keep asking them because data lakes grow and evolve over time, as do analytics initiatives and business strategies. Alignment is an ongoing process.
Data lakes can help support an enterprise’s analytics program, but you can’t just dive in. Without a strategy linking data lakes and analytics to the goals of the business, enterprises will swim in circles.