The confusion and the ensuing debate are understandable. Just as organizations were adjusting to Big Data concepts and terminology, new and not-so-new concepts began to intermingle, such as the use of Data Lakes for Business Intelligence. I can hear some of you thinking: “Are these two even compatible?”, “Isn’t Business Intelligence meant to use Data Warehouses?”, or “Is this just another new-fangled experimental setup that is likely to hit my bottom line?” Hold on. That’s a lot of questions.
Let’s first get down to the basics.
What exactly is a Data Lake?
Techopedia explains: The data lake architecture is a store-everything approach to big data. Data are not classified when they are stored in the repository, as the value of the data is not clear at the outset. As a result, data preparation is eliminated. A data lake is thus less structured compared to a conventional data warehouse. When the data are accessed, only then are they classified, organized or analyzed. Hadoop, an open-source framework for processing and analyzing big data, can be used to sift through the data in the repository.
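To make the "classify only when accessed" idea concrete, here is a minimal sketch in Python. It is not tied to Hadoop or any particular data lake product. The event records, field names, and the `read_sales` helper are all hypothetical, chosen only to illustrate the idea: raw records are kept exactly as they arrived, and a structure is imposed only when someone reads them with a specific question in mind.

```python
import json

# Hypothetical raw events, stored exactly as they arrived (no upfront schema).
# Note that the records do not share a common set of fields.
RAW_EVENTS = [
    '{"type": "sale", "amount": "19.99", "region": "EU"}',
    '{"type": "click", "page": "/pricing"}',
    '{"type": "sale", "amount": "5.00"}',
]


def read_sales(raw_lines):
    """Impose a structure at read time: keep only sales, cast the amount
    to a number, and fill in a default for a missing region."""
    for line in raw_lines:
        record = json.loads(line)
        if record.get("type") != "sale":
            continue  # Not relevant to this particular question.
        yield {
            "amount": float(record["amount"]),
            "region": record.get("region", "UNKNOWN"),
        }


sales = list(read_sales(RAW_EVENTS))
total_sales = sum(sale["amount"] for sale in sales)
print(f"{len(sales)} sales, total {total_sales:.2f}")
```

A different question, say about page clicks, would read the same raw records with a different reader function, and nothing would need to be reloaded or remodeled first.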
What exactly is a Data Warehouse?
A data warehouse (DW) is a collection of corporate information and data derived from operational systems and external data sources. A data warehouse is designed to support business decisions by allowing data consolidation, analysis and reporting at different aggregate levels. Data is populated into the DW through the processes of extraction, transformation and loading.
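The extraction, transformation and loading steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the source rows, the `fact_sales` table, and the `extract`, `transform`, and `load` helpers are hypothetical, and an in-memory SQLite database stands in for a real warehouse. The point it shows is that the structure is decided before any data is loaded.

```python
import sqlite3

# Hypothetical rows as they might come out of an operational system.
SOURCE_ROWS = [
    {"order_id": 1, "region": "eu", "amount": "19.99"},
    {"order_id": 2, "region": "US", "amount": "5.00"},
    {"order_id": 3, "region": "EU", "amount": "10.01"},
]


def extract():
    """Extract: pull rows from the (hypothetical) operational source."""
    return list(SOURCE_ROWS)


def transform(rows):
    """Transform: normalize region codes and cast amounts to numbers."""
    return [
        (row["order_id"], row["region"].upper(), float(row["amount"]))
        for row in rows
    ]


def load(conn, rows):
    """Load: write the cleaned rows into a table with a fixed schema."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_sales ("
        "order_id INTEGER PRIMARY KEY, "
        "region TEXT NOT NULL, "
        "amount REAL NOT NULL)"
    )
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # Stand-in for a real warehouse.
load(conn, transform(extract()))

# Reporting at an aggregate level: total sales per region.
totals = dict(
    conn.execute("SELECT region, SUM(amount) FROM fact_sales GROUP BY region")
)
print(totals)
```

Because the table schema is fixed up front, any new field in the source requires changing the transform step and the table before it can be reported on.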
Business Intelligence requires analysis of business data such as sales, cost, and total revenue through reporting, predictive analytics, data mining, benchmarking, and online analytical processing (OLAP). Until recently, Business Intelligence was paired with Data Warehouses to deliver the desired results. The new line of thought advocates using Data Lakes instead, either exclusively or in combination with Data Warehouses. The reason is that the speed of business has increased over the years and now demands a level of agility that Data Warehouses alone can’t deliver.
To summarize, a Data Lake holds raw, unorganized data, whereas a Data Warehouse holds data that has been structured through extraction, transformation, and loading. As a result, building a Data Warehouse takes time and money. There is also no guarantee that data that seemed relevant when the Data Warehouse was commissioned will still be relevant by the time it is built. Here’s a helpful table by Tamara Dull at KDnuggets that summarizes the differences between a Data Lake and a Data Warehouse.
In other words, Data Lakes provide an agile and inexpensive environment to store and analyze data at will. While the lack of structure might prove challenging to business owners who want conclusions fast, it may prove effective for Data Scientists seeking better and more targeted analysis of an organization’s data.
Benefits of a Data Lake:
- Provides a low-cost BI environment. This allows the identification of additional BI queries. Ask as many questions as you want!
- Pre-empts the inherent risk of large builds that do not cater to the actual business requirements. After all, business is always in a dynamic state. What seemed relevant six months ago may no longer be relevant by the time a DW is ready.
- Provides scope for validating requirements and updating plans as they evolve. You can check and recheck requirements as frequently and for as long as you like. Likewise, you can change your plans as your business’s requirements change.
- Enables quick and frequent testing and analysis as business conditions evolve. Improve the accuracy of your data analysis and keep it relevant.
- Shortens design-analysis-build cycles to just 1-2 weeks instead of the 6-12 months required by traditional Data Warehousing techniques. Be decisive with timely insights.
- Costs less to deploy and maintain, since there is minimal need for ETL programs, data modeling, and integration. Cut costs dramatically while increasing your efficacy.