The confusion and the ensuing debate are understandable. Just as organizations were adjusting to Big Data concepts and terminology, new and not-so-new concepts began to intermingle, such as the use of Data Lakes for Business Intelligence. I can hear some of you thinking: “Are these two even compatible?”, “Isn’t Business Intelligence meant to use Data Warehouses?”, or, “Is this just another new-fangled experimental setup that is likely to hit my bottom line?” Hold on. That’s a lot of questions.

Let’s first get down to the basics.

What exactly is a Data Lake?

Techopedia explains: The data lake architecture is a store-everything approach to big data. Data are not classified when they are stored in the repository, as the value of the data is not clear at the outset. As a result, data preparation is eliminated. A data lake is thus less structured compared to a conventional data warehouse. When the data are accessed, only then are they classified, organized or analyzed. Hadoop, an open-source framework for processing and analyzing big data, can be used to sift through the data in the repository.[1]


[1] https://www.techopedia.com/definition/30172/data-lake

What exactly is a Data Warehouse?

A data warehouse (DW) is a collection of corporate information and data derived from operational systems and external data sources. A data warehouse is designed to support business decisions by allowing data consolidation, analysis and reporting at different aggregate levels. Data is populated into the DW through the processes of extraction, transformation and loading.

Business Intelligence requires analysis of business data such as sales, cost, and total revenue through reporting, predictive analytics, data mining, benchmarking, and online analytical processing (OLAP). Until recently, Business Intelligence was paired with Data Warehouses to deliver these results. The new line of thought advocates using Data Lakes instead, either exclusively or in combination with Data Warehouses. The reason is that the speed of business has increased over the years and now demands a level of agility that can’t be delivered by working only with Data Warehouses.

To summarize, a Data Lake holds unorganized data, whereas a Data Warehouse holds data that has been structured through extraction, transformation, and loading. As a result, building a Data Warehouse takes time and money. Also, there is no guarantee that the data that seemed relevant when a Data Warehouse was commissioned will still be relevant by the time the warehouse is built. Here’s a helpful table by Tamara Dull at KDnuggets that summarizes the differences between a Data Lake and a Data Warehouse.

[Table: Data Lake vs. Data Warehouse. Courtesy: www.kdnuggets.com]

In other words, Data Lakes provide an agile and inexpensive environment to store and analyze data at will. While the lack of structure might prove challenging to business owners who want conclusions fast, it may prove effective for Data Scientists seeking better and more targeted analysis of an organization’s data.

Benefits of a Data Lake:

  • Provides a low-cost BI environment. This allows the identification of additional BI queries. Ask as many questions as you want!
  • Pre-empts the inherent risk of large builds that do not cater to the actual business requirements. After all, business is always in a dynamic state. What seemed relevant six months ago may no longer be relevant by the time a DW is ready.
  • Provides scope for validating requirements and updating plans as those requirements evolve. You can check and recheck requirements as frequently and for as long as you like. Likewise, you can change your plans to match your business’s altered requirements.
  • Enables quick and frequent testing and analysis as business conditions evolve. Multiply the accuracy of your data analysis and keep it relevant.
  • Shortens design-analysis-build cycles to just 1-2 weeks instead of the 6-12 months required by traditional Data Warehousing techniques. Be decisive with timely insights.
  • Costs less to deploy and maintain, since there is minimal requirement for ETL programs, data modeling, and integration. Cut costs dramatically while increasing your efficacy.

What exactly makes Data Lakes cost effective and efficient?

Hadoop:

The Data Lake architecture is made possible through Hadoop, an open source framework that acts as the main repository for all data. A whitepaper by The Jonah Group explains: “One of the key benefits of the Data Lake approach is that it takes advantage of inexpensive storage in Hadoop with its inherit simplicity of storing data based on its schema-less write and schema based read modes. While writing data to the Hadoop File System there is no need to define the schema of the written data.  This means that we can have a staging environment that stores all the data without designing and building a centralized Data Warehouse.  However, we now have all of the data at our disposal for analysis and reporting. In fact, the Data Lake architecture offers all of the functionality of a traditional Centralized Data Warehouse but without the upfront development costs.”
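
To make the “schema-less write, schema-based read” idea concrete, here is a minimal sketch in plain Python. It is an illustration only, not how Hadoop itself is implemented: the raw records, the SALES_SCHEMA mapping, and the read_with_schema helper are all hypothetical names invented for this example. Records are stored exactly as they arrive, and a schema is applied only when someone reads them.

```python
import json

# Hypothetical raw events, stored in the "lake" exactly as they arrived.
# No schema is declared at write time, so records may differ in shape.
RAW_LINES = [
    '{"order_id": 1, "amount": "19.99", "region": "EMEA"}',
    '{"order_id": 2, "amount": "5.00"}',
    '{"order_id": 3, "amount": "12.50", "region": "APAC", "coupon": "SPRING"}',
]

# The reader chooses the schema at query time: field -> (type, default).
SALES_SCHEMA = {
    "order_id": (int, None),
    "amount": (float, 0.0),
    "region": (str, "UNKNOWN"),
}


def read_with_schema(lines, schema):
    """Parse raw JSON lines and project each record onto the given schema."""
    rows = []
    for line in lines:
        record = json.loads(line)
        row = {}
        for field, (cast, default) in schema.items():
            value = record.get(field)
            # Missing fields fall back to the default; extra fields are ignored.
            row[field] = cast(value) if value is not None else default
        rows.append(row)
    return rows


sales_rows = read_with_schema(RAW_LINES, SALES_SCHEMA)
```

A different analyst could read the same raw lines with a different schema, for example one that keeps the coupon field, without anything being reloaded or remodeled.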

So, while the Data Lake provides cost effectiveness, it also provides agility by enabling the construction of business specific models to build Data Marts, where a Data Mart is a dimensional model for a specific subject area.
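
As a rough sketch of what building a Data Mart from lake data can look like, the snippet below aggregates order rows into a small dimensional summary for one subject area (sales), keyed by region and month. The ORDERS sample data and the build_sales_mart function are assumptions made up for illustration. A real mart would typically be built with a query engine rather than in-memory Python.

```python
from collections import defaultdict

# Hypothetical order rows, assumed to have already been read from the lake.
ORDERS = [
    {"order_id": 1, "amount": 10.0, "region": "EMEA", "month": "2016-01"},
    {"order_id": 2, "amount": 5.5, "region": "EMEA", "month": "2016-01"},
    {"order_id": 3, "amount": 7.25, "region": "APAC", "month": "2016-01"},
    {"order_id": 4, "amount": 4.0, "region": "EMEA", "month": "2016-02"},
]


def build_sales_mart(orders):
    """Aggregate revenue and order count by the (region, month) dimensions."""
    mart = defaultdict(lambda: {"revenue": 0.0, "orders": 0})
    for order in orders:
        key = (order["region"], order["month"])
        mart[key]["revenue"] += order["amount"]
        mart[key]["orders"] += 1
    return dict(mart)


sales_mart = build_sales_mart(ORDERS)
```

Because the raw rows stay in the lake, a second mart for another subject area, or the same mart with different dimensions, can be built later from the same data.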

Elimination of upfront ETL processes:

A sizable portion of BI costs is associated with building Data Warehouses. The analysis, design, and building of ETL processes is an enormous expense that includes initial licensing, development, and maintenance costs. By adopting Data Lakes, organizations can side-step this expense by developing ETL scripts in Pig Latin, the scripting language of Apache Pig, which runs on MapReduce and requires significantly less investment. In addition, Pig enables developers to focus on analysis instead of just ETL programming.

Growing focus on ‘prediction’ rather than just reporting and analysis:

The requirements for BI have evolved in the sense that business owners want to understand not only the known-knowns and the known-unknowns, but even the unknown-unknowns. The shift from reporting and analysis to prediction is palpable. The ability to foresee the future is what differentiates the leaders from the laggards. Data Lake architecture, a concept born of Big Data, furthers this endeavor by giving businesses the capability to slice and dice data to read the future.

Should Data Lakes for BI completely replace Data Warehouses?

Data Lakes promise advantages over Data Warehouses that are impossible to ignore. However, the structure and discipline of Data Warehouses still hold enormous appeal. Barry Devlin, Founder & Principal of 9sight Consulting, opines that in a lake, all water is essentially equal, and that this structure is inappropriate for business data. He states: “Although some of the concepts and requirements that drove the creation of the data warehouse architecture are no longer applicable, there is a strong and permanent need for a core set of data that defines the state of the business. Such process-mediated data demands a highly structured and regulated data store.”

He further acknowledges that there is also a growing requirement for loosely defined and frequently changing data, which can be used to sense trends and anticipate changing demands on a business. He concludes that while highly structured data environments like the Data Warehouse and agile data environments like the Data Lake are very different from one another, businesses need to be able to relate them to one another to draw cogent conclusions. He states: “The insights derived from either one on its own are far less useful than those derived from their combined information. I see the resulting architecture as one consisting of a number of technological pillars, each optimized for a particular need and type of processing, but all interlinked through assimilation processes and metadata.”

I agree, and believe that the best approach is to take small, experimental steps to achieve the unique equilibrium of data analytics that your business demands. Otherwise, there is the risk of creating data silos even while trying to break them. Martin Willcox, Senior Director at Teradata, cites the example of a customer that deployed over twenty large application-specific Hadoop clusters and states that this was not a sustainable trajectory. He further adds: “In the 90s and 00s many organizations deployed multiple data mart solutions that ultimately had to be consolidated into data warehouses. These consolidation projects cut costs and added value but also sucked up resources – human, financial, and organizational – which delayed the delivery of net new Analytic applications. That same scenario is likely to play out for the organizations deploying tens of Hadoop clusters today.”

What’s your take?

I believe that the criteria for choosing Data Lakes over Data Warehouses, or a combination of both, should factor in not only cost effectiveness but also the depth and accuracy of the resulting analytics. After all, being competitive requires both: financial agility and market insight.
