Over the past few months we have been meeting with customers and prospects to discuss Big Data, its possible use cases, and the advantages of kick-starting a Big Data initiative. A natural progression is to understand what Big Data “means” to them, then move on to industry use cases, and eventually narrow our focus to a specific problem we could tackle for them, possibly in the form of a Proof of Concept.
It seems as if almost every industry has plenty of opportunities to leverage Big Data and add some bottom-line value to the organization. Let’s take a quick look at some just to get our minds working.
The manufacturing sector wants to address quality concerns, and managing them quickly and effectively is still a challenge for many organizations. Big Data can help by efficiently identifying patterns in defects, warranty claims, and sensor-generated data. It can also track the individual parts of a machine or piece of equipment, which may be produced by multiple subcontractors.
I’m amazed when I think about the Big Data opportunities in the healthcare domain. One of my favorite use cases is predicting health-related outbreaks, such as influenza, for a particular geographical area by monitoring Twitter. A recent article, “How Twitter Can Predict Flu Outbreaks Faster Than the CDC Infographic,” reported that a group from the University of Rochester in NY used Twitter to track the outbreak of flu throughout NY. They analyzed over 4 million tweets containing GPS data over a span of one month in 2010. They plotted the results on a heat map and could predict, up to 8 days in advance, which areas would be at risk of an outbreak.
There are plenty of rich Big Data case studies in financial services to draw on, including credit card fraud detection. By monitoring credit card transactions and related spending patterns, banks can fight fraud in real time: they compare an in-process transaction against the account’s history and alert the account holder when it looks out of pattern.
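To make the idea of comparing an in-process transaction against history a little more concrete, here is a minimal sketch in Python. It is only an illustration under simple assumptions, not how any particular bank does it: the function name `is_suspicious`, the sample amounts, and the threshold of three standard deviations are all hypothetical, and a real system would look at far more than the transaction amount.

```python
from statistics import mean, stdev


def is_suspicious(history, amount, threshold=3.0):
    """Flag an amount that sits far outside the account's past spending.

    history   -- past transaction amounts for one account (hypothetical data)
    amount    -- the in-process transaction amount
    threshold -- how many standard deviations from the mean count as unusual
                 (an assumed value, chosen only for illustration)
    """
    if len(history) < 2:
        return False  # not enough history to judge
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Every past transaction was identical; anything different stands out.
        return amount != mu
    return abs(amount - mu) / sigma > threshold


# Hypothetical spending history for a single account holder.
past_amounts = [42.10, 18.75, 63.00, 27.40, 51.25, 35.90]

typical = is_suspicious(past_amounts, 45.00)    # close to past spending
unusual = is_suspicious(past_amounts, 2500.00)  # far outside past spending
```

In this sketch a flagged transaction is where the bank would alert the account holder. The Big Data angle is doing this comparison for every account, against a long history, while the transaction is still in process.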
Regardless of the potential opportunities around Big Data, one thing is clear: if you’re going to start a Big Data project, you will most likely have to justify its costs with hard ROI numbers. Some proponents of Big Data state that organizations should build a reservoir of data so they can ask the questions they aren’t thinking about asking today. I value the concept, but who can afford to justify a project on the basis of “if we build it they will come”?
As our discussions with clients move away from industry use cases to their specific company needs, many clients tell us that they are already doing Big Data-type activities, such as capturing and reporting on large data sets drawn from log files or other machine-generated data. Some of our clients go on to say that they find it difficult to cost-justify capturing and reporting on unstructured data (social media extracts, for example). How do they quantify the value of such an exercise?
Eventually, the discussion turns from the Big Data “sizzle” use cases to operational opportunities, and this is where we usually find some meaningful and practical areas of benefit regardless of industry. Specifically, many organizations consistently bump against their Extract, Transform, Load (ETL) Service Level Agreements (SLAs). Simply put, their ETL jobs are taking too long to run, mostly because our clients have been capturing large amounts of data without an efficient method to process it. Voilà! We have found a practical use case, important to our client base, that they can show to stakeholders to get “buy-in.”
I have heard of cases where ETL jobs took almost 24 hours to run in traditional RDBMS environments. When the companies moved those jobs to a multi-node Hadoop environment that processes the data in parallel, the ETL process was cut down to just a couple of hours. Not only was the run time slashed dramatically, bringing it back within the SLA, but the companies are doing this heavy lifting on commodity hardware with no Hadoop licensing cost, since Hadoop is open source.
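For readers who want a feel for why spreading an ETL job across nodes helps, here is a small Python sketch of the map-and-reduce pattern that Hadoop is built around. It is not Hadoop code and it runs on a single machine. The record layout (`date,store,amount`), the function names, and the sample data are all assumptions made up for illustration. The point is that each partition of the input is handled independently, which is what lets a cluster work on many partitions at the same time.

```python
from collections import defaultdict


def map_partition(lines):
    """Extract and transform: parse raw 'date,store,amount' lines into (store, amount) pairs."""
    for line in lines:
        _date, store, amount = line.strip().split(",")
        yield store, float(amount)


def reduce_pairs(pairs):
    """Load step in miniature: sum the values for each key."""
    totals = defaultdict(float)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def run_job(partitions):
    """Process each partition on its own, then merge the partial results.

    On a cluster, each partition could be handled by a different node;
    here they simply run one after another.
    """
    partials = [reduce_pairs(map_partition(part)) for part in partitions]
    return reduce_pairs(pair for partial in partials for pair in partial.items())


# Hypothetical input, already split into two partitions.
partitions = [
    ["2013-01-01,north,10.0", "2013-01-01,south,5.5"],
    ["2013-01-02,north,2.5"],
]

totals = run_job(partitions)
```

Because no partition depends on another until the final merge, adding nodes lets more partitions run at once, which is the kind of gain the ETL stories above describe.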
I know you’re probably thinking: “I’m not sure about open source. Who will support my enterprise application?” That’s where companies like our partner, Cloudera, come in to provide 24-hour support and training. So, in the end, we’ve found a relatively easy entry into Big Data that may not offer the sizzle some of the Big Data experts proclaim, but we still found real value in demonstrating insights into operational efficiency. Introducing Big Data technology into the organization through an operational use case may also open the door to future development, allowing people to ask the questions they aren’t even thinking of asking today.