The Big Data Solution: Hadoop Implementation
Drawing on what we learned during discovery and requirements gathering, we architected a big data solution that pairs Hadoop with other key open-source components to harness its full potential. The result is the MapReduce architecture illustrated below.
Our solution pre-processes and prepares the data for consumption, creating a “solution” file and a “problem” file. These files are then aggregated and distributed: log files were sent to Solr for indexing, and “solution” data was sent to HDFS. The data then passes through a sink, which processes it and loads it into a Hadoop component; from there it is distributed to Solr Cloud and HDFS, respectively. The end result is structured data available in multiple formats, with low-latency queries provided through Cloudera Impala and data visualization through OBIEE connectivity.
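
As a rough illustration of the pre-process-and-route step described above, the sketch below groups incoming records by kind and assigns each group to a destination. It is a simplified stand-in, not the actual implementation: the JSON-lines input format, the `kind` field, the `ROUTES` table, and the function names `preprocess` and `route` are all assumptions made for the example, and the destinations are plain labels rather than real Solr or HDFS clients.

```python
import json
from collections import defaultdict

# Hypothetical routing table. The text says log data went to Solr and
# "solution" data went to HDFS; the destination of "problem" data is not
# stated, so it is deliberately left unrouted here.
ROUTES = {"log": "solr", "solution": "hdfs"}


def preprocess(raw_lines):
    """Parse JSON lines and group records by their (assumed) 'kind' field."""
    grouped = defaultdict(list)
    for line in raw_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        grouped[record.get("kind", "unknown")].append(record)
    return dict(grouped)


def route(grouped):
    """Map each destination label to the records bound for it."""
    outbound = defaultdict(list)
    for kind, records in grouped.items():
        destination = ROUTES.get(kind)
        if destination is not None:  # kinds without a route are dropped
            outbound[destination].extend(records)
    return dict(outbound)
```

In a real deployment, the per-destination lists would be handed to whatever sink writes to the indexing and storage layers; the sketch only shows the grouping and routing decision.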