Many large, well-known organizations use both MongoDB and Hadoop. Comparing the two on popularity as a Big Data solution is difficult, because MongoDB is a broader, general-purpose solution that does not cater specifically to the Big Data crowd.
Currently, MongoDB is the most popular NoSQL platform and seems to be on track to overtake PostgreSQL as the fourth most popular database. Keep in mind that the ranking engine mentioned previously gathers statistics on online trends and internet “buzz” to compile its results; this does not necessarily indicate higher adoption rates. Looking through the various lists of MongoDB-powered applications and websites, it’s apparent that most of them use it as a readily available, real-time data solution. This includes real-time analytics, data availability for users on production websites, and serving up content and web pages. What seems to be lacking among MongoDB use cases are examples of long-running ETL jobs, or batch jobs on complex sets of aggregated data.
Hadoop seems to have lower adoption rates than MongoDB overall as a platform; however, a 2012 report indicates that it was the most popular choice for Big Data solutions the previous year. Since Hadoop is meant to operate in an ecosystem of connectors, databases, and platforms (it’s rather limited on its own), the popularity ranking referenced above for MongoDB may be read more accurately by combining Hadoop with associated technologies like HBase, Hive, and Impala. Doing so would place it in the top 10, although winning a popularity contest doesn’t necessarily make any product the best choice. While MongoDB is more popular for real-time data needs, Hadoop is the more common choice for batch jobs and long-running operations and analytics. This is likely due to its strengths in handling high-volume, complex aggregated data feeds. It’s worth noting that real-time analysis and ad hoc queries are available on Hadoop through other systems such as Apache Storm or Apache Hive.
Platform Strengths for Big Data Use Cases
Both of these platforms carry some of the same strengths versus a traditional RDBMS – scalability, parallel processing, a MapReduce architecture, and the ability to handle large amounts of aggregated data – and, as open source software, both are considerably more economical choices than commercial platforms that require licenses. Since both are also architected to process data across clusters of commodity hardware, there is a considerable savings in hardware costs as well.
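The shared MapReduce model can be sketched in a few lines. The toy word count below is illustrative only – the function names are invented, not either platform’s API – but it shows the map and reduce phases that Hadoop distributes across cluster nodes and that MongoDB exposes through its own MapReduce facility.

```python
from collections import defaultdict

def map_phase(documents):
    """Map step: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    """Reduce step: sum the emitted counts for each distinct key."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

docs = ["big data", "big clusters"]
result = reduce_phase(map_phase(docs))
# result == {"big": 2, "data": 1, "clusters": 1}
```

In a real deployment the map and reduce phases run in parallel on many machines; the economics discussed above come from the fact that each machine can be cheap commodity hardware.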
When compared to Hadoop, MongoDB’s greatest strength is that it is a more robust solution, capable of far more flexibility than Hadoop, including the potential replacement of an existing RDBMS. Additionally, MongoDB is inherently better at handling real-time data analytics. The readily available data also makes it capable of client-side data delivery (in a typical client-server, web-based scenario), which is not as common in Hadoop configurations. Another strength of MongoDB is its geospatial indexing abilities, making it an ideal choice for real-time geospatial analysis.
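MongoDB’s geospatial support centers on its 2dsphere index and query operators such as $near. The sketch below shows the shape of the index specification and a proximity query as plain Python dicts, as they would be passed to a driver such as PyMongo; the coordinates are a made-up example point, and GeoJSON order is [longitude, latitude].

```python
# Index specification for a GeoJSON field, e.g.:
#   collection.create_index(index_spec)
index_spec = [("location", "2dsphere")]

# $near query: documents whose "location" lies within 5 km of the
# given point, returned nearest-first, e.g.:
#   collection.find(near_query)
near_query = {
    "location": {
        "$near": {
            "$geometry": {
                "type": "Point",
                "coordinates": [-73.9857, 40.7484],  # [lng, lat], example only
            },
            "$maxDistance": 5000,  # meters
        }
    }
}
```

Because the index is maintained as documents are written, queries like this can serve live “what’s near me” features without a separate batch step – the real-time strength described above.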
Hadoop, on the other hand, excels at batch processing and long-running ETL jobs and analysis. The biggest strength of Hadoop as a Big Data solution is that it was built for Big Data, whereas MongoDB became an option over time. An excellent use case for Hadoop is processing log files, which are typically very large and accumulate rather quickly. While Hadoop may not handle real-time data as well as MongoDB, ad hoc SQL-like queries can be run with Hive, which is touted as a more effective query language than MongoDB’s JSON/BSON-based syntax. Hadoop’s MapReduce implementation is also much more efficient than MongoDB’s, and it is an ideal choice for analyzing massive amounts of data. Finally, Hadoop can accept data in just about any format, which eliminates much of the transformation work otherwise involved in processing the data.
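To illustrate the query style Hive offers over raw log files, the sketch below runs an equivalent SQL aggregation with Python’s built-in SQLite module. This is not Hive itself – the table name and rows are invented sample data – but the GROUP BY shape is the same kind of ad hoc query HiveQL would execute across an entire Hadoop cluster.

```python
import sqlite3

# In-memory database standing in for a Hive table over access logs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE access_log (path TEXT, status INTEGER)")
conn.executemany(
    "INSERT INTO access_log VALUES (?, ?)",
    [("/home", 200), ("/home", 200), ("/missing", 404)],  # sample rows
)

# Count requests per HTTP status code.
rows = conn.execute(
    "SELECT status, COUNT(*) FROM access_log GROUP BY status ORDER BY status"
).fetchall()
# rows == [(200, 2), (404, 1)]
```

The appeal of this style for log analysis is that analysts can write familiar SQL while the engine handles the distribution of work over very large files.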