10gen said this week that it has updated the MongoDB Connector for Hadoop to make it easier for Hadoop users to integrate with data in MongoDB. The updates include support for Apache Hive with SQL-like queries across live MongoDB data sets, support for incremental MapReduce jobs, and support for MongoDB BSON (Binary JSON) files on the Hadoop Distributed File System (HDFS).
"The MongoDB Connector for Hadoop adds to the broadest set of query and data analysis capabilities of any NoSQL database, enabling companies to reduce the number of tools they use to get value from their data," the company said.
According to 10gen, the Connector presents MongoDB as a Hadoop-compatible file system. Real-time data pulled from MongoDB can therefore be read and processed by Hadoop MapReduce jobs, for example when aggregating data from multiple input sources or as part of Hadoop-based data warehousing or ETL workflows. Hadoop job results can also be written back to MongoDB to support real-time operational processes and ad-hoc querying.
The Connector's new MongoDB BSON support means backup files can be stored in HDFS to reduce data movement between MongoDB and Hadoop, or on a local or cloud-based file system such as Amazon S3. Reading from MongoDB backup files instead of live collections may also reduce load on busy operational MongoDB clusters, the company said.
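To illustrate why BSON backup files lend themselves to batch processing, here is a minimal sketch in Python. It relies on one fact from the BSON format: every document begins with its own total size as a little-endian 32-bit integer and ends with a zero byte, so a file of concatenated documents can be walked record by record without decoding the fields. The function name `iter_raw_bson_documents` and the two hand-encoded sample documents are illustrative assumptions, not part of the Connector.

```python
import struct
from typing import Iterator


def iter_raw_bson_documents(data: bytes) -> Iterator[bytes]:
    """Yield each raw BSON document found in a concatenated BSON byte string.

    Hypothetical helper for illustration; it only finds document boundaries
    and does not decode any fields.
    """
    offset = 0
    while offset < len(data):
        if len(data) - offset < 5:
            raise ValueError("truncated BSON document header")
        # Each BSON document starts with its total size as a little-endian int32.
        (size,) = struct.unpack_from("<i", data, offset)
        if size < 5 or offset + size > len(data):
            raise ValueError("invalid BSON document length")
        document = data[offset:offset + size]
        # Every BSON document ends with a single 0x00 terminator byte.
        if document[-1] != 0:
            raise ValueError("BSON document missing terminator")
        yield document
        offset += size


# Two hand-encoded sample documents: {} and {"a": 1} (an int32 field).
EMPTY_DOC = b"\x05\x00\x00\x00\x00"
A_EQUALS_1 = b"\x0c\x00\x00\x00\x10a\x00\x01\x00\x00\x00\x00"

documents = list(iter_raw_bson_documents(EMPTY_DOC + A_EQUALS_1))
print(len(documents))  # 2
```

Because record boundaries can be found from the length prefixes alone, a large backup file can in principle be divided into chunks and handed to separate workers, which is the property a Hadoop input format needs.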
Prior to the update, the Connector supported MapReduce, Pig, Hadoop Streaming (with Node.js, Python or Ruby) and Flume. With the new Apache Hive support, Hive can access BSON files; full support for MongoDB collections is slated for the next release, due later this year.
Also added to the Connector is a new MongoUpdateWritable feature that allows Hadoop to modify an existing collection in MongoDB, rather than only writing to new collections. Users can thus run incremental MapReduce jobs, for example to match patterns or aggregate trends on a daily basis, and then query the accumulated results from a single collection in MongoDB.
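The incremental pattern described here can be sketched in plain Python. This is a simulation only: the "collection" is an in-memory dictionary, and the names `map_reduce_daily`, `apply_incremental_update` and `page_hits` are hypothetical. It does not use the Connector's API; it only shows the idea of folding each day's MapReduce output into existing documents instead of writing a fresh collection per run.

```python
from collections import Counter
from typing import Dict, Iterable


def map_reduce_daily(events: Iterable[dict]) -> Dict[str, int]:
    """Map each event to (page, 1) and reduce by summing the counts per page."""
    counts: Counter = Counter()
    for event in events:
        counts[event["page"]] += 1
    return dict(counts)


def apply_incremental_update(collection: Dict[str, dict], daily: Dict[str, int]) -> None:
    """Merge one day's results into the existing collection in place.

    Behaves like an upsert with an increment: create the document if it is
    missing, then add the day's count to the running total.
    """
    for page, hits in daily.items():
        doc = collection.setdefault(page, {"_id": page, "hits": 0})
        doc["hits"] += hits


# Stand-in for an existing MongoDB collection, keyed by _id (assumption).
page_hits: Dict[str, dict] = {}

day_1 = [{"page": "/home"}, {"page": "/docs"}, {"page": "/home"}]
day_2 = [{"page": "/home"}, {"page": "/pricing"}]

for day in (day_1, day_2):
    apply_incremental_update(page_hits, map_reduce_daily(day))

print(page_hits["/home"]["hits"])  # 3
```

Each run touches only the new day's events, yet the single `page_hits` collection always holds the running totals, which is what makes it directly queryable afterward.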
The MongoDB Connector for Hadoop joins a series of query and analysis options already available to MongoDB users, including the MongoDB API for building mobile applications, the Aggregation Framework, which provides functionality similar to the SQL GROUP BY clause, and integrations with BI vendors such as QlikTech and Informatica for running BI on live data. There is also native MapReduce for cases where integration with Hadoop isn't needed.
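As a rough illustration of the GROUP BY comparison, the sketch below shows a `$group` pipeline stage next to a plain-Python function that computes the same grouping. The `orders` data, its field names and the helper `group_sum` are invented for the example, and the pipeline is shown for comparison only; it is not run against a server here.

```python
from collections import defaultdict
from typing import Dict, List

# Sample documents (hypothetical data for illustration).
orders = [
    {"status": "shipped", "amount": 10},
    {"status": "pending", "amount": 5},
    {"status": "shipped", "amount": 7},
]

# Aggregation pipeline roughly equivalent to:
#   SELECT status, SUM(amount) AS total FROM orders GROUP BY status
# Shown as data only; it is not executed in this sketch.
pipeline = [{"$group": {"_id": "$status", "total": {"$sum": "$amount"}}}]


def group_sum(docs: List[dict], key: str, value: str) -> List[Dict[str, object]]:
    """Group documents by `key` and sum `value`, mimicking the $group stage above."""
    totals: Dict[object, int] = defaultdict(int)
    for doc in docs:
        totals[doc[key]] += doc[value]
    # Sort by group key so the output order is deterministic.
    return [{"_id": k, "total": totals[k]} for k in sorted(totals)]


result = group_sum(orders, "status", "amount")
print(result)
# [{'_id': 'pending', 'total': 5}, {'_id': 'shipped', 'total': 17}]
```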
"We are seeing strong market adoption of MongoDB for real-time operational big data and Hadoop for deep, offline analytics. The community has been asking us to make these tools interoperate seamlessly, so they can focus on building value in their applications," said Max Schireson, CEO of 10gen. "The latest upgrades to the MongoDB Connector for Hadoop provide this interoperability."
Kevin Parrish is a contributing editor and writer for Tom's Hardware, Tom's Games and Tom's Guide. He's also a graphic artist, CAD operator and network administrator.