Pivotal HD 2.0 and GemFire XD to Address BDL Architecture
Pivotal has released Pivotal HD 2.0, the company's supported version of Apache Hadoop 2.2, and has added real-time analytics support with the general availability of GemFire XD, an in-memory database application. The company's integration of GemFire XD with Pivotal HD 2.0 is meant to deliver a Business Data Lake (BDL) architecture for the enterprise.
Business Data Lake is a phrase that Capgemini Consulting uses, and that Pivotal has adopted, to describe manageable sets of data. Rather than an ocean of data that no one can get their head around, the consulting firm describes smaller bodies of data that enterprises can process quickly to gain value without losing sight of the shoreline.
Pivotal is building on the BDL architecture with its hybrid open source and proprietary offerings. The foundation for BDL includes Pivotal HD 2.0, the HAWQ SQL query engine, and GemFire XD.
GemFire XD is an in-memory database geared towards processing real-time data by creating in-memory SQL datastores that can be converted to actionable intelligence much faster than running queries against legacy-stored databases. "Pivotal GemFire XD bridges GemFire's proven in-memory intelligence and integrates it with Pivotal HD 2.0 and HAWQ," according to the company.
With this release, HAWQ, Pivotal's massively parallel SQL processing engine, was optimized for analytics. Improvements to HAWQ include:
- 50 in-database analytic algorithms contained in the MADlib Machine Learning Library.
- Support for R, Python and Java programming languages to create procedures that might not be possible using SQL.
- Read/write support for the open Parquet file format.
Also included is GraphLab, an open source framework that contains a set of algorithms and tools for analytics, allowing advanced users (data scientists and analysts) to gain deeper insight into the data. Pivotal HD 2.0 contains the first enterprise integration of GraphLab, according to the company.
Pivotal HD Enterprise at a glance. Image courtesy of Pivotal.
The cost of getting an answer should never be more than the value the business gains from it. To get any value from the data, the information must be accurate, and just as importantly, it must be obtained and acted on quickly. The ideal is real-time analysis, and although all companies rely on information to make decisions, the value of some types of data has a shorter lifespan.
Examples include the financial and banking sector, which is involved in fraud detection; the energy sector, which monitors the power grid; and the telecom industry, which manages the routing of billions of conversations and data streams. The faster the information can be retrieved and acted on, the more value will be returned to the business.
Apache Hadoop is certainly a technology that can help to provide actionable data, but it is only one part of an overall solution. Although it is too early to tell whether enterprises will embrace Pivotal's solution, the company appears to be working towards an integrated offering that may be easier for enterprise data scientists, developers, and analysts to adopt and use. Ease of use is always a popular feature in applications.
"When it comes to Hadoop, other approaches in the market have left customers with a mishmash of un-integrated products and processes. Pivotal HD 2.0 is the first platform to fully integrate proven enterprise in-memory technology, Pivotal GemFire XD, with advanced services on Hadoop 2.2 that provide native support for a comprehensive data science toolset. Data driven businesses now have the capabilities they need to gain a massive head start toward developing analytics and applications for more intelligent and innovative products and services," says Josh Klahr, Vice President, Product Management, Pivotal.