IBM Elastic Storage: Scans 10 Billion Files in 43 Minutes

Source: Tom's IT Pro

The message IBM broadcast in a 90-minute announcement on Monday is that storage is a bottleneck, and the company's solution is Elastic Storage. Built on existing technology, Elastic Storage is IBM's brand of software-defined storage (SDS), the technology that originally gave IBM's Watson the power it needed to win a Jeopardy challenge in 2011.

However, while IBM is touting Elastic Storage as a breakthrough technology, version 4.1 of the Elastic Storage software is based on the company's General Parallel File System (GPFS) technology. This version of Elastic Storage is a generational improvement over what IBM built around Watson and uses to manage file storage in its supercomputers. During the Jeopardy challenge in 2011, an early version of Elastic Storage fed IBM Watson five terabytes of data, roughly 200 million pages of structured and unstructured content, including the full text of Wikipedia. Rather than reading the data from drive storage, the information was loaded into Watson's memory in minutes, according to IBM.

Watson is now 24 times faster and 90 percent smaller than it was three years ago, and the current generation of Elastic Storage can scan 10 billion files on a single system in about 43 minutes.

While IBM Watson is remarkable technology, what IBM is promoting under the name Elastic Storage is SDS, and like other SDS vendors, IBM has to work within customers' definitions and expectations of what SDS is and how it should work. A basic definition of software-defined storage is that it must be a software solution and must be vendor neutral when it comes to the underlying hardware; in this case, commodity storage media used by the customer.

IBM's Elastic Storage meets those particular conditions. Other features of IBM's SDS include the intelligent use of flash memory to speed access to data. In IBM's internal tests, the use of flash increased performance by as much as six times over standard Serial Attached SCSI (SAS) drives.

Elastic Storage includes automated tiered data management that intelligently decides where data is stored to get the most efficient and cost-effective use from existing storage, whether that is tape, hard drives, flash, or other media. The solution uses virtualization to pool storage, allowing multiple systems and applications to share common pools of storage.
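To make the tiering idea concrete, here is a minimal sketch of what an automated placement policy can look like. The thresholds, tier names, and `choose_tier` function are illustrative assumptions, not IBM's actual Elastic Storage placement logic, which is driven by its own policy engine.

```python
from datetime import datetime, timedelta

# Hypothetical policy: the more recently and frequently a file is
# accessed, the faster (and more expensive) the tier it lands on.
TIERS = ["flash", "disk", "tape"]

def choose_tier(last_access: datetime, accesses_per_day: float,
                now: datetime) -> str:
    age = now - last_access
    if age < timedelta(days=7) and accesses_per_day >= 1.0:
        return "flash"   # hot: recent and frequently read
    if age < timedelta(days=90):
        return "disk"    # warm: touched within the last quarter
    return "tape"        # cold: archive to the cheapest medium

now = datetime(2014, 6, 1)
print(choose_tier(now - timedelta(days=1), 5.0, now))    # flash
print(choose_tier(now - timedelta(days=30), 0.1, now))   # disk
print(choose_tier(now - timedelta(days=365), 0.0, now))  # tape
```

A real policy engine would also weigh file size, ownership, and migration cost, but the basic shape, rules mapping access patterns to media, is the same.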

SDS must be able to handle storage growth in data centers, and Elastic Storage can expand to thousands of yottabytes (one yottabyte equals one billion petabytes). For businesses with data that requires an additional level of security, the solution includes native encryption of data at rest and secure erase, with NIST SP 800-131A compliance, to help meet HIPAA, Sarbanes-Oxley, and other regulatory requirements.
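The parenthetical unit conversion checks out with decimal (SI) units, as a quick calculation shows:

```python
# Decimal (SI) storage units: 1 YB = 10**24 bytes, 1 PB = 10**15 bytes,
# so one yottabyte is 10**9 petabytes -- one billion.
YB = 10**24
PB = 10**15
print(YB // PB)  # 1000000000
```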

Elastic Storage will allow businesses to store, manage, and access data across private, public, and hybrid clouds through OpenStack cloud management software, with OpenStack Cinder and Swift access. Elastic Storage also supports POSIX and Hadoop open application programming interfaces (APIs). It supports Hadoop analytics by handling both transaction processing and analytics without the need to move data or create duplicate copies. The Elastic Storage solution will be offered as an IBM SoftLayer cloud service later this year.
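The "no duplicate copies" point can be sketched with plain POSIX file I/O: an application writes a file once, and an analytics job reads the very same path. The temp directory below is a stand-in for illustration; on an actual Elastic Storage deployment it would be a GPFS mount that the Hadoop connector also sees, so nothing is copied into a separate HDFS.

```python
import os
import tempfile

# Stand-in for a clustered file system mount (e.g. a GPFS file system);
# a local temp directory keeps this sketch self-contained and runnable.
mount = tempfile.mkdtemp()
path = os.path.join(mount, "record.txt")

# Transactional side: write a record through the POSIX interface.
with open(path, "w") as f:
    f.write("transaction record")

# Analytics side: read the same file at the same path -- no second copy.
with open(path) as f:
    print(f.read())  # transaction record
```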

Something that is not news to most IT organizations is that the volume of data requiring storage is not going to decrease. While that is good news for storage manufacturers such as Seagate, Western Digital, and Toshiba, managing the growth of traditional storage is not getting any easier for businesses. As newer ways of analyzing data become available, a company's business, research, finance, marketing, and other stakeholders are going to want to store and use more data for longer periods.

One observation made in today's presentation was that big data has primarily been data stored and used within an enterprise. As businesses begin working with global data outside of their own data centers, the need to manage "bigger data" is going to become the next major challenge.