The Need For the New

Converged Architecture: When Big Data Meets Better Infrastructure
By William Van Winkle, March 28, 2012, 1:40 PM

Oracle, Microsoft, and similar vendors created the software infrastructure that many organizations rely on today, but their platforms were built for structured, relational data, not semi-structured and unstructured data. That gap is why companies such as Vertica sprang up over the past few years to specialize in big data analysis.

When dealing with petabytes of data, it’s far better to have a system that can cull just the 1% pertinent to a given problem than to scan the entire asset base.
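To make the 1% point concrete, here is a minimal sketch of partition pruning, assuming the data is split into hypothetical per-day partitions. `Partition`, `full_scan`, and `pruned_scan` are illustrative names, not any product’s API. The pruned query consults partition metadata first and reads only the partitions that can possibly match.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class Partition:
    """Hypothetical partition holding every record for a single day."""
    day: date
    records: list = field(default_factory=list)


def full_scan(partitions, predicate):
    """Read every record in every partition, then filter. Returns (hits, records_read)."""
    hits, records_read = [], 0
    for p in partitions:
        for r in p.records:
            records_read += 1
            if predicate(r):
                hits.append(r)
    return hits, records_read


def pruned_scan(partitions, start, end, predicate):
    """Skip partitions whose day lies outside [start, end] without reading them."""
    hits, records_read = [], 0
    for p in partitions:
        if not (start <= p.day <= end):
            continue  # metadata alone rules this partition out
        for r in p.records:
            records_read += 1
            if predicate(r):
                hits.append(r)
    return hits, records_read


if __name__ == "__main__":
    days = [date(2012, 1, 1) + timedelta(days=i) for i in range(100)]
    partitions = [Partition(d, [(d, n) for n in range(1000)]) for d in days]
    target = date(2012, 2, 1)
    wanted = lambda r: r[0] == target
    _, full_reads = full_scan(partitions, wanted)
    _, pruned_reads = pruned_scan(partitions, target, target, wanted)
    print(full_reads, pruned_reads)  # 100000 1000
```

With 100 daily partitions of 1,000 records each, the single-day query reads 1,000 records instead of 100,000, the same 1% ratio as in the text, and returns identical results. Real analytic databases get similar effects from partitioning, indexes, and column statistics, in ways that vary by product.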

Similarly, the storage used to warehouse what has become big data typically has not evolved to keep pace with shifting data types.

“A lot of the file serving technology out there wasn’t designed to accommodate large name spaces,” says HP’s Battas. “When you bought a NetApp system, it was a file server, but it was physically limited in terms of how big you could grow it. You had a two-controller architecture, and if you wanted more you bought another two-controller box. This is why we’ve been investing in scale-out, clustered NAS technology. We can grow by just adding node after node up to very large capacities and name spaces. We’ve qualified our X9000 up to 16 petabytes, but there’s no technical limitation to going well beyond that.”
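The contrast Battas draws between a two-controller box and scale-out NAS can be modeled as a toy sketch. Assumptions: all capacities are hypothetical round numbers in terabytes, and `DualControllerArray` and `ScaleOutCluster` are illustrative names, not HP or NetApp specifications. The scale-up array rejects data past its ceiling, while the scale-out cluster adds nodes to a single growing namespace.

```python
import math


class DualControllerArray:
    """Scale-up model: one box with a fixed capacity ceiling (hypothetical TB figure)."""

    def __init__(self, max_tb):
        self.max_tb = max_tb
        self.used_tb = 0

    def store(self, tb):
        if self.used_tb + tb > self.max_tb:
            # Past the ceiling, the only option is a second box with its own namespace.
            raise RuntimeError("array full: buy another two-controller box")
        self.used_tb += tb


class ScaleOutCluster:
    """Scale-out model: a single namespace whose capacity grows node by node."""

    def __init__(self, tb_per_node):
        if tb_per_node <= 0:
            raise ValueError("tb_per_node must be positive")
        self.tb_per_node = tb_per_node
        self.nodes = 0
        self.used_tb = 0

    @property
    def capacity_tb(self):
        return self.nodes * self.tb_per_node

    def store(self, tb):
        # Add just enough nodes for the new data to fit in the same namespace.
        needed_nodes = math.ceil((self.used_tb + tb) / self.tb_per_node)
        self.nodes = max(self.nodes, needed_nodes)
        self.used_tb += tb


if __name__ == "__main__":
    cluster = ScaleOutCluster(tb_per_node=100)
    cluster.store(16_000)  # 16 PB expressed in decimal TB
    print(cluster.nodes, cluster.capacity_tb)  # 160 16000
```

The point of the model is only that scale-out capacity tracks node count rather than a single box’s physical limit; real clusters also rebalance data and metadata across nodes, which this sketch ignores.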

Of course, no one wants to pay for petabytes of unnecessary capacity. With the massive data volumes inherent to big data in play, storage efficiency is essential to a cost-effective deployment. In a prior converged storage article, we discussed thin provisioning. Thin technologies reduce the physical capacity an analysis system requires, just as tiering can lower the cost of deep storage for less frequently accessed data. Data deduplication and virtual tape libraries are two more cost-saving must-haves in big data environments.
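Thin provisioning and deduplication can be illustrated together in one toy model. This is a minimal sketch, assuming fixed 4 KiB blocks and SHA-256 content hashes (choices made for illustration, not how any particular HP product works); `ThinDedupVolume` is a hypothetical name. The volume advertises its full logical size but consumes physical space only for unique blocks that have actually been written.

```python
import hashlib


class ThinDedupVolume:
    """Toy volume combining thin provisioning with block-level deduplication.

    Thin provisioning: the logical size is fixed up front, but physical space is
    consumed only when a block is written. Deduplication: identical blocks are
    stored once, keyed by their SHA-256 digest, and reference-counted.
    """

    def __init__(self, logical_blocks, block_size=4096):
        self.logical_blocks = logical_blocks
        self.block_size = block_size
        self.block_map = {}  # logical block index -> content digest
        self.store = {}      # content digest -> block bytes (unique blocks only)
        self.refs = {}       # content digest -> number of logical blocks using it

    def write(self, index, data):
        if not 0 <= index < self.logical_blocks:
            raise IndexError("block index out of range")
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block long")
        digest = hashlib.sha256(data).hexdigest()
        old = self.block_map.get(index)
        if old == digest:
            return  # same content already mapped here; nothing to do
        if digest not in self.store:
            self.store[digest] = bytes(data)
            self.refs[digest] = 0
        self.refs[digest] += 1
        self.block_map[index] = digest
        if old is not None:
            self.refs[old] -= 1
            if self.refs[old] == 0:  # last reference gone: free the physical block
                del self.refs[old]
                del self.store[old]

    def read(self, index):
        if not 0 <= index < self.logical_blocks:
            raise IndexError("block index out of range")
        digest = self.block_map.get(index)
        # Never-written blocks of a thin volume read back as zeros.
        return self.store[digest] if digest is not None else bytes(self.block_size)

    def logical_bytes(self):
        return self.logical_blocks * self.block_size

    def physical_bytes(self):
        return len(self.store) * self.block_size


if __name__ == "__main__":
    vol = ThinDedupVolume(logical_blocks=1_000_000)
    for i in range(10):
        vol.write(i, b"a" * 4096)  # ten logical copies of one block
    vol.write(10, b"b" * 4096)
    print(vol.logical_bytes(), vol.physical_bytes())  # 4096000000 8192
```

A one-million-block volume advertises roughly 4 GB, yet ten copies of one block plus one different block occupy only two physical blocks. Production systems add persistence, collision handling, and often variable-size chunking, all of which this sketch omits.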

All of this makes for a lot of IT balls to keep in the air, so keeping management as simple as possible is paramount. Streamlined management, combined with a low-latency, high-throughput architecture, is why many enterprises now turn to converged storage systems for their big data needs. Even back in 2006, in the infancy of big data, VentureBeat reported that Amazon made 35% of its product sales through its “customers who bought this item also bought” recommendation engine. If even such a rudimentary application of big data can yield outsized results, imagine what big data running on a modern, converged infrastructure might do for your organization.
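The “customers who bought this item also bought” idea can be illustrated with simple co-purchase counting. This is a minimal sketch, not Amazon’s actual engine: it assumes orders arrive as lists of item IDs and ranks other items by how often they share an order with the given item. `build_co_purchase_counts` and `also_bought` are hypothetical names.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_co_purchase_counts(orders):
    """Map each item to a Counter of the items that appeared in the same order."""
    co_counts = defaultdict(Counter)
    for order in orders:
        items = sorted(set(order))  # count an item once per order, regardless of quantity
        for a, b in combinations(items, 2):
            co_counts[a][b] += 1
            co_counts[b][a] += 1
    return co_counts


def also_bought(co_counts, item, k=3):
    """Return up to k items most often bought together with `item`."""
    return [other for other, _ in co_counts.get(item, Counter()).most_common(k)]


if __name__ == "__main__":
    orders = [
        ["book", "lamp"],
        ["book", "lamp", "pen"],
        ["book", "pen"],
        ["book", "lamp"],
    ]
    counts = build_co_purchase_counts(orders)
    print(also_bought(counts, "book"))  # ['lamp', 'pen']
```

Raw pair counts favor whatever sells most overall; practical recommenders usually normalize for item popularity, which this sketch leaves out.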