Getting Big Data Organized Is An Iterative Process

By - Source: Tom's IT Pro

A new report out from Oracle gives one take on big data in the enterprise, at least from executives’ point of view. Oracle interviewed 333 executives in 11 industries.

Some of the findings were at the “water is wet” level of insight (e.g. “Nearly all surveyed (97%) say their organization must make a change to improve information optimization over the next two years”).  One of the more interesting findings is that “29% of executives give their organization a ‘D’ or ‘F’ in preparedness to manage the data deluge.”  Only 29%? If I had to bet, I’d say there are a lot of executives who don’t understand the potential of big data or what is needed to actually extract value from it.  At the risk of sounding Rumsfeldian, there are executives who understand the known unknowns of big data and those who do not.

Jessica Miller-Merrell, for example, sees problems with big data in HR:

“It’s a giant collection of crap, fashion favorites and memories stored from years past with no clear sense of direction, organization, or any idea in which to start.”

Big data by itself is not neatly organized.

It isn’t waiting to unveil the useful information implied within it; you have to go looking for it. That means we need direction from executives on what is strategically important.  That shouldn’t be a problem -- setting and executing strategy is their job.  It is a problem, though.  We can get high-level direction from C-level executives, but it won’t be nearly precise enough to direct a group of analysts -- analysts skilled with statistical analysis packages and data mining software, who generate regressions, clusters, classifiers, and a whole host of other models that might be useful to someone, somewhere.  The problem is the chasm between what executives want -- quantifiable measurements they can use to make decisions and drive operations -- and what analysts target in their data mining efforts.

Big data can be organized with respect to a particular analytics problem. To do that, we need someone who understands both the business drivers and the details of the structure and semantics of big data sets: a data scientist.   The work of a data scientist is iterative. Some days are spent generating descriptive statistics about data sets, determining how to join multiple data sets, and performing other data exploration tasks.  Sometimes these exercises don’t lead anywhere, and it’s back to the drawing board.  Other days are spent proposing plans for what can be done with the data. This is where those with deep business knowledge are needed: they help identify the valuable and feasible big data analysis projects, and their feedback is critical input to the next round of analysis by data scientists.
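To make that exploration loop concrete, here is a minimal sketch in plain Python of two of the tasks mentioned above: generating descriptive statistics per entity, and checking how well two data sets would join. The data sets, field names, and customer IDs are hypothetical stand-ins; a real effort would run against warehouse tables or distributed file stores rather than in-memory dicts.

```python
from statistics import mean, median

# Hypothetical data sets for illustration only: order amounts keyed by
# customer ID, and customer regions pulled from a second source.
orders = {"c1": [120.0, 95.0, 210.0], "c2": [45.0, 60.0], "c3": [300.0]}
regions = {"c1": "west", "c2": "east", "c4": "south"}

# Task 1: descriptive statistics per customer -- a first look at the data.
for cust, amounts in sorted(orders.items()):
    print(cust, {"n": len(amounts), "mean": mean(amounts), "median": median(amounts)})

# Task 2: determine how the two data sets join. An inner join on customer ID
# shows the overlap -- and how many records a join would silently drop.
joined = {c: (orders[c], regions[c]) for c in orders.keys() & regions.keys()}
coverage = len(joined) / len(orders)
print(f"join covers {coverage:.0%} of order customers")
```

The point of the coverage check is the iterative part: if the join only covers a fraction of the records, that discovery sends the data scientist back to the business side to ask whether the gap matters before any modeling starts.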

If executives think they are making the grade when it comes to managing the data deluge, I hope they are thinking about more than servers and storage arrays. To extract value from big data we’ll need an iterative analytics process that includes personnel from a wide subset of the organization chart.

Dan Sullivan is an author, systems architect, and consultant with over 20 years of IT experience with engagements in systems architecture, enterprise security, advanced analytics and business intelligence. He has worked in a broad range of industries, including financial services, manufacturing, pharmaceuticals, software development, government, retail, gas and oil production, power generation, life sciences, and education.  Dan has written 16 books and numerous articles and white papers about topics ranging from data warehousing, cloud computing, and advanced analytics to security management, collaboration, and text mining.


